The most time-intensive part of the data mining process is arguably the creation of a data set from which to build an analytic model. The data in a relational data warehouse is typically not in a form suitable for direct input into a data mining algorithm. New variables may need to be created using formulas, aggregations, and/or expansions on specific values of a dimensioning variable. Tables may also need to be joined, and relational tables denormalized or flattened. In addition, statistical transformations are often required, depending on the type of algorithm to be used and the statistical properties of the data itself. These capabilities are referred to collectively as Analytic Data Sets (ADS) and are provided by the following analyses:
- A Variable Creation analysis accesses one or more tables or views and provides expression building and dimensioning to define new variable columns and place them in a table or view.
- A Variable Transformation function applies requested data mining transformation functions to the columns in a single table or view and creates a transformed table.
- A Build ADS analysis joins together the tables or views created by one or more Variable Creation and/or Variable Transformation functions, allowing column selection and the application of expert WHERE clause constraints. The same operations can also be performed by a Variable Creation or Join function, but Build ADS is simpler to use than Variable Creation and, unlike Join, can operate on a single table.
- A Refresh analysis re-executes a chain of ADS and/or Reorganization analyses with temporary variations in parameter values.
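
The first three steps in the list above can be illustrated with a small, self-contained sketch. It is not the product's own implementation: it uses Python's standard `sqlite3` module as a stand-in for the warehouse, and the tables (`customer`, `txn`, `vc`, `vt`), their columns, and the function name `build_ads` are all hypothetical names chosen for illustration. The z-score is likewise just one example of a statistical transformation.

```python
import math
import sqlite3


def build_ads():
    """Build a tiny analytic data set from two hypothetical source tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, age INTEGER);
        CREATE TABLE txn (cust_id INTEGER, channel TEXT, amount REAL);
        INSERT INTO customer VALUES (1, 30), (2, 40), (3, 50);
        INSERT INTO txn VALUES
            (1, 'web', 10.0), (1, 'store', 20.0),
            (2, 'web', 5.0),
            (3, 'store', 40.0), (3, 'store', 10.0);
    """)

    # Variable creation: one aggregated variable plus an expansion of
    # `amount` on specific values of the dimensioning variable `channel`.
    conn.execute("""
        CREATE TABLE vc AS
        SELECT cust_id,
               SUM(amount) AS total_amt,
               SUM(CASE WHEN channel = 'web'   THEN amount ELSE 0 END) AS web_amt,
               SUM(CASE WHEN channel = 'store' THEN amount ELSE 0 END) AS store_amt
        FROM txn
        GROUP BY cust_id
    """)

    # Variable transformation: z-score of `age` within a single table.
    # SQLite has no built-in standard deviation, so it is computed here.
    ages = [row[0] for row in conn.execute("SELECT age FROM customer")]
    mean = sum(ages) / len(ages)
    std = math.sqrt(sum((a - mean) ** 2 for a in ages) / len(ages))
    conn.execute("CREATE TABLE vt (cust_id INTEGER, age_z REAL)")
    conn.execute(
        "INSERT INTO vt SELECT cust_id, (age - ?) / ? FROM customer",
        (mean, std),
    )

    # Build the ADS: join the created and transformed tables, select
    # columns, and apply a WHERE clause constraint.
    rows = conn.execute("""
        SELECT vc.cust_id, vc.total_amt, vc.web_amt, vc.store_amt, vt.age_z
        FROM vc JOIN vt ON vc.cust_id = vt.cust_id
        WHERE vc.total_amt > 0
        ORDER BY vc.cust_id
    """).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in build_ads():
        print(row)
```

Each returned row is one flattened observation per customer, which is the shape most data mining algorithms expect as input. A refresh, in this sketch, would amount to calling `build_ads` again with different parameter values, such as a different WHERE threshold.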