One aspect of creating an analytic data set to be used as input to a data mining algorithm is the transformation of variables into a format useful to the algorithm. In general, transformations that are reasonably performed as part of SQL expressions have been included in the Variable Creation function, whereas transformations that require a more elaborate SQL structure are provided in the Variable Transformation function. Specifically, transformations in the Variable Transformation function may require calculating global aggregates or more complex measures in derived tables, or may include a separate null replacement transformation as a preprocessing step using a preliminary volatile table. Variable Transformation is however limited to operating on a single input table.
The Variable Transformation function makes it possible to specify at one time any mixture of transformations for any number of columns in a single input table. The user may also specify that columns from the input table be retained unchanged, or retained with a different name and/or type. The result is a new table or view based on the same or transformed columns from the input table.
- Bin Code
- Design Code
- Null Replacement
- Z Score
In order to use the Variable Transformation analysis, the user selects a single input table and then, on a column by column basis, selects what transformation or action they want to perform, if any. The user can choose any of the offered transformations and/or a simple copy or Retain operation. That is, they can choose to include any input table column, as is or with a different name or type, in the output table, whether or not they choose to transform it. By default, the result column name is the same as the input column name, unless multiple result columns may result (as with the design coding transformation). If a specific type is specified, it results in casting the retained column or transformed column.
As a convenience, the primary index columns of the input table are automatically included as retained columns on the transformations screen when the first column is selected for transformation. It is the user’s responsibility, however, to ensure that the necessary “primary key” columns (which might be different) are retained or otherwise provided in the output table if it is desired that the output rows can be uniquely identified.
The user can also specify that a null transformation be performed in a preprocessing step prior to the requested transformation. In this case, the null transformation is produced in a volatile table that is then automatically referenced by the generated SQL, both by the transformation SQL and by any derived aggregates the transformation may require. In this case, it is necessary to indicate the primary key columns of the input table if they are different than the primary index columns or if the primary index columns cannot be determined.
It is possible that the user may specify more transformations than can be performed in a single analysis, due either to complexity or the number of output columns. If this is the case, the user must split up the transformations into multiple analyses and join them back together with a Variable Creation, Build ADS or Join analysis.
Note that the Variable Transformation analysis allows only one input table and does not support the idea of an Anchor Table or Anchor Columns that determine the key values in the output table. In certain scenarios, however, the Anchor Table concept is used to represent the Variable Transformation input table in order to allow changing it in a Refresh or Publish analysis. For example, the Refresh analysis has a Modify Anchor Table option that automatically changes a Variable Transformation input table to the selected value, but only if it is not an Analysis Table, i.e., the output of a chained analysis. In another scenario, a Refresh analysis may be used to create a stored procedure with parameters, including parameters for Anchor Database and Table, representing the input table of a Variable Transformation analysis.
When using the Publish analysis to publish a Variable Transformation analysis to the Model Manager or AppCenter application, tags representing the Anchor Database and Anchor Table are placed in the generated SQL in place of the input database and table provided the input table is not an Analysis Table. The Model Manager or AppCenter user can then dynamically change the input database and table by configuring the Anchor Database and Anchor Table parameters accordingly.