Target encoding replaces each category of a categorical variable with the likelihood or expected value of the target variable for that category. The technique applies to both binary classification and regression. For multiclass classification, a similar technique encodes the categorical variable as k new variables, where k is the number of classes.
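As a rough illustration of the idea, the following Python sketch encodes each category with the plain mean of a binary target. It is not the computation performed by TD_TargetEncodingFit, which may apply priors or smoothing. The helper names `fit_target_means` and `transform` and the sample data are hypothetical.

```python
from collections import defaultdict


def fit_target_means(categories, targets):
    """Return a dict mapping each category to the mean target value seen for it."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    return {cat: sums[cat] / counts[cat] for cat in sums}


def transform(categories, encoding, default=None):
    """Replace each category with its encoded value; unseen categories get `default`."""
    return [encoding.get(cat, default) for cat in categories]


# Hypothetical sample data: a categorical column and a binary target.
cities = ["A", "B", "A", "C", "B", "A"]
bought = [1, 0, 0, 1, 1, 1]

encoding = fit_target_means(cities, bought)  # "fit" step: learn one value per category
encoded = transform(cities, encoding)        # "transform" step: apply the learned values
```

The split into a fit step and a transform step mirrors the division of work between a fit function and a transform function: the learned mapping is computed once and can then be applied to other rows.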
The TD_TargetEncodingFit function takes the InputTable and a CategoricalTable as input and generates the required hyperparameters, which the TD_TargetEncodingTransform function then uses to encode the categorical values.
- This function requires the UTF8 client character set.
- This function does not support Pass-Through Characters (PTCs).
- This function does not support KanjiSJIS or Graphic data types.
- The maximum number of unique categories in a single column is 4000.
- The maximum category length is 128 characters.
- Columns with a large number of distinct categories can increase query execution time.
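The two size limits above can be checked on the client side before the function is called. The following Python sketch is one way to do so, assuming the category values have already been loaded as strings. The helper `check_category_limits` is hypothetical, and Python's `len` counts Unicode code points, which may not match how the database counts characters.

```python
MAX_UNIQUE_CATEGORIES = 4000  # limit on unique categories per column, from the notes above
MAX_CATEGORY_LENGTH = 128     # limit on category length in characters, from the notes above


def check_category_limits(values):
    """Summarize whether a column's category values stay within the stated limits."""
    unique = set(values)
    # Collect categories longer than the stated maximum (assumption: len() approximates
    # the database's character count).
    too_long = sorted(v for v in unique if len(v) > MAX_CATEGORY_LENGTH)
    return {
        "unique_count": len(unique),
        "too_many_unique": len(unique) > MAX_UNIQUE_CATEGORIES,
        "too_long": too_long,
    }


# Hypothetical column with one over-length category.
report = check_category_limits(["a", "b", "a", "x" * 129])
```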