TD_ScaleFit Function | ScaleFit | Teradata Vantage

Teradata® VantageCloud Lake

Deployment: VantageCloud
Edition: Lake
Product: Teradata Vantage
Published: January 2023
Language: English (United States)
Last Update: 2024-04-03

TD_ScaleFit outputs a table of statistics for input to TD_ScaleTransform, which scales the specified input table columns. TD_ScaleFit accepts input data in dense and sparse formats.

Scale fit refers to the process of transforming variables in a dataset to a particular scale or range of values that is more suitable for analysis or modeling. This is important because variables in a dataset may have different units of measurement or different ranges of values, which can bias the analysis or degrade model performance.

There are several methods of scaling variables in a dataset, including:
  • Min-Max scaling: This method scales variables to a range between 0 and 1. The formula for min-max scaling is:

    scaled_value = (value - min) / (max - min)

    where min is the minimum value of the variable in the dataset, max is the maximum value of the variable in the dataset, and value is the original value of the variable.

  • Standardization: This method scales variables to have a mean of 0 and a standard deviation of 1. The formula for standardization is:

    scaled_value = (value - mean) / standard_deviation

    where mean is the mean value of the variable in the dataset, standard_deviation is the standard deviation of the variable in the dataset, and value is the original value of the variable.

  • Logarithmic scaling: This method transforms variables by taking the natural logarithm of their values. This can be useful for variables that have a wide range of values or are heavily skewed.
  • Power transformation: This method transforms variables by raising them to a power. Common power transformations include the square root, cube root, and inverse transformations.
  • Robust scaling: This method subtracts the median from each value and divides by the interquartile range (the difference between the 75th and 25th percentiles), so the scaled variable has a median of 0. This can be useful for variables that have extreme outliers.

The choice of scaling method depends on the specific dataset and the analysis or modeling goals. Some methods work better for certain types of variables or distributions than others. Additionally, evaluate the impact of scaling on the analysis or model performance to confirm that the chosen method is appropriate for the data.
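The formulas above can be sketched in plain Python. These are generic, illustrative implementations of the named scaling methods on a small list of numbers; they are not the TD_ScaleFit internals.

```python
import math
import statistics

def min_max_scale(values):
    """Min-max scaling: map values onto the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Standardization: mean 0, (population) standard deviation 1."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

def log_scale(values):
    """Logarithmic scaling: natural log of each (positive) value."""
    return [math.log(v) for v in values]

def robust_scale(values):
    """Robust scaling: center on the median, divide by the IQR."""
    med = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4)
    return [(v - med) / (q3 - q1) for v in values]

data = [2.0, 4.0, 6.0, 8.0, 10.0]
print(min_max_scale(data))  # [0.0, 0.25, 0.5, 0.75, 1.0]
print(standardize(data))
```

Note that min-max scaling is sensitive to outliers (a single extreme value compresses everything else), which is why robust scaling divides by the interquartile range instead of the full range.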

Usage Considerations

The following are usage considerations for the TD_ScaleFit function:

ON clause

  • The InputTable in the TD_ScaleFit query can be unpartitioned or use any of the following PARTITION BY/ORDER BY combinations:
    • PARTITION BY ANY ORDER BY
    • PARTITION BY ANY
    • PARTITION BY KEY
    • PARTITION BY KEY ORDER BY
  • The ParameterTable and AttributeTable in the TD_ScaleFit query can be partitioned with the following combinations:
    • PARTITION BY KEY
    • PARTITION BY KEY ORDER BY
  • The TargetColumns and ScaleMethod arguments are mandatory for the TD_ScaleFit function; omitting either reports an error. For example: “Failure 7810 Error in function TD_ScaleFit: Required argument TargetColumns is missing.”
  • If IgnoreInvalidLocationScale is set to false and ScaleMethod is set to STD, the TD_ScaleFit function reports an error if the standard deviation of the column is 0. For example: "*** Failure 9134 Error in function TD_ScaleFit: Cannot apply STD ScaleMethod on columns with STD = 0."
  • If the TD_ScaleFit and TD_ScaleTransform functions are invoked on data with large partition sizes and high concurrency, the spool space for the database user may be exhausted while the data is partitioned, causing one of the following errors:
    • *** Failure 9794 File system has reported ERRAMPOUTOFPHYSPACE error.
    • *** Failure 2646 No more spool space in database_user.
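The STD = 0 restriction above can be illustrated with a small sketch. The function and parameter names below are illustrative Python, not the Vantage implementation; the guard mirrors the documented behavior when IgnoreInvalidLocationScale is false.

```python
import statistics

def scale_fit_std(values, ignore_invalid_location_scale=False):
    """Compute the mean/std statistics needed for STD scaling,
    mimicking the TD_ScaleFit error for a zero standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0 and not ignore_invalid_location_scale:
        # Analogous to: Failure 9134 ... Cannot apply STD ScaleMethod
        # on columns with STD = 0.
        raise ValueError("Cannot apply STD ScaleMethod on columns with STD = 0")
    return {"mean": mean, "std": std}

print(scale_fit_std([1.0, 2.0, 3.0]))        # succeeds: non-zero std
# scale_fit_std([5.0, 5.0, 5.0])             # would raise: std is 0
```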

Limits and Restrictions

  • This function requires the UTF8 client character set for UNICODE data.
  • This function does not support Pass Through Characters (PTCs) and KanjiSJIS or Graphic data types.