Function transformations play a critical role in the machine learning pipeline. These transformations involve applying mathematical functions to columns of data to create new variables that can help improve the accuracy and robustness of machine learning models.
Some of the most commonly used transformations include absolute, log, exponential, ceil, floor, sigmoid, and tanh transformations. These transformations serve a variety of purposes, such as:
- Data normalization: Transformations can bring the data into a more standardized form. For example, taking the log of a variable compresses large values much more strongly than small ones. Additionally, the sigmoid and tanh functions can be used to map the data into a range between 0 and 1 or between -1 and 1, respectively. This reduces the impact of outliers and makes the data more consistent.
- Handling non-linear relationships: In some cases, the relationship between a variable and the target variable may not be linear. Transforming the variable using a non-linear transformation (such as the log or exponential functions) can capture these non-linear relationships and improve the accuracy of the model.
- Mitigating skewness: Skewed data can cause issues for machine learning algorithms, particularly those that assume a normal distribution. Log and power transformations can make a right-skewed distribution more symmetric by compressing the long tail of large values and spreading out the small values clustered at the low end. Sigmoid and tanh transformations can also limit the influence of extreme values by bounding them. This can improve the performance of machine learning algorithms that are sensitive to skewed data.
- Addressing heteroscedasticity: In some cases, the variance of a variable may not be constant across different levels of the variable. Transforming the variable using a log function can sometimes address this heteroscedasticity and improve the accuracy of the model.
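The normalization and skewness points above can be illustrated with a minimal Python sketch. It uses only the standard library; the `incomes` and `scores` values are made-up sample data, and the `skewness` and `sigmoid` helpers are written here for illustration rather than taken from any particular library.

```python
import math
import statistics


def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)


def sigmoid(x):
    """Logistic sigmoid, which maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical right-skewed column: most values are small, one is very large.
incomes = [20, 25, 30, 35, 40, 50, 60, 80, 150, 1000]

# The log transformation compresses the long right tail.
log_incomes = [math.log(v) for v in incomes]

raw_skew = skewness(incomes)
log_skew = skewness(log_incomes)
print(f"skewness before log: {raw_skew:.2f}, after log: {log_skew:.2f}")

# Sigmoid and tanh bound the values to (0, 1) and (-1, 1) respectively.
scores = [-3.0, -0.5, 0.0, 0.5, 3.0]
sigmoid_scores = [sigmoid(s) for s in scores]
tanh_scores = [math.tanh(s) for s in scores]
print(sigmoid_scores)
print(tanh_scores)
```

On this sample the log-transformed column is still right-skewed, but noticeably less so than the raw column, which is the typical effect rather than a guarantee of normality.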
By transforming columns of data into new variables that capture important information, machine learning models can make better predictions and achieve better performance across a wide range of applications.
TD_FunctionTransform applies numeric transformations to input columns, using TD_FunctionFit output.
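The two-step pattern described here (a fit step that records which transformation applies to which column, followed by a transform step that applies it) can be sketched conceptually in Python. This is only an analogy, not the Teradata SQL syntax: the names `function_fit`, `function_transform`, `TRANSFORMS`, and the sample rows are hypothetical and chosen for illustration.

```python
import math

# Hypothetical registry mapping transformation names to Python functions.
TRANSFORMS = {
    "ABS": abs,
    "LOG": math.log,
    "EXP": math.exp,
    "CEIL": math.ceil,
    "FLOOR": math.floor,
    "SIGMOID": lambda x: 1.0 / (1.0 + math.exp(-x)),
    "TANH": math.tanh,
}


def function_fit(spec):
    """Validate a {column: transformation name} spec and return it as the fit output."""
    unknown = sorted({name for name in spec.values() if name not in TRANSFORMS})
    if unknown:
        raise ValueError(f"unknown transformations: {unknown}")
    return dict(spec)


def function_transform(rows, fit_output):
    """Apply the fitted transformations; columns not in the fit output pass through."""
    return [
        {
            col: TRANSFORMS[fit_output[col]](val) if col in fit_output else val
            for col, val in row.items()
        }
        for row in rows
    ]


# Made-up input rows for illustration.
rows = [
    {"id": 1, "delta": -3.7, "fare": 100.0},
    {"id": 2, "delta": 2.5, "fare": 7.25},
]

fit_output = function_fit({"delta": "ABS", "fare": "LOG"})
transformed = function_transform(rows, fit_output)
print(transformed)
```

Separating the two steps means the same fit output can be reused to transform new data in exactly the same way as the training data.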