TD_SimpleImputeFit Function | Teradata Vantage - Analytics Database

Database Analytic Functions

Deployment
VantageCloud
VantageCore
Edition
Enterprise
IntelliFlex
VMware
Product
Analytics Database
Release Number
17.20
Published
June 2022
Language
English (United States)
Last Update
2024-10-04
Product Category
Teradata Vantage™

TD_SimpleImputeFit is a data cleaning function that determines values to substitute for missing values in a dataset. Real-world data often has missing values in some variables, which can cause problems for data analysis or modeling. In such cases, you may need to impute the missing values with plausible estimates so that the data is complete and suitable for further analysis.

Depending on the imputation strategy selected, the substitute for a missing value is the mean, median, or most frequent value of the feature. The function handles both numeric and categorical features, and the imputed data can then be used with machine-learning models.
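
The three strategies can be illustrated outside the database. The following is a minimal Python sketch using only the standard library. It is not the Teradata implementation or its SQL syntax; the helper name `substitute_value` and the sample data are hypothetical and exist only for illustration.

```python
from statistics import mean, median, mode


def substitute_value(values, strategy):
    """Return the value that would replace missing entries (None) in a column.

    Hypothetical helper for illustration only; not part of any Teradata API.
    """
    present = [v for v in values if v is not None]  # ignore missing entries
    if not present:
        raise ValueError("column has no non-missing values")
    if strategy == "mean":
        return mean(present)
    if strategy == "median":
        return median(present)
    if strategy == "mode":  # most frequent value; also works for categorical data
        return mode(present)
    raise ValueError(f"unknown strategy: {strategy}")


# Made-up sample columns with missing entries marked as None.
ages = [30, None, 40, 50, None, 100]
genders = ["F", "M", None, "F", "F"]

age_mean = substitute_value(ages, "mean")
age_median = substitute_value(ages, "median")
gender_mode = substitute_value(genders, "mode")
```

In this sample the mean and median of the age column differ because of the single large value, which is one reason to prefer the median for skewed numeric features. The most frequent value is the only one of the three strategies that applies to a categorical column such as gender.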

For example, suppose you have a dataset of patient records with variables such as age, gender, blood pressure, and cholesterol level. Some records are incomplete: the blood pressure value is missing for some patients, and the cholesterol level is missing for others. These gaps can cause problems when you analyze or model the dataset.

To address this issue, use the TD_SimpleImputeFit function to determine plausible estimates for the missing values. You can impute missing blood pressure values with the mean blood pressure across the patients in the dataset. Similarly, you can impute missing cholesterol values with the median cholesterol level across the patients in the dataset.

TD_SimpleImputeFit outputs a table of values to substitute for missing values in the input table. The output table is input to TD_SimpleImputeTransform, which makes the substitutions.
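
The split between computing substitute values and applying them can be sketched in plain Python using the patient example. This is a conceptual illustration under stated assumptions, not the Teradata functions or their SQL syntax: the names `simple_impute_fit` and `simple_impute_transform`, the column names, and the data are all hypothetical.

```python
from statistics import mean, median


def simple_impute_fit(rows, strategies):
    """Compute one substitute value per column (a stand-in for the fit table).

    Hypothetical function for illustration; not a Teradata API.
    """
    funcs = {"mean": mean, "median": median}
    fit = {}
    for column, strategy in strategies.items():
        # Only non-missing values contribute to the statistic.
        present = [row[column] for row in rows if row[column] is not None]
        fit[column] = funcs[strategy](present)
    return fit


def simple_impute_transform(rows, fit):
    """Return copies of the rows with missing values replaced from the fit table."""
    return [
        {col: (fit[col] if val is None and col in fit else val)
         for col, val in row.items()}
        for row in rows
    ]


# Made-up patient records; None marks a missing value.
patients = [
    {"id": 1, "blood_pressure": 120, "cholesterol": 180},
    {"id": 2, "blood_pressure": None, "cholesterol": 200},
    {"id": 3, "blood_pressure": 140, "cholesterol": None},
    {"id": 4, "blood_pressure": 130, "cholesterol": 260},
]

fit_table = simple_impute_fit(
    patients, {"blood_pressure": "mean", "cholesterol": "median"}
)
completed = simple_impute_transform(patients, fit_table)
```

Because the substitute values live in a separate object, the same values can be applied to other data with the same columns, for example reusing statistics computed on a training set when preparing a scoring set.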