Background - Aster Analytics

Teradata Aster Analytics Foundation User Guide

Product: Aster Analytics
Release Number: 6.21
Published: November 2016
Language: English (United States)
Last Update: 2018-04-14
Product Category: Software

The Scale function normalizes the input data set. It shifts the input data and then scales it to produce the normalized values.

Scaling is a common preprocessing step. Many data sets contain variables that are measured in different units or that have very different ranges or variances. If such a data set is not scaled, some columns can have a much greater influence on the results than others, which can produce misleading results. Functions whose input you should consider scaling include KMeans, Principal Component Analysis (PCA), and VARMAX.

For example, an insurance company wants to use the KMeans function to cluster customers according to their house data, which includes area in square feet (sqft), number of rooms (numrooms), and price. As the following table shows, the numeric values of these variables differ significantly because they are measured in different units. If the input data is not scaled, house price dominates the clustering because its variance is far larger than that of the other columns.

Input Data Example
id  sqft  numrooms  price
1   1000  3         200,000
2   1500  4         300,000
3   500   2         150,000
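
To make the dominance of price concrete, the following Python sketch computes how much each column contributes to the squared distance between customers 1 and 2 in the table above. It assumes a Euclidean distance, which is an illustrative assumption here and not a statement about how the KMeans function is implemented. The variable names are hypothetical, and this is plain Python, not Aster code.

```python
# Rows from the Input Data Example table: (sqft, numrooms, price).
columns = ("sqft", "numrooms", "price")
house_1 = (1000, 3, 200_000)
house_2 = (1500, 4, 300_000)

# Squared difference per column: the terms that are summed in a
# squared Euclidean distance (assumed metric, for illustration only).
squared_diffs = {
    name: (a - b) ** 2 for name, a, b in zip(columns, house_1, house_2)
}
total = sum(squared_diffs.values())

# Fraction of the total squared distance contributed by each column.
shares = {name: value / total for name, value in squared_diffs.items()}

for name in columns:
    print(f"{name}: {shares[name]:.6f}")
```

On the unscaled data, the price column accounts for more than 99.99 percent of the squared distance, so sqft and numrooms have almost no effect on which customers appear close to each other.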

The following table shows the data after normalization with the Scale function using the MAXABS method, which divides each value by the largest absolute value in its column. All three columns now fall within the same range, so price no longer dominates the KMeans clustering.

Output Table Example
id  sqft   numrooms  price
1   0.667  0.75      0.667
2   1      1         1
3   0.333  0.5       0.5
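
The values in the Output Table Example can be checked with a short Python sketch. It assumes that MAXABS divides each value by the largest absolute value in its column with no shift, which is consistent with the table above. The function name maxabs_scale and the data layout are hypothetical; this is not the Scale function's actual interface.

```python
# Rows from the Input Data Example table; keys are the column names.
rows = [
    {"id": 1, "sqft": 1000, "numrooms": 3, "price": 200_000},
    {"id": 2, "sqft": 1500, "numrooms": 4, "price": 300_000},
    {"id": 3, "sqft": 500, "numrooms": 2, "price": 150_000},
]
feature_columns = ["sqft", "numrooms", "price"]


def maxabs_scale(rows, columns):
    """Divide each value by the largest absolute value in its column.

    Assumes every column contains at least one nonzero value;
    an all-zero column would cause a division by zero.
    """
    max_abs = {c: max(abs(r[c]) for r in rows) for c in columns}
    scaled = []
    for r in rows:
        out = {"id": r["id"]}  # the id column is passed through unscaled
        for c in columns:
            out[c] = r[c] / max_abs[c]
        scaled.append(out)
    return scaled


scaled_rows = maxabs_scale(rows, feature_columns)
for r in scaled_rows:
    # Round to three decimals for display, as in the output table.
    print(r["id"], *(round(r[c], 3) for c in feature_columns))
```

Rounded to three decimal places, the printed values match the table: for example, customer 1 has sqft 1000 / 1500 = 0.667 and numrooms 3 / 4 = 0.75.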