Background - Aster Analytics

Teradata Aster Analytics Foundation User Guide

Product: Aster Analytics
Release Number: 6.21
Published: November 2016
Language: English (United States)
Last Update: 2018-04-14
dita:id: B700-1021
Product Category: Software

The Scale function normalizes the input data set: it shifts each input column and scales it to produce the normalized values.

Scaling is a common preprocessing step. Many data sets contain variables that are measured in different units or that have very different ranges or variances. If such a data set is not scaled, some columns can influence the results far more than others, which can produce misleading results. Functions whose input you should consider scaling include KMeans, Principal Component Analysis (PCA), and VARMAX.

For example, an insurance company wants to use the KMeans function to cluster customers according to their house data, which includes square feet (sqft), number of rooms (numrooms), and price. As the following table shows, the numeric values of these variables differ by orders of magnitude because they are measured in different units. If the input data is not scaled, house price dominates the clustering because its variance is much larger than that of the other columns.

Input Data Example
id	sqft	numrooms	price
1	1000	3	200,000
2	1500	4	300,000
3	500	2	150,000
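
To make the difference in spread concrete, the following minimal Python sketch computes the population variance of each column in the Input Data Example. It is an illustration only, not Aster SQL-MR syntax; the names `houses` and `variances` are hypothetical.

```python
from statistics import pvariance

# Input Data Example, stored as one list per column.
houses = {
    "sqft": [1000, 1500, 500],
    "numrooms": [3, 4, 2],
    "price": [200_000, 300_000, 150_000],
}

# Population variance of each unscaled column (hypothetical helper dict).
variances = {name: pvariance(values) for name, values in houses.items()}

for name, var in variances.items():
    print(f"{name}: variance = {var:,.2f}")
```

The price column's variance is several orders of magnitude larger than the others, which is why a distance-based method such as KMeans would be driven almost entirely by price on the unscaled data.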

The following table shows the data after normalization with the Scale function using the MAXABS method, which here divides each value by the maximum absolute value in its column. All three columns now lie between 0 and 1, so price no longer dominates the KMeans clustering.

Output Table Example
id	sqft	numrooms	price
1	0.667	0.75	0.667
2	1	1	1
3	0.333	0.5	0.5
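
The values in the Output Table Example can be reproduced with a short Python sketch. It assumes that MAXABS divides each value by the largest absolute value in its column, which is consistent with the table above; the function name `maxabs_scale` and the handling of an all-zero column are assumptions made for illustration and do not describe the Scale function's actual implementation or syntax.

```python
def maxabs_scale(values):
    """Divide each value by the column's maximum absolute value.

    Assumption: an all-zero column is returned unchanged (as floats)
    to avoid division by zero.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return [float(v) for v in values]
    return [v / max_abs for v in values]


# Input Data Example, stored as one list per column.
houses = {
    "sqft": [1000, 1500, 500],
    "numrooms": [3, 4, 2],
    "price": [200_000, 300_000, 150_000],
}

# Scale every column independently and round to three decimals,
# matching the precision used in the Output Table Example.
scaled = {
    name: [round(v, 3) for v in maxabs_scale(values)]
    for name, values in houses.items()
}

for name, values in scaled.items():
    print(name, values)
```

Each column is scaled independently, so the largest value in every column maps to 1 regardless of its original unit.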