7.00.02 - Output for Unnormalized Input - Aster Analytics

Teradata Aster® Analytics Foundation User Guide Update 2

Product: Aster Analytics
Release Number: 7.00.02
Release Date: September 2017
Content Type: Programming Reference, User Guide
Publication ID: B700-1022-700K
Language: English (United States)

If you run the PCA function without first normalizing the input, you get the output shown in the following table (split into three column groups).

PCA Example Output Table pca_health_ev (Columns 1-5)
component_rank  age                   bmi                   bloodpressure         glucose
1               0.0233531960338227    0.00504956503997912   0.00158669391367759   0.0842032256610642
2               0.0401859758008148    0.0404126752942246    -0.0223449106172612   0.217685269075731
3               -0.16820726382671     0.0638054862550787    -0.472824089141384    -0.820766253634683
4               -0.0129652073386729   0.00378045426074681   -0.858379856202759    0.406899030303581
5               -0.233598151299198    0.320787735507453     -0.124354522206257    0.309945063989336
6               -0.743337103992336    0.515142674213825     0.153195632015632     0.0282396614323736
7               -0.585637274413613    -0.7907071584919      0.00553007147741568   0.0925491110066536
8               -0.13888402260094     -0.0275921930861039   0.0124923981893404    -0.0294977512854948

PCA Example Output Table pca_health_ev (Columns 6-9)
component_rank  strokes               cigarettes            insulin               hdl
1               -0.00859257029795621  0.0413329219482391    0.99526340917443      0.00222103039038422
2               -0.00344527374716509  0.0171848390743044    -0.022445679214246    0.973681022974589
3               -0.0539753477374444   0.180457886400629     0.0654659811769922    0.175074647858464
4               0.0391336176479417    -0.290464771196449    -0.0201355968289739   -0.105490035413321
5               -0.0568956998524478   0.846124412456638     -0.0575951508612638   -0.0922837968094822
6               -0.070564649365954    -0.389709106729591    0.0277394005213322    0.0137681856551496
7               -0.107921723264969    0.101814040327821     0.00467778052427058   0.0343536591559544
8               0.987727660277147     0.0538110654780203    0.0121302727639756    0.016583625709794

PCA Example Output Table pca_health_ev (Columns 10-13)
component_rank  sd                    var_proportion        cumulative_var        mean
1               194.717385062003      0.903560729160365     0.903560729160365     [37.88, 31.96399963378906, 66.8, 130.84, 5.12, 18.6, 108.16, 49.17200023651123]
2               46.3369853980967      0.0511685890757095    0.954729318236074
3               32.126518678328       0.0245966082061308    0.979325926442205
4               23.0594563502538      0.0126720249199549    0.99199795136216
5               14.4734867683913      0.00499222587415763   0.996990177236318
6               8.8921277044859       0.00188434002234855   0.998874517258666
7               6.5381934611736       0.00101874015287284   0.999893257411539
8               2.11638619509984      0.00010674258846095   1

Each principal component assigns a different set of loadings to the input variables. For example, in the first principal component, which captures the most variance in the input space (var_proportion 0.9036), insulin has a loading of about 0.995, much larger than that of any other variable. Without normalization, PCA is driven by each variable's variance in its raw units, and the raw insulin values vary much more widely than the values of most of the other variables (for example, BMI is usually 18 to 30).
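To quantify how strongly insulin dominates the first component, the following Python sketch squares the component_rank 1 loadings from the table. Each row of loadings is a unit-length vector (its squares sum to 1, up to rounding), so each squared loading can be read as that variable's share of the component. The dictionary layout is only an illustrative way to hold the row.

```python
# Loadings for component_rank 1 of pca_health_ev, copied from the table.
pc1 = {
    "age": 0.0233531960338227,
    "bmi": 0.00504956503997912,
    "bloodpressure": 0.00158669391367759,
    "glucose": 0.0842032256610642,
    "strokes": -0.00859257029795621,
    "cigarettes": 0.0413329219482391,
    "insulin": 0.99526340917443,
    "hdl": 0.00222103039038422,
}

# Squared loadings; for a unit-length loading vector these sum to about 1.
squared = {name: w * w for name, w in pc1.items()}
norm_sq = sum(squared.values())

# Print each variable's share of the first component, largest first.
for name, s in sorted(squared.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:14s} {s / norm_sq:.4f}")
```

Insulin's share comes out at roughly 99%, with glucose a distant second.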

Teradata usually recommends normalization when the input variables have very different variances, for example because they are measured on different scales. However, in some datasets the magnitudes of the input variables are meaningful, and normalization is not recommended.
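The effect of normalization can be illustrated outside Aster with a small NumPy sketch. It assumes z-score standardization (subtract each column's mean, divide by its standard deviation), which is one common normalization choice; it does not reproduce the Aster PCA function or its normalization options, and the two-variable data set is synthetic and hypothetical. On the raw data, the first component is almost entirely the wide-range variable; after standardization, both variables load equally.

```python
import numpy as np

def pca_loadings(x):
    """Return (eigenvalues, loadings) of the covariance matrix of x, sorted by
    decreasing eigenvalue. Each row of `loadings` is one principal component."""
    cov = np.cov(x, rowvar=False)           # columns are variables
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # largest variance first
    return eigvals[order], eigvecs[:, order].T

def standardize(x):
    """Z-score each column: subtract the column mean, divide by the column sd."""
    return (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)

rng = np.random.default_rng(0)
n = 500
# Hypothetical data: two correlated variables on very different scales.
latent = rng.normal(size=n)
wide = 100.0 * latent + 20.0 * rng.normal(size=n)   # sd about 100 (insulin-like scale)
narrow = 3.0 * latent + 1.0 * rng.normal(size=n)    # sd about 3 (BMI-like scale)
x = np.column_stack([wide, narrow])

_, raw_loadings = pca_loadings(x)
_, std_loadings = pca_loadings(standardize(x))

# Eigenvector signs are arbitrary, so compare absolute loadings.
print("unnormalized PC1:", np.round(np.abs(raw_loadings[0]), 3))
print("normalized PC1:  ", np.round(np.abs(std_loadings[0]), 3))
```

With only two standardized variables, the covariance matrix becomes the correlation matrix [[1, r], [r, 1]], whose eigenvectors are always (1, 1)/√2 and (1, -1)/√2, which is why the normalized loadings are exactly balanced here. With more variables, the normalized loadings depend on the correlations among them.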