Output for Unnormalized Input - Aster Analytics

Teradata Aster® Analytics Foundation User Guide Update 2

Product: Aster Analytics
Release Number: 7.00.02
Published: September 2017
Language: English (United States)
Last Update: 2018-04-17
Product Category: Software

If you run the model without first normalizing the input, you get the output table pca_health_ev, shown below in three parts (columns 1-5, 6-9, and 10-13).

PCA Example Output Table pca_health_ev (Columns 1-5)
component_rank age bmi bloodpressure glucose
1 0.0233531960338227 0.00504956503997912 0.00158669391367759 0.0842032256610642
2 0.0401859758008148 0.0404126752942246 -0.0223449106172612 0.217685269075731
3 -0.16820726382671 0.0638054862550787 -0.472824089141384 -0.820766253634683
4 -0.0129652073386729 0.00378045426074681 -0.858379856202759 0.406899030303581
5 -0.233598151299198 0.320787735507453 -0.124354522206257 0.309945063989336
6 -0.743337103992336 0.515142674213825 0.153195632015632 0.0282396614323736
7 -0.585637274413613 -0.7907071584919 0.00553007147741568 0.0925491110066536
8 -0.13888402260094 -0.0275921930861039 0.0124923981893404 -0.0294977512854948
PCA Example Output Table pca_health_ev (Columns 6-9)
strokes cigarettes insulin hdl
-0.00859257029795621 0.0413329219482391 0.99526340917443 0.00222103039038422
-0.00344527374716509 0.0171848390743044 -0.022445679214246 0.973681022974589
-0.0539753477374444 0.180457886400629 0.0654659811769922 0.175074647858464
0.0391336176479417 -0.290464771196449 -0.0201355968289739 -0.105490035413321
-0.0568956998524478 0.846124412456638 -0.0575951508612638 -0.0922837968094822
-0.070564649365954 -0.389709106729591 0.0277394005213322 0.0137681856551496
-0.107921723264969 0.101814040327821 0.00467778052427058 0.0343536591559544
0.987727660277147 0.0538110654780203 0.0121302727639756 0.016583625709794
PCA Example Output Table pca_health_ev (Columns 10-13)
sd var_proportion cumulative_var mean
194.717385062003 0.903560729160365 0.903560729160365 [37.88, 31.96399963378906, 66.8, 130.84, 5.12, 18.6, 108.16, 49.17200023651123]
46.3369853980967 0.0511685890757095 0.954729318236074  
32.126518678328 0.0245966082061308 0.979325926442205  
23.0594563502538 0.0126720249199549 0.99199795136216  
14.4734867683913 0.00499222587415763 0.996990177236318  
8.8921277044859 0.00188434002234855 0.998874517258666  
6.5381934611736 0.00101874015287284 0.999893257411539  
2.11638619509984 0.00010674258846095 1  
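
The var_proportion and cumulative_var columns can be cross-checked against the sd column. The following Python sketch is an illustration only, not part of Aster Analytics. It assumes that var_proportion is each component's variance (sd squared) divided by the total variance over all components, and that cumulative_var is the running sum of those proportions. That assumption is consistent with the values in the table above, but the table itself does not state the formula. The variable names are chosen for this example.

```python
# sd column copied from the pca_health_ev output table above.
sd = [
    194.717385062003,
    46.3369853980967,
    32.126518678328,
    23.0594563502538,
    14.4734867683913,
    8.8921277044859,
    6.5381934611736,
    2.11638619509984,
]

# Assumption: the variance explained by a component is its sd squared.
variances = [s ** 2 for s in sd]
total_variance = sum(variances)

# Proportion of total variance per component, and the running total.
var_proportion = [v / total_variance for v in variances]
cumulative_var = []
running = 0.0
for p in var_proportion:
    running += p
    cumulative_var.append(running)

for rank, (p, c) in enumerate(zip(var_proportion, cumulative_var), start=1):
    print(f"component {rank}: var_proportion={p:.6f} cumulative_var={c:.6f}")
```

Under this assumption, the first component alone accounts for roughly 90% of the total variance, which is why its loadings matter most when interpreting the unnormalized result.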

The loadings for each principal component are different. For example, in the first principal component, which captures the most variance in the input space (about 90%), insulin has a much larger loading (about 0.995) than every other variable. This is because the raw insulin values, which range from 70 to 190, are much larger in magnitude than the values of most of the other variables (for example, BMI is usually 18 to 30).
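
The effect of scale on the first principal component can be reproduced with a small NumPy sketch. This is not the Aster PCA function and does not use the pca_health data. The three columns are synthetic stand-ins (hypothetical names bmi_like, insulin_like, strokes_like) that share one latent factor but have very different spreads, and the helper first_component is defined here only for illustration.

```python
import numpy as np

def first_component(data):
    """Return the unit-length loadings of the first principal component,
    computed from the eigen-decomposition of the covariance matrix."""
    cov = np.cov(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    return eigvecs[:, -1]                   # eigenvector of the largest one

rng = np.random.default_rng(0)
n = 500

# Synthetic data (an assumption for this example): each variable is a mix of
# a shared latent factor and independent noise, then scaled differently.
latent = rng.standard_normal(n)

def correlated_unit_variable():
    return 0.7 * latent + np.sqrt(0.51) * rng.standard_normal(n)

bmi_like = 24 + 3 * correlated_unit_variable()        # small spread
insulin_like = 130 + 30 * correlated_unit_variable()  # large spread
strokes_like = 5 + 2 * correlated_unit_variable()     # small spread
raw = np.column_stack([bmi_like, insulin_like, strokes_like])

# Z-score normalization: zero mean and unit variance per column.
standardized = (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1)

raw_loadings = first_component(raw)
std_loadings = first_component(standardized)

# The sign of an eigenvector is arbitrary, so compare absolute values.
print("raw loadings:         ", np.round(np.abs(raw_loadings), 3))
print("standardized loadings:", np.round(np.abs(std_loadings), 3))
```

On the raw data, the large-spread column dominates the first component, much as insulin does in the table above. After z-score normalization, the three columns contribute comparably, because each now has unit variance.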

Teradata usually recommends normalization when variables have different variances. However, in some datasets, the magnitudes of input variables do matter, and normalization is not recommended.