If you run the model without first normalizing the input, you get the output in the following table.
component_rank | age | bmi | bloodpressure | glucose |
---|---|---|---|---|
1 | 0.0233531960338227 | 0.00504956503997912 | 0.00158669391367759 | 0.0842032256610642 |
2 | 0.0401859758008148 | 0.0404126752942246 | -0.0223449106172612 | 0.217685269075731 |
3 | -0.16820726382671 | 0.0638054862550787 | -0.472824089141384 | -0.820766253634683 |
4 | -0.0129652073386729 | 0.00378045426074681 | -0.858379856202759 | 0.406899030303581 |
5 | -0.233598151299198 | 0.320787735507453 | -0.124354522206257 | 0.309945063989336 |
6 | -0.743337103992336 | 0.515142674213825 | 0.153195632015632 | 0.0282396614323736 |
7 | -0.585637274413613 | -0.7907071584919 | 0.00553007147741568 | 0.0925491110066536 |
8 | -0.13888402260094 | -0.0275921930861039 | 0.0124923981893404 | -0.0294977512854948 |

component_rank | strokes | cigarettes | insulin | hdl |
---|---|---|---|---|
1 | -0.00859257029795621 | 0.0413329219482391 | 0.99526340917443 | 0.00222103039038422 |
2 | -0.00344527374716509 | 0.0171848390743044 | -0.022445679214246 | 0.973681022974589 |
3 | -0.0539753477374444 | 0.180457886400629 | 0.0654659811769922 | 0.175074647858464 |
4 | 0.0391336176479417 | -0.290464771196449 | -0.0201355968289739 | -0.105490035413321 |
5 | -0.0568956998524478 | 0.846124412456638 | -0.0575951508612638 | -0.0922837968094822 |
6 | -0.070564649365954 | -0.389709106729591 | 0.0277394005213322 | 0.0137681856551496 |
7 | -0.107921723264969 | 0.101814040327821 | 0.00467778052427058 | 0.0343536591559544 |
8 | 0.987727660277147 | 0.0538110654780203 | 0.0121302727639756 | 0.016583625709794 |

component_rank | sd | var_proportion | cumulative_var | mean |
---|---|---|---|---|
1 | 194.717385062003 | 0.903560729160365 | 0.903560729160365 | [37.88, 31.96399963378906, 66.8, 130.84, 5.12, 18.6, 108.16, 49.17200023651123] |
2 | 46.3369853980967 | 0.0511685890757095 | 0.954729318236074 | |
3 | 32.126518678328 | 0.0245966082061308 | 0.979325926442205 | |
4 | 23.0594563502538 | 0.0126720249199549 | 0.99199795136216 | |
5 | 14.4734867683913 | 0.00499222587415763 | 0.996990177236318 | |
6 | 8.8921277044859 | 0.00188434002234855 | 0.998874517258666 | |
7 | 6.5381934611736 | 0.00101874015287284 | 0.999893257411539 | |
8 | 2.11638619509984 | 0.00010674258846095 | 1 | |
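The var_proportion and cumulative_var columns are consistent with treating each component's variance as sd squared and dividing by the total. The NumPy sketch below recomputes both columns from the sd values in the table under that assumption; it is only a cross-check, not Teradata's implementation.

```python
import numpy as np

# Component standard deviations copied from the "sd" column of the output above.
sd = np.array([
    194.717385062003, 46.3369853980967, 32.126518678328, 23.0594563502538,
    14.4734867683913, 8.8921277044859, 6.5381934611736, 2.11638619509984,
])

variances = sd ** 2                            # variance captured by each component
var_proportion = variances / variances.sum()   # each component's share of total variance
cumulative_var = np.cumsum(var_proportion)     # running total across components

for rank, (p, c) in enumerate(zip(var_proportion, cumulative_var), start=1):
    print(f"{rank}: var_proportion={p:.6f} cumulative_var={c:.6f}")
```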
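Each principal component has its own set of loadings, one per input variable. For example, in the first principal component, which captures the most variance in the input space (about 90%, according to var_proportion), insulin has a loading of about 0.995, far larger than the loading of any other variable. This happens because PCA on unnormalized data looks for the directions of greatest variance, so variables with large raw values and a wide spread dominate. The raw insulin values, ranging from 70 to 190, are much larger in magnitude than the values of most other variables (for example, BMI is usually 18 to 30).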
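To see why one high-variance variable can dominate the first component, the following minimal NumPy sketch runs PCA (eigendecomposition of the sample covariance matrix) on hypothetical synthetic data with three independent variables, one of which has a much larger spread. The `pca` helper and the data are illustrative assumptions, not Teradata's API or the dataset above.

```python
import numpy as np

def pca(X):
    """PCA via eigendecomposition of the sample covariance matrix.

    Returns (sd, loadings): component standard deviations in descending order,
    and a matrix whose columns are the corresponding loading vectors.
    """
    cov = np.cov(X, rowvar=False)            # np.cov centers each column itself
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]        # re-sort so the largest variance comes first
    sd = np.sqrt(np.clip(eigvals[order], 0, None))
    return sd, eigvecs[:, order]

# Hypothetical synthetic data: three independent variables with very different spreads.
rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([
    rng.normal(30, 1, n),    # small spread
    rng.normal(70, 2, n),    # small spread
    rng.normal(110, 50, n),  # large spread
])

sd_raw, loadings_raw = pca(X)
print("PC1 loadings:", np.round(loadings_raw[:, 0], 3))
print("PC1 sd:", round(sd_raw[0], 2), "vs. column 3 sd:", round(X[:, 2].std(ddof=1), 2))
```

With unnormalized input, the first loading vector points almost entirely along the third column, and the first component's sd is close to that column's own sd, mirroring how insulin dominates the first component in the table.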
Teradata usually recommends normalization when variables have different variances. However, in some datasets, the magnitudes of input variables do matter, and normalization is not recommended.
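One way to see the trade-off: normalizing (z-scoring) each variable makes PCA independent of the units each variable is measured in, which also means it discards any meaningful differences in scale. The NumPy sketch below, on hypothetical synthetic data, rescales one column by 1000 (as if switching to a unit 1000 times smaller) and compares the loadings with and without z-scoring. The `pca_loadings` helper is an illustrative assumption, not a Teradata function.

```python
import numpy as np

def pca_loadings(X, standardize):
    """Return the PCA loading matrix (columns = components, descending variance)."""
    if standardize:
        # z-score each column so every variable has zero mean and unit variance
        X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
    return eigvecs[:, np.argsort(eigvals)[::-1]]

# Hypothetical synthetic data with three independent variables.
rng = np.random.default_rng(1)
n = 500
X = np.column_stack([rng.normal(0, 1, n), rng.normal(0, 3, n), rng.normal(0, 10, n)])

# Same data with column 0 expressed in a unit 1000 times smaller.
X_rescaled = X.copy()
X_rescaled[:, 0] *= 1000

for standardize in (False, True):
    # Compare absolute values because the sign of an eigenvector is arbitrary.
    before = np.abs(pca_loadings(X, standardize)[:, 0])
    after = np.abs(pca_loadings(X_rescaled, standardize)[:, 0])
    print(f"standardize={standardize}: PC1 |loadings| {np.round(before, 3)} -> {np.round(after, 3)}")
```

Without z-scoring, changing the unit of one column changes which variable dominates the first component; with z-scoring, the loadings do not change. Whether that invariance is desirable depends on whether the raw magnitudes carry meaning in your data.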