Output for Unnormalized Input

Teradata Aster® Analytics Foundation User Guide Update 2

Product

Aster Analytics

Release Number

7.00.02

Published

September 2017

Language

English (United States)

Last Update

2018-04-17

dita:id

B700-1022

Product Category

Software

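If you run PCA on the input without first normalizing it, the output table, pca_health_ev, looks as follows. The table is shown in three parts; the rows of each part are in the same component_rank order.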

PCA Example Output Table pca_health_ev (Columns 1-5)
component_rank	age	bmi	bloodpressure	glucose
1	0.0233531960338227	0.00504956503997912	0.00158669391367759	0.0842032256610642
2	0.0401859758008148	0.0404126752942246	-0.0223449106172612	0.217685269075731
3	-0.16820726382671	0.0638054862550787	-0.472824089141384	-0.820766253634683
4	-0.0129652073386729	0.00378045426074681	-0.858379856202759	0.406899030303581
5	-0.233598151299198	0.320787735507453	-0.124354522206257	0.309945063989336
6	-0.743337103992336	0.515142674213825	0.153195632015632	0.0282396614323736
7	-0.585637274413613	-0.7907071584919	0.00553007147741568	0.0925491110066536
8	-0.13888402260094	-0.0275921930861039	0.0124923981893404	-0.0294977512854948

PCA Example Output Table pca_health_ev (Columns 6-9)
strokes	cigarettes	insulin	hdl
-0.00859257029795621	0.0413329219482391	0.99526340917443	0.00222103039038422
-0.00344527374716509	0.0171848390743044	-0.022445679214246	0.973681022974589
-0.0539753477374444	0.180457886400629	0.0654659811769922	0.175074647858464
0.0391336176479417	-0.290464771196449	-0.0201355968289739	-0.105490035413321
-0.0568956998524478	0.846124412456638	-0.0575951508612638	-0.0922837968094822
-0.070564649365954	-0.389709106729591	0.0277394005213322	0.0137681856551496
-0.107921723264969	0.101814040327821	0.00467778052427058	0.0343536591559544
0.987727660277147	0.0538110654780203	0.0121302727639756	0.016583625709794

PCA Example Output Table pca_health_ev (Columns 10-13)
sd	var_proportion	cumulative_var	mean
194.717385062003	0.903560729160365	0.903560729160365	[37.88, 31.96399963378906, 66.8, 130.84, 5.12, 18.6, 108.16, 49.17200023651123]
46.3369853980967	0.0511685890757095	0.954729318236074
32.126518678328	0.0245966082061308	0.979325926442205
23.0594563502538	0.0126720249199549	0.99199795136216
14.4734867683913	0.00499222587415763	0.996990177236318
8.8921277044859	0.00188434002234855	0.998874517258666
6.5381934611736	0.00101874015287284	0.999893257411539
2.11638619509984	0.00010674258846095	1
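
The sd, var_proportion, and cumulative_var columns above are numerically consistent with a simple relationship: each component's variance is the square of its sd, var_proportion is that variance divided by the total over all components, and cumulative_var is the running sum. The following is a minimal Python sketch that recomputes the two derived columns from the sd values copied from the table. It is an illustration of that relationship only, not a description of how the Aster function computes these columns internally.

```python
from itertools import accumulate

# sd column copied from pca_health_ev above (components 1 through 8).
sd = [
    194.717385062003, 46.3369853980967, 32.126518678328, 23.0594563502538,
    14.4734867683913, 8.8921277044859, 6.5381934611736, 2.11638619509984,
]

# Variance of each component is the square of its standard deviation.
variances = [s * s for s in sd]
total_variance = sum(variances)

# Share of total variance per component, and the running total.
var_proportion = [v / total_variance for v in variances]
cumulative_var = list(accumulate(var_proportion))

for rank, (p, c) in enumerate(zip(var_proportion, cumulative_var), start=1):
    print(f"{rank}  {p:.6f}  {c:.6f}")
```

The printed values can be compared row by row with the var_proportion and cumulative_var columns in the table.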

The loadings for each principal component differ from those obtained with normalized input. For example, in the first principal component, which captures the most variance in the input space (about 90%, according to var_proportion), insulin has a much larger loading (about 0.995) than every other variable (all below 0.09 in absolute value). This is because the raw insulin values, which range from 70 to 190, are much larger in magnitude and vary over a much wider range than the values of most of the other variables (for example, BMI is usually 18 to 30).
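
How strongly insulin dominates the first principal component can be checked directly from the table. The sketch below copies the first row of loadings, confirms that the loading vector has length very close to 1, and computes each variable's share of the squared loadings. The variable names pc1, share, and dominant are chosen for this illustration only.

```python
import math

# Loadings of the first principal component, copied from pca_health_ev above.
pc1 = {
    "age": 0.0233531960338227,
    "bmi": 0.00504956503997912,
    "bloodpressure": 0.00158669391367759,
    "glucose": 0.0842032256610642,
    "strokes": -0.00859257029795621,
    "cigarettes": 0.0413329219482391,
    "insulin": 0.99526340917443,
    "hdl": 0.00222103039038422,
}

# Length of the loading vector; for these values it is very close to 1.
norm = math.sqrt(sum(v * v for v in pc1.values()))

# Each variable's share of the squared loadings.
share = {name: v * v / norm**2 for name, v in pc1.items()}
dominant = max(share, key=share.get)

print(f"norm = {norm:.6f}")
print(f"{dominant} share = {share[dominant]:.4f}")
```

With these values, insulin accounts for roughly 99% of the squared loadings, so the first component is almost entirely the insulin axis.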

Teradata usually recommends normalization when variables have different variances. However, in some datasets the magnitudes of the input variables carry meaning, and normalization is not recommended.
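
The effect of normalization can be sketched on a small synthetic example. The code below is an assumption-laden illustration, not the Aster implementation: it generates two hypothetical, positively correlated variables (wide with a standard deviation of about 30, narrow with about 3), and finds the first principal component with the closed-form major-axis angle of a 2x2 covariance matrix. The helper names leading_axis and zscore are invented for this sketch, and z-scoring stands in for whichever normalization method you apply.

```python
import math
import random

def leading_axis(xs, ys):
    """Unit vector of the first principal component of two variables."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Entries of the 2x2 sample covariance matrix [[a, b], [b, c]].
    a = sum((x - mx) ** 2 for x in xs) / (n - 1)
    c = sum((y - my) ** 2 for y in ys) / (n - 1)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    # Closed-form angle of the major axis of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2 * b, a - c)
    return math.cos(theta), math.sin(theta)

def zscore(values):
    """Center to mean 0 and scale to sample standard deviation 1."""
    n = len(values)
    m = sum(values) / n
    s = math.sqrt(sum((v - m) ** 2 for v in values) / (n - 1))
    return [(v - m) / s for v in values]

# Hypothetical data: two correlated variables on very different scales.
rng = random.Random(0)
z1 = [rng.gauss(0, 1) for _ in range(1000)]
z2 = [rng.gauss(0, 1) for _ in range(1000)]
wide = [130 + 30 * u for u in z1]  # standard deviation about 30
narrow = [25 + 3 * (0.8 * u + 0.6 * v) for u, v in zip(z1, z2)]  # about 3

raw_pc1 = leading_axis(wide, narrow)
norm_pc1 = leading_axis(zscore(wide), zscore(narrow))

print("raw loadings:       ", raw_pc1)
print("normalized loadings:", norm_pc1)
```

On the raw data, the first component points almost entirely along the wide variable, mirroring the insulin loading above. After z-scoring, both variables have unit variance, so the two loadings are equal in size and the component reflects the correlation between the variables rather than their scales.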