Input
The input table, patient_pca_input, contains medical data for 25 patients, identified by patient ID (pid).
Data attributes:
- Age (years)
- Body mass index (BMI) (kg/m2)
- Blood pressure (mm Hg)
- Blood glucose level (mg/dL)
- Strokes (number experienced)
- Cigarettes (number smoked/month)
- Insulin (mg/dL)
- High-density lipoproteins (HDL) (mg/dL)
pid | age | bmi | bloodpressure | glucose | strokes | cigarettes | insulin | hdl |
---|---|---|---|---|---|---|---|---|
1 | 50 | 33.6 | 72 | 148 | 6 | 35 | 0 | 62.7 |
2 | 31 | 26.6 | 66 | 85 | 1 | 29 | 0 | 35.1 |
3 | 32 | 23.3 | 64 | 183 | 8 | 0 | 0 | 67.2 |
4 | 21 | 28.1 | 66 | 89 | 1 | 23 | 94 | 16.7 |
5 | 33 | 43.1 | 40 | 137 | 0 | 35 | 168 | 228.8 |
6 | 30 | 25.6 | 74 | 116 | 5 | 0 | 0 | 20.1 |
7 | 26 | 31 | 50 | 78 | 3 | 32 | 88 | 24.8 |
8 | 29 | 35.3 | 0 | 115 | 10 | 0 | 0 | 13.4 |
9 | 53 | 30.5 | 70 | 197 | 2 | 45 | 543 | 15.8 |
10 | 54 | 0 | 96 | 125 | 8 | 0 | 0 | 23.2 |
11 | 30 | 37.6 | 92 | 110 | 4 | 0 | 0 | 19.1 |
12 | 34 | 38 | 74 | 168 | 10 | 0 | 0 | 53.7 |
13 | 57 | 27.1 | 80 | 139 | 10 | 0 | 0 | 144.1 |
14 | 59 | 30.1 | 60 | 189 | 1 | 23 | 846 | 39.8 |
15 | 51 | 25.8 | 72 | 166 | 5 | 19 | 175 | 58.7 |
16 | 32 | 30 | 0 | 100 | 7 | 0 | 0 | 48.4 |
17 | 31 | 45.8 | 84 | 118 | 0 | 47 | 230 | 55.1 |
18 | 31 | 29.6 | 74 | 107 | 7 | 0 | 0 | 25.4 |
19 | 33 | 43.3 | 30 | 103 | 1 | 38 | 83 | 18.3 |
20 | 32 | 34.6 | 70 | 115 | 1 | 30 | 96 | 52.9 |
21 | 27 | 39.3 | 88 | 126 | 3 | 41 | 235 | 70.4 |
22 | 50 | 35.4 | 84 | 99 | 8 | 0 | 0 | 38.8 |
23 | 41 | 39.8 | 90 | 196 | 7 | 0 | 0 | 45.1 |
24 | 29 | 29 | 80 | 119 | 9 | 35 | 0 | 26.3 |
25 | 51 | 36.6 | 94 | 143 | 11 | 33 | 146 | 25.4 |
Normalized Input
The reason to normalize the input variables before inputting them to the PCA function is that if some variables have much larger variance than others, they dominate in the first few principal components.
The following commands use the Scale and ScaleMap functions to create a table of normalized input variables, pca_scaled, from the patient_pca_input data set:
CREATE MULTISET TABLE tmpScale AS ( SELECT * FROM ScaleMap ( ON patient_pca_input USING InputColumns ('[1:8]') MissValue ('omit') ) AS dt ) WITH DATA;
CREATE MULTISET TABLE pca_scaled AS ( SELECT * FROM Scale ( ON patient_pca_input AS InputTable PARTITION BY ANY ON tmpScale AS statistic DIMENSION USING ScaleMethod ('std') Accumulate ('pid') ) AS dt ) WITH DATA;
SQL Calls
This query conducts a principal component analysis on the normalized input data:
SELECT * FROM PCAReduce ( ON (SELECT * FROM PCAMap ( ON pca_scaled USING TargetColumns('[1:8]') ) AS dt1 ) PARTITION BY 1 ) AS dt;
Output
component_rank age bmi bloodpressure glucose strokes cigarettes insulin hdl sd var_proportion cumulative_var mean -------------- -------------------- ------------------- -------------------- -------------------- -------------------- --------------------- -------------------- --------------------- ------------------- -------------------- ------------------- ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 1 0.08209456624137079 -0.3870046090920526 0.09345439689066688 -0.04216593563916513 0.5531712742637565 -0.5556501224007842 -0.43685302464081466 -0.1654996736437919 1.472109758037739 0.28217540881639736 0.28217540881639736 [-2.486899575160351E-16, -6.661338147750939E-16, 9.769962616701378E-17, -1.0658141036401502E-16, 1.7763568394002505E-17, -5.329070518200751E-17, 0.0, -5.329070518200751E-17] 2 -0.6028733018225988 0.17861430770359402 -0.31903125250345443 -0.5700201930411378 -0.11893913442095008 -0.005196208048840971 -0.3943619845210589 -0.09078467587773241 1.4241934069709463 0.2641050599556657 0.546280468772063 NULL 3 0.02374359397742567 0.39186196324224787 -0.30591120403327365 0.23782383985742445 0.2173850060727078 -0.1967429451346994 -0.1922943077460684 0.7568338469137748 1.049099321858834 0.1433085139485242 0.6895889827205872 NULL 4 -0.09998801486200944 0.21435051698676735 0.8444287615246641 -0.0918928665696382 0.011305174477644377 0.2111352882378259 -0.3579033088614018 0.2230500507326031 0.922502599277917 0.1108087299055356 0.8003977126261228 NULL 5 -0.14085382468973354 0.6716168063561156 0.015501081352144553 0.3351077534037482 0.34696752296785255 -0.038650465429363866 0.043038964135395014 -0.5411292846166164 0.8339617788814034 0.09055888654102022 0.890956599167143 NULL 6 0.5273985574183678 0.1083454629714406 -0.19613416785828364 -0.41491662766966964 0.4324034630774445 0.5429592283215245 -0.12781293754536002 -0.03706719659550343 0.6386214541768617 0.053103823142574176 0.9440604223097172 NULL 7 0.4246146518729586 0.3900261133262609 0.13557114512538138 -0.47994711699092446 -0.2574558219181479 -0.557165978164135 0.20322505229037413 -0.028903434023275044 0.5350610892169184 0.0372773918221348 0.981337814131852 NULL 8 0.37544142271498704 0.0624033352765424 -0.16048460010093996 0.30576508343845227 -0.508751667006649 0.034749606866136186 -0.6550398914761275 -0.21630745584000433 0.37858365979975517 0.018662185868148023 1.0 NULL
Output for Unnormalized Input
If you run the model without first normalizing the input, you get the output in the following table.
The loadings for each principal component are different. For example, for the first principal component, which captures the most variance among input space, insulin has a much larger loading than every other variable. This is because the raw insulin values, ranging from 70 to 190, are much larger in magnitude than the values for most of the other variables (for example, BMI is usually 18 to 30).
Teradata usually recommends normalization when variables have different variances. However, in some data sets, the magnitudes of input variables do matter, and normalization is not recommended.
component_rank age bmi bloodpressure glucose strokes cigarettes insulin hdl sd var_proportion cumulative_var mean -------------- --------------------- ---------------------- --------------------- -------------------- ---------------------- --------------------- --------------------- ---------------------- ------------------ --------------------- ------------------ --------------------------------------------------------------------- 1 -0.023353196034411227 -0.0050495648788435234 -0.001586693912675234 -0.08420322566619312 0.00859257029825338 -0.04133292194971656 -0.9952634091735024 -0.0022210309439499357 194.71738506207055 0.9035607310846709 0.9035607310846709 [37.88, 31.964000000000002, 66.8, 130.84, 5.12, 18.6, 108.16, 49.172] 2 0.04018597618816406 0.04041268203719046 -0.02234490883189587 0.21768528557138409 -0.0034452736055330785 0.01718483933772813 -0.022445681193031374 0.9736810189820199 46.33698444340015 0.05116858707616014 0.954729318160831 NULL 3 0.16820726416678253 -0.06380548610918492 0.47282409285264465 0.8207662481650843 0.05397534805226847 -0.18045788796064954 -0.06546598054699179 -0.17507466173441805 32.12651862361091 0.024596608174712386 0.9793259263355433 NULL 4 -0.012965205864271578 0.0037804534935903873 -0.8583798540505494 0.40689903096324975 0.03913361827590789 -0.29046477452424657 -0.020135596705586265 -0.10549004121778502 23.059456330930484 0.01267202492569593 0.9919979512612392 NULL 5 -0.23359815217156288 0.32078777092180283 -0.12435451993064969 0.3099450633949961 -0.056895698485650424 0.8461243985086968 -0.05759515026265747 -0.09228380566106692 14.473486862384386 0.004992225949626681 0.9969901772108658 NULL 6 0.7433370943722172 -0.5151426717933871 -0.15319563481636883 -0.02823965187653383 0.07056464778179963 0.38970912809735386 -0.027739402093030042 -0.01376818417563162 8.892127729665095 0.0018843400370318058 0.9988745172478976 NULL 7 0.585637286534736 0.7907071454804706 -0.005530068854870849 -0.09254912263578012 0.10792172202547658 -0.10181406223812656 -0.004677778713769677 -0.0343536603012951 6.538193488247754 0.0010187401634787858 0.9998932574113765 NULL 8 0.13888402112454176 0.02759219001059113 -0.012492398266161555 0.029497751534728455 -0.9877276605628905 -0.053811065310627085 -0.01213027272971143 -0.01658362623945359 2.11638619445855 1.067425886235164E-4 1.0 NULL
Download a zip file of all examples and a SQL script file that creates their input tables from the attachment in the left sidebar.