This example uses the patient_pca_input data set and the variables age, bmi, bloodpressure, glucose, cigarettes, insulin, and hdl to predict the number of strokes for a patient.
The following statement creates the table pca_health_ev2 from a query that performs a principal component analysis on the seven predictor variables:
CREATE TABLE pca_health_ev2 DISTRIBUTE BY REPLICATION AS SELECT * FROM pca_reduce ( ON pca_map ( ON pca_scaled TargetColumns ('age', 'bmi', 'bloodpressure', 'glucose', 'cigarettes', 'insulin', 'hdl') ) PARTITION BY 1 ) ORDER BY component_rank;
The preceding query uses the same process as the query in the PCA example.
The following query returns the contents of the table pca_health_ev2:
SELECT * FROM pca_health_ev2;
Because strokes is the response variable, pca_health_ev2 has no strokes column; otherwise, pca_health_ev2 is the same as the output table of the PCA example, pca_health_ev_scaled.
component_rank | age | bmi | bloodpressure | glucose |
---|---|---|---|---|
1 | 0.558383429424359 | -0.0605952043265538 | 0.279191393011342 | 0.562811446671035 |
2 | -0.250537483690061 | 0.614686329749836 | -0.258383085318548 | -0.0474790977051386 |
3 | -0.0911563605166551 | -0.141748695621042 | 0.251924555111245 | -0.220162314985357 |
4 | -0.0965156342595097 | 0.203200180598669 | 0.843333872842729 | -0.0927285011581401 |
5 | -0.180835348784396 | 0.599972991222924 | 0.079755498943645 | 0.485460834090224 |
6 | 0.741419465051456 | 0.431342026006021 | -0.0814633949974965 | -0.452698251899937 |
7 | -0.159404889398745 | 0.105272966756617 | 0.260029076069335 | -0.428148411420131 |
cigarettes | insulin | hdl |
---|---|---|
0.156485831353259 | 0.497004566613813 | 0.135389267935288 |
0.579594554667163 | 0.305498725264089 | 0.247217076371136 |
0.392249448405349 | 0.284736376332311 | -0.790396081112643 |
0.201301142840861 | -0.36378506450842 | 0.238164018417497 |
-0.419465995079716 | -0.0691208326400593 | -0.42961426285989 |
-0.0606741267486036 | -0.16172687758833 | -0.151103636140504 |
-0.51533240361692 | 0.644444986031744 | 0.178058065989532 |
sd | var_proportion | cumulative_var | mean |
---|---|---|---|
1.42680503704651 | 0.302942353235311 | 0.302942353235311 | [-2.753353101070388E-16, 8.881784197001253E-17, 1.0658141036401502E-16, -7.993605777301127E-17, -1.3322676295501878E-17, 7.771561172376097E-18, -1.7763568394002505E-17] |
1.27431560304133 | 0.241648847642051 | 0.544591200877363 | |
1.01834410187873 | 0.154319153248689 | 0.698910354126052 | |
0.92244700110317 | 0.126623284203011 | 0.825533638329062 | |
0.780266259616559 | 0.0905975351035737 | 0.916131173432636 | |
0.573696298386201 | 0.0489772980330401 | 0.965108471465676 | |
0.484222130587456 | 0.0348915285343237 | 1 | |
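The var_proportion and cumulative_var columns in the preceding output are numerically consistent with a simple relationship to sd: each component's variance (sd squared) divided by the sum of all seven variances, then accumulated in component_rank order. The following Python sketch checks that relationship using the sd values copied from the output. It is an illustration only, not the implementation of the PCA functions; the helper names variance_shares and components_needed are hypothetical, as is the 0.9 threshold.

```python
# sd values copied from the pca_health_ev2 output (component_rank 1 through 7)
sd = [
    1.42680503704651,
    1.27431560304133,
    1.01834410187873,
    0.92244700110317,
    0.780266259616559,
    0.573696298386201,
    0.484222130587456,
]


def variance_shares(sds):
    """Return (proportions, cumulative) computed from standard deviations.

    Assumes each proportion is sd**2 divided by the sum of all sd**2 values.
    """
    variances = [s * s for s in sds]
    total = sum(variances)
    proportions = [v / total for v in variances]
    cumulative = []
    running = 0.0
    for p in proportions:
        running += p
        cumulative.append(running)
    return proportions, cumulative


def components_needed(cumulative, threshold):
    """Return the smallest component_rank whose cumulative share reaches threshold."""
    for rank, share in enumerate(cumulative, start=1):
        if share >= threshold:
            return rank
    return len(cumulative)


var_proportion, cumulative_var = variance_shares(sd)

# Print the recomputed columns next to the rank for comparison with the table.
for rank, (p, c) in enumerate(zip(var_proportion, cumulative_var), start=1):
    print(rank, round(p, 6), round(c, 6))

# Hypothetical threshold: how many components cover 90% of the variance?
print(components_needed(cumulative_var, 0.9))
```

With the values shown, the first five components are the smallest set whose cumulative_var reaches 0.9 (0.916 at component_rank 5), which is one common way to decide how many components to carry into a later regression step.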