Input
The following statement creates the input table, pca_health_ev2, from the table pca_scaled (created in PCA Example), using only the variables age, bmi, bloodpressure, glucose, cigarettes, insulin, and hdl:
CREATE MULTISET TABLE pca_health_ev2 AS ( SELECT * FROM PCAReduce ( ON PCAMap ( ON pca_scaled USING TargetColumns ('age','bmi','bloodpressure','glucose','cigarettes', 'insulin','hdl') ) PARTITION BY 1 ) AS dt2 ) WITH DATA;
This query returns the following table:
SELECT * FROM pca_health_ev2;
component_rank | age | bmi | bloodpressure | glucose | cigarettes | insulin | hdl | sd | var_proportion | cumulative_var | mean |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | 0.558383429424359 | -0.0605952043265538 | 0.279191393011342 | 0.562811446671035 | 0.156485831353259 | 0.497004566613813 | 0.135389267935288 | 1.42680503704651 | 0.302942353235311 | 0.302942353235311 | [-2.753353101070388E-16, 8.881784197001253E-17, 1.0658141036401502E-16, -7.993605777301127E-17, -1.3322676295501878E-17, 7.771561172376097E-18, -1.7763568394002505E-17] |
2 | -0.250537483690061 | 0.614686329749836 | -0.258383085318548 | -0.0474790977051386 | 0.579594554667163 | 0.305498725264089 | 0.247217076371136 | 1.27431560304133 | 0.241648847642051 | 0.544591200877363 | |
3 | -0.0911563605166551 | -0.141748695621042 | 0.251924555111245 | -0.220162314985357 | 0.392249448405349 | 0.284736376332311 | -0.790396081112643 | 1.01834410187873 | 0.154319153248689 | 0.698910354126052 | |
4 | -0.0965156342595097 | 0.203200180598669 | 0.843333872842729 | -0.0927285011581401 | 0.201301142840861 | -0.36378506450842 | 0.238164018417497 | 0.92244700110317 | 0.126623284203011 | 0.825533638329062 | |
5 | -0.180835348784396 | 0.599972991222924 | 0.079755498943645 | 0.485460834090224 | -0.419465995079716 | -0.0691208326400593 | -0.42961426285989 | 0.780266259616559 | 0.0905975351035737 | 0.916131173432636 | |
6 | 0.741419465051456 | 0.431342026006021 | -0.0814633949974965 | -0.452698251899937 | -0.0606741267486036 | -0.16172687758833 | -0.151103636140504 | 0.573696298386201 | 0.0489772980330401 | 0.965108471465676 | |
7 | -0.159404889398745 | 0.105272966756617 | 0.260029076069335 | -0.428148411420131 | -0.51533240361692 | 0.644444986031744 | 0.178058065989532 | 0.484222130587456 | 0.0348915285343237 | 1 |
SQL Call
This query uses the PCAScore function to compute the first three score variables, which correspond to the first three principal components, accumulating the response variable from the original input data table patient_pca_input:
CREATE MULTISET TABLE pcr1 AS ( SELECT * FROM PCAScore ( ON pca_scaled AS InputTable PARTITION BY ANY ON pca_health_ev2 AS pca_table DIMENSION ORDER BY component_rank USING Accumulate ('pid', 'strokes') Components (3) ) AS dt ) WITH DATA;
Output
pid | strokes | principal_component_1 | principal_component_2 | principal_component_3 |
---|---|---|---|---|
3 | 0.795410174399195 | 0.195110284571174 | -1.17825238356938 | -1.04290585569861 |
6 | -0.0331420905999665 | -1.02505176519817 | -1.2375999732885 | 0.257874309966306 |
8 | 1.3477783510653 | -1.98240922334734 | 0.159007920294922 | -0.494890984515023 |
11 | -0.30932617893302 | -1.00884248343362 | -0.597930103709463 | 0.301060051568942 |
13 | 1.3477783510653 | 1.091766978011 | -1.16460787259922 | -2.17662153529429 |
16 | 0.519226086066141 | -1.93840477808707 | -0.0623801090526926 | -0.936238178663301 |
19 | -1.13787844393218 | -1.14924574475721 | 1.7169286389402 | 0.588543776698959 |
21 | -0.585510267266074 | 0.144749031593665 | 1.56905010871958 | 0.52484021423205 |
24 | 1.07159426273225 | -0.65879037722031 | 0.120392373582135 | 0.915771821734939 |
2 | -1.13787844393218 | -1.27086507768138 | -0.0479281470754135 | 0.732287038132982 |
5 | -1.41406253226524 | 0.308542873080395 | 2.70938975527651 | -3.04594531776213 |
7 | -0.585510267266074 | -1.61658595260732 | 0.71226267926454 | 0.957505372598163 |
10 | 0.795410174399195 | 0.723400299704323 | -3.73227341951059 | 0.574740342890179 |
15 | -0.0331420905999665 | 1.50953494759911 | -0.643862090689209 | -0.235674114998759 |
18 | 0.519226086066141 | -1.13206482033264 | -0.946451923900274 | 0.153459109295282 |
20 | -1.13787844393218 | -0.447229210945602 | 0.669928830364427 | 0.306483821831369 |
23 | 0.519226086066141 | 0.956038577607332 | -0.649328546941474 | -0.825039826251437 |
1 | 0.243041997733087 | 0.830295227878681 | 0.204892485389732 | -0.205590257682169 |
4 | -1.13787844393218 | -1.57924450867301 | 0.122683543836927 | 1.080304476172 |
9 | -0.861694355599128 | 3.08093421824054 | 0.794495651675079 | 1.2894482577175 |
12 | 1.3477783510653 | 0.0232126330286096 | -0.372098909138089 | -0.867874315202764 |
14 | -1.13787844393218 | 3.77988804114414 | 0.62813244997315 | 0.744457517669959 |
17 | -1.41406253226524 | 0.121200199618048 | 2.08086374773324 | 0.786060284561275 |
22 | 0.795410174399195 | -0.215566553625099 | -0.989679133035365 | -0.169609921563112 |
25 | 1.62396243939836 | 1.25962718383174 | 0.134364427459238 | 0.787553912561683 |
This step has reduced the original data set from 25 observations with 7 predictors to 25 observations with 3 predictors (the 3 score variables principal_component_1, principal_component_2, and principal_component_3).