Input
The following statement creates the input table, pca_health_ev2, from the table pca_scaled (created in PCA Example), using only the variables age, bmi, bloodpressure, glucose, cigarettes, insulin, and hdl:
CREATE MULTISET TABLE pca_health_ev2 AS ( SELECT * FROM PCAReduce ( ON PCAMap ( ON pca_scaled USING TargetColumns ('age','bmi','bloodpressure','glucose','cigarettes', 'insulin','hdl') ) PARTITION BY 1 ) AS dt2 ) WITH DATA;
SELECT * FROM pca_health_ev2 ORDER BY component_rank;
component_rank age bmi bloodpressure glucose cigarettes insulin hdl sd var_proportion cumulative_var mean -------------- -------------------- --------------------- -------------------- -------------------- -------------------- -------------------- -------------------- ------------------ ------------------- ------------------- ----------------------------------------------------------------------------------------------------------------------------------------------------- 1 0.5583834254529891 -0.060595187205856604 0.27919138947629696 0.5628114467260564 0.15648583929470583 0.4970045713164333 0.13538927259606426 1.42680503670853 0.30294235309178996 0.30294235309178996 [-2.3980817331903383E-16, -2.486899575160351E-16, 9.325873406851315E-17, -8.881784197001253E-17, -1.7763568394002505E-17, 0.0, 9.992007221626409E-17] 2 -0.25053749577748563 0.6146863354635507 -0.2583830864504223 -0.04747909985805549 0.5795945478993907 0.30549870012270325 0.24721709525354335 1.2743156091409469 0.241648849955396 0.544591203047186 NULL 3 0.09115635850124769 0.14174869929246842 -0.25192452899733103 0.22016232676910066 -0.3922494663895311 -0.2847363971731954 0.7903960692948381 1.0183441028502045 0.1543191535431231 0.6989103565903091 NULL 4 -0.09651563648638843 0.20320018565784712 0.8433338825440954 -0.0927284926384905 0.201301134307841 -0.36378506814584216 0.238163983819638 0.9224470037180572 0.12662328492089597 0.825533641511205 NULL 5 0.18083536491383206 -0.599972969216183 -0.07975547691432035 -0.4854608493263352 0.4194659786495865 0.06912083003656642 0.42961429013770186 0.7802662535208694 0.09059753368801983 0.9161311751992248 NULL 6 0.7414194738847253 0.43134203241200425 -0.08146342199778292 -0.452698200444976 -0.06067408632930419 -0.16172694090464648 -0.15110366257374142 0.5736962918304901 0.04897729691369861 0.9651084721129235 NULL 7 0.15940482473353615 -0.10527302771359373 -0.2600290708743992 0.4281484440245407 0.5153324165940718 -0.644444967451482 -0.17805804671796582 0.484222126096231 0.0348915278870765 1.0 NULL
SQL Call
This query uses the PCAScore function to compute the first three score variables, which correspond to the first three principal components, accumulating the response variable from the original input data table patient_pca_input:
CREATE MULTISET TABLE pcr1 AS ( SELECT * FROM PCAScore ( ON pca_scaled AS InputTable PARTITION BY ANY ON pca_health_ev2 AS PrCompTable DIMENSION ORDER BY component_rank USING Accumulate ('pid', 'strokes') Components (3) ) AS dt ) WITH DATA;
Output
SELECT * FROM pcr1 ORDER BY pid;
pid strokes principal_component_1 principal_component_2 principal_component_3 --- -------------------- --------------------- --------------------- --------------------- 1 0.24304199773308727 0.8302952240600171 0.20489255944412443 0.20559027239564137 2 -1.1378784439321814 -1.2708650767655945 -0.04792816725261051 -0.7322870352737065 3 0.7954101743991948 0.195110267755587 -1.1782522912070308 1.0429059671659966 4 -1.1378784439321814 -1.579244511221674 0.12268350005146528 -1.0803045155907212 5 -1.4140625322652352 0.30854292481439227 2.709389887899765 3.0459452635078605 6 -0.03314209059996648 -1.0250517880596157 -1.2376000000708138 -0.25787429342810153 7 -0.585510267266074 -1.6165859414652022 0.7122626624930699 -0.9575054086956973 8 1.3477783510653023 -1.9824092216112705 0.1590079604576705 0.4948909543016108 9 -0.8616943555991278 3.0809342286033687 0.7944955261525494 -1.2894483164958952 10 0.7954101743991948 0.7234002078356538 -3.732273405444996 -0.5747403021339348 11 -0.3093261789330202 -1.008842496277165 -0.5979300211866893 -0.3010599902582435 12 1.3477783510653023 0.023212637125213292 -0.37209892218773394 0.8678743510477422 13 1.3477783510653023 1.0917669520792725 -1.164607897382604 2.1766214799585746 14 -1.1378784439321814 3.7798880566862363 0.6281322811758272 -0.7444575867324298 15 -0.03314209059996648 1.5095349283904942 -0.6438620762403766 0.2356741216731319 16 0.5192260860661411 -1.938404783284025 -0.06238010612979375 0.9362381037343539 17 -1.4140625322652352 0.12120024801596792 2.0808637455284806 -0.7860602762155817 18 0.5192260860661411 -1.132064831951979 -0.9464519509600725 -0.15345908277200465 19 -1.1378784439321814 -1.14924571078705 1.716928640681718 -0.5885438218828254 20 -1.1378784439321814 -0.4472292099018682 0.669928904550861 -0.3064838365512166 21 -0.585510267266074 0.1447490581511024 1.56905011434132 -0.524840241945967 22 0.7954101743991948 -0.21556654992946048 -0.9896792588736445 0.1696099417861806 23 0.5192260860661411 0.9560385810129385 -0.6493285131865845 0.8250399361604549 24 1.0715942627322486 -0.6587903759681892 0.1203923639078256 -0.9157718072510063 25 1.623962439398356 1.2596271826928498 0.1343644634382721 -0.7875538765042159
This step has reduced the original data set from 25 observations with 7 predictors to 25 observations with 3 predictors (the 3 score variables principal_component_1, principal_component_2, and principal_component_3).