1.1 - 8.10 - Step 1: Create Smaller Data Set - Teradata Vantage

Teradata Vantage™ - Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
1.1
8.10
Release Date
October 2019
Content Type
Programming Reference
Publication ID
B700-4003-079K
Language
English (United States)

Input

The following statement creates the input table, pca_health_ev2, from the table pca_scaled (created in PCA Example), using only the variables age, bmi, bloodpressure, glucose, cigarettes, insulin, and hdl:

CREATE MULTISET TABLE pca_health_ev2 AS (
  SELECT * FROM PCAReduce (
    ON PCAMap (
      ON pca_scaled
      USING
      TargetColumns ('age','bmi','bloodpressure','glucose','cigarettes', 'insulin','hdl')
    ) PARTITION BY 1
  ) AS dt2
) WITH DATA;
SELECT * FROM pca_health_ev2 ORDER BY component_rank;
 component_rank age                  bmi                   bloodpressure        glucose              cigarettes           insulin              hdl                  sd                 var_proportion      cumulative_var      mean                                                                                                                                                  
 -------------- -------------------- --------------------- -------------------- -------------------- -------------------- -------------------- -------------------- ------------------ ------------------- ------------------- ----------------------------------------------------------------------------------------------------------------------------------------------------- 
              1   0.5583834254529891 -0.060595187205856604  0.27919138947629696   0.5628114467260564  0.15648583929470583   0.4970045713164333  0.13538927259606426   1.42680503670853 0.30294235309178996 0.30294235309178996 [-2.3980817331903383E-16, -2.486899575160351E-16, 9.325873406851315E-17, -8.881784197001253E-17, -1.7763568394002505E-17, 0.0, 9.992007221626409E-17]
              2 -0.25053749577748563    0.6146863354635507  -0.2583830864504223 -0.04747909985805549   0.5795945478993907  0.30549870012270325  0.24721709525354335 1.2743156091409469   0.241648849955396   0.544591203047186 NULL                                                                                                                                                 
              3  0.09115635850124769   0.14174869929246842 -0.25192452899733103  0.22016232676910066  -0.3922494663895311  -0.2847363971731954   0.7903960692948381 1.0183441028502045  0.1543191535431231  0.6989103565903091 NULL                                                                                                                                                 
              4 -0.09651563648638843   0.20320018565784712   0.8433338825440954  -0.0927284926384905    0.201301134307841 -0.36378506814584216    0.238163983819638 0.9224470037180572 0.12662328492089597   0.825533641511205 NULL                                                                                                                                                 
              5  0.18083536491383206    -0.599972969216183 -0.07975547691432035  -0.4854608493263352   0.4194659786495865  0.06912083003656642  0.42961429013770186 0.7802662535208694 0.09059753368801983  0.9161311751992248 NULL                                                                                                                                                 
              6   0.7414194738847253   0.43134203241200425 -0.08146342199778292   -0.452698200444976 -0.06067408632930419 -0.16172694090464648 -0.15110366257374142 0.5736962918304901 0.04897729691369861  0.9651084721129235 NULL                                                                                                                                                 
              7  0.15940482473353615  -0.10527302771359373  -0.2600290708743992   0.4281484440245407   0.5153324165940718   -0.644444967451482 -0.17805804671796582  0.484222126096231  0.0348915278870765                 1.0 NULL                                                                                        

SQL Call

This query uses the PCAScore function to compute the first three score variables, which correspond to the first three principal components, accumulating the response variable from the original input data table patient_pca_input:

CREATE MULTISET TABLE pcr1 AS (
  SELECT * FROM PCAScore (
    ON pca_scaled AS InputTable PARTITION BY ANY
    ON pca_health_ev2 AS PrCompTable DIMENSION ORDER BY component_rank
    USING
    Accumulate ('pid', 'strokes')
    Components (3)
  ) AS dt
) WITH DATA;

Output

SELECT * FROM pcr1 ORDER BY pid;
 pid strokes              principal_component_1 principal_component_2 principal_component_3 
 --- -------------------- --------------------- --------------------- --------------------- 
   1  0.24304199773308727    0.8302952240600171   0.20489255944412443   0.20559027239564137
   2  -1.1378784439321814   -1.2708650767655945  -0.04792816725261051   -0.7322870352737065
   3   0.7954101743991948     0.195110267755587   -1.1782522912070308    1.0429059671659966
   4  -1.1378784439321814    -1.579244511221674   0.12268350005146528   -1.0803045155907212
   5  -1.4140625322652352   0.30854292481439227     2.709389887899765    3.0459452635078605
   6 -0.03314209059996648   -1.0250517880596157   -1.2376000000708138  -0.25787429342810153
   7   -0.585510267266074   -1.6165859414652022    0.7122626624930699   -0.9575054086956973
   8   1.3477783510653023   -1.9824092216112705    0.1590079604576705    0.4948909543016108
   9  -0.8616943555991278    3.0809342286033687    0.7944955261525494   -1.2894483164958952
  10   0.7954101743991948    0.7234002078356538    -3.732273405444996   -0.5747403021339348
  11  -0.3093261789330202    -1.008842496277165   -0.5979300211866893   -0.3010599902582435
  12   1.3477783510653023  0.023212637125213292  -0.37209892218773394    0.8678743510477422
  13   1.3477783510653023    1.0917669520792725    -1.164607897382604    2.1766214799585746
  14  -1.1378784439321814    3.7798880566862363    0.6281322811758272   -0.7444575867324298
  15 -0.03314209059996648    1.5095349283904942   -0.6438620762403766    0.2356741216731319
  16   0.5192260860661411    -1.938404783284025  -0.06238010612979375    0.9362381037343539
  17  -1.4140625322652352   0.12120024801596792    2.0808637455284806   -0.7860602762155817
  18   0.5192260860661411    -1.132064831951979   -0.9464519509600725  -0.15345908277200465
  19  -1.1378784439321814     -1.14924571078705     1.716928640681718   -0.5885438218828254
  20  -1.1378784439321814   -0.4472292099018682     0.669928904550861   -0.3064838365512166
  21   -0.585510267266074    0.1447490581511024      1.56905011434132    -0.524840241945967
  22   0.7954101743991948  -0.21556654992946048   -0.9896792588736445    0.1696099417861806
  23   0.5192260860661411    0.9560385810129385   -0.6493285131865845    0.8250399361604549
  24   1.0715942627322486   -0.6587903759681892    0.1203923639078256   -0.9157718072510063
  25    1.623962439398356    1.2596271826928498    0.1343644634382721   -0.7875538765042159

This step has reduced the original data set from 25 observations with 7 predictors to 25 observations with 3 predictors (the 3 score variables principal_component_1, principal_component_2, and principal_component_3).