Step 2: Show How PCR Works - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
8.10
1.1
Published
October 2019
Language
English (United States)
Last Update
2019-12-31
dita:mapPath
ima1540829771750.ditamap
dita:ditavalPath
jsj1481748799576.ditaval
dita:id
B700-4003
lifecycle
previous
Product Category
Teradata Vantageā„¢

This step uses the reduced data set, pcr1, to show how PCR works. The example regresses the response variable on the three score variables identified by PCA. (The number three is arbitrary. With a different data set, PCA might identify a different number of score variables. PCA identifies the number of score variables based on the percentage of total variance that they capture.)

For PCR, you must decide before PCA whether to normalize the input data set. For the data set that this example uses, it was appropriate.

Input

The following statement creates the input table for the linear regression function. The linear regression function requires the response variable to be in the final column; therefore, strokes must be last in the SELECT list.

CREATE MULTISET TABLE pcr2 AS (
  SELECT principal_component_1, principal_component_2,
    principal_component_3, strokes 
  FROM pcr1
) WITH DATA;
SELECT * FROM pcr2;
 principal_component_1 principal_component_2 principal_component_3 strokes              
 --------------------- --------------------- --------------------- -------------------- 
   0.21556654992945887    0.9896792588736425   0.16960994178618327   0.7954101743991948
     -1.25962718269285    -0.134364463438275    -0.787553876504215    1.623962439398356
   -0.7234002078356577     3.732273405445003   -0.5747403021339288   0.7954101743991948
    1.0088424962771645    0.5979300211866878   -0.3010599902582418  -0.3093261789330202
    1.0250517880596144    1.2376000000708163      -0.2578742934281 -0.03314209059996648
     1.132064831951978    0.9464519509600738  -0.15345908277200326   0.5192260860661411
   -1.5095349283904946    0.6438620762403761   0.23567412167313243 -0.03314209059996648
    1.6165859414652024   -0.7122626624930654    -0.957505408695698   -0.585510267266074
    1.9384047832840248   0.06238010612979778    0.9362381037343512   0.5192260860661411
   -3.7798880566862354   -0.6281322811758286   -0.7444575867324343  -1.1378784439321814
  -0.14474905815110067   -1.5690501143413216   -0.5248402419459675   -0.585510267266074
    0.6587903759681895  -0.12039236390782342   -0.9157718072510054   1.0715942627322486
   -0.3085429248143898     -2.70938988789977     3.045945263507859  -1.4140625322652352
  -0.12120024801596593   -2.0808637455284833   -0.7860602762155829  -1.4140625322652352
    -3.080934228603367   -0.7944955261525505   -1.2894483164958985  -0.8616943555991278
    1.9824092216112708   -0.1590079604576678   0.49489095430160657   1.3477783510653023
   -0.8302952240600173  -0.20489255944412654    0.2055902723956421  0.24304199773308727
    -1.091766952079274    1.1646078973826008    2.1766214799585786   1.3477783510653023
     1.579244511221674   -0.1226835000514607   -1.0803045155907212  -1.1378784439321814
    0.4472292099018688   -0.6699289045508605   -0.3064838365512166  -1.1378784439321814
     1.149245710787051   -1.7169286406817175   -0.5885438218828289  -1.1378784439321814
    -0.956038581012938    0.6493285131865779    0.8250399361604551   0.5192260860661411
    1.2708650767655933  0.047928167252614645   -0.7322870352737051  -1.1378784439321814
 -0.023212637125212883   0.37209892218772994     0.867874351047742   1.3477783510653023
  -0.19511026775558732     1.178252291207031    1.0429059671659968   0.7954101743991948

SQL Call

SELECT * FROM LinReg (
 ON LinRegInternal (
   ON pcr2
   ) AS dt1 PARTITION BY 1
 ) AS dt2 ORDER BY 1;

Output

 coefficient_name      value_col               
 --------------------- ----------------------- 
 Intercept             -2.2029073310707358E-17
 principal_component_1     0.02971887194081586
 principal_component_2      0.4698959686929861
 principal_component_3     0.25871794248366853