1.0 - 8.00 - Step 2: Show How PCR Works - Teradata Vantage

Teradata® Vantage Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
1.0
8.00
Release Date
May 2019
Content Type
Programming Reference
Publication ID
B700-4003-098K
Language
English (United States)

This step uses the reduced data set, pcr1, to show how PCR works. The example regresses the response variable on the 3 score variables identified by PCA. (The number 3 is arbitrary. With a different data set, PCA might identify a different number of score variables. PCA identifies the number of score variables based on the percentage of total variance that they capture.)

For PCR, you must decide before PCA whether to normalize the input data set. For the data set that this example uses, it was appropriate.

Input

The following statement creates the input table for the linear regression function. The linear regression function requires the response variable to be in the final column; therefore, strokes must be last in the SELECT list.

CREATE MULTISET TABLE pcr2 AS (
  SELECT principal_component_1, principal_component_2,
    principal_component_3, strokes 
  FROM pcr1
) WITH DATA;
pcr2
principal_component_1 principal_component_2 principal_component_3 strokes
-1.27086507768138 -0.0479281470754137 0.732287038132982 -1.13787844393218
0.308542873080396 2.70938975527651 -3.04594531776213 -1.41406253226524
-1.61658595260731 0.712262679264539 0.957505372598163 -0.585510267266074
0.723400299704324 -3.73227341951058 0.574740342890179 0.795410174399195
1.50953494759911 -0.643862090689208 -0.235674114998759 -0.0331420905999665
-1.13206482033264 -0.946451923900274 0.153459109295281 0.519226086066141
-0.447229210945602 0.669928830364427 0.306483821831369 -1.13787844393218
0.956038577607332 -0.649328546941474 -0.825039826251437 0.519226086066141
0.195110284571175 -1.17825238356938 -1.04290585569861 0.795410174399195
-1.02505176519817 -1.2375999732885 0.257874309966305 -0.0331420905999665
-1.98240922334734 0.15900792029492 -0.494890984515027 1.3477783510653
-1.00884248343362 -0.597930103709463 0.301060051568943 -0.30932617893302
1.091766978011 -1.16460787259922 -2.17662153529429 1.3477783510653
-1.93840477808707 -0.0623801090526942 -0.936238178663304 0.519226086066141
-1.14924574475721 1.7169286389402 0.588543776698958 -1.13787844393218
0.144749031593665 1.56905010871957 0.524840214232052 -0.585510267266074
-0.65879037722031 0.120392373582135 0.91577182173494 1.07159426273225
0.830295227878681 0.204892485389732 -0.205590257682167 0.243041997733087
-1.57924450867301 0.122683543836927 1.080304476172 -1.13787844393218
3.08093421824054 0.79449565167508 1.2894482577175 -0.861694355599128
0.0232126330286094 -0.37209890913809 -0.867874315202764 1.3477783510653
3.77988804114414 0.628132449973151 0.744457517669957 -1.13787844393218
0.121200199618047 2.08086374773324 0.786060284561277 -1.41406253226524
-0.2155665536251 -0.989679133035365 -0.169609921563112 0.795410174399195
1.25962718383174 0.134364427459239 0.787553912561685 1.62396243939836

SQL Call

SELECT * FROM LinReg (
 ON LinRegInternal (
   ON pcr2
   ) AS dt1 PARTITION BY 1
 ) AS dt2 ORDER BY 1;

Output

coefficient_name value
Intercept 9.14880298701766e-18
principal_component_1 -0.0297188652090044
principal_component_2 -0.469895976698121
principal_component_3 -0.258717923640999