This step uses the reduced data set, pcr1, to show how PCR works. The example regresses the response variable on the 3 score variables identified by PCA. (The number 3 is arbitrary. With a different data set, PCA might identify a different number of score variables. PCA identifies the number of score variables based on the percentage of total variance that they capture.)
For PCR, you must decide before PCA whether to normalize the input data set. For the data set that this example uses, it was appropriate.
Input
The following statement creates the input table for the linear regression function. The linear regression function requires the response variable to be in the final column; therefore, strokes must be last in the SELECT list.
CREATE MULTISET TABLE pcr2 AS ( SELECT principal_component_1, principal_component_2, principal_component_3, strokes FROM pcr1 ) WITH DATA;
principal_component_1 | principal_component_2 | principal_component_3 | strokes |
---|---|---|---|
-1.27086507768138 | -0.0479281470754137 | 0.732287038132982 | -1.13787844393218 |
0.308542873080396 | 2.70938975527651 | -3.04594531776213 | -1.41406253226524 |
-1.61658595260731 | 0.712262679264539 | 0.957505372598163 | -0.585510267266074 |
0.723400299704324 | -3.73227341951058 | 0.574740342890179 | 0.795410174399195 |
1.50953494759911 | -0.643862090689208 | -0.235674114998759 | -0.0331420905999665 |
-1.13206482033264 | -0.946451923900274 | 0.153459109295281 | 0.519226086066141 |
-0.447229210945602 | 0.669928830364427 | 0.306483821831369 | -1.13787844393218 |
0.956038577607332 | -0.649328546941474 | -0.825039826251437 | 0.519226086066141 |
0.195110284571175 | -1.17825238356938 | -1.04290585569861 | 0.795410174399195 |
-1.02505176519817 | -1.2375999732885 | 0.257874309966305 | -0.0331420905999665 |
-1.98240922334734 | 0.15900792029492 | -0.494890984515027 | 1.3477783510653 |
-1.00884248343362 | -0.597930103709463 | 0.301060051568943 | -0.30932617893302 |
1.091766978011 | -1.16460787259922 | -2.17662153529429 | 1.3477783510653 |
-1.93840477808707 | -0.0623801090526942 | -0.936238178663304 | 0.519226086066141 |
-1.14924574475721 | 1.7169286389402 | 0.588543776698958 | -1.13787844393218 |
0.144749031593665 | 1.56905010871957 | 0.524840214232052 | -0.585510267266074 |
-0.65879037722031 | 0.120392373582135 | 0.91577182173494 | 1.07159426273225 |
0.830295227878681 | 0.204892485389732 | -0.205590257682167 | 0.243041997733087 |
-1.57924450867301 | 0.122683543836927 | 1.080304476172 | -1.13787844393218 |
3.08093421824054 | 0.79449565167508 | 1.2894482577175 | -0.861694355599128 |
0.0232126330286094 | -0.37209890913809 | -0.867874315202764 | 1.3477783510653 |
3.77988804114414 | 0.628132449973151 | 0.744457517669957 | -1.13787844393218 |
0.121200199618047 | 2.08086374773324 | 0.786060284561277 | -1.41406253226524 |
-0.2155665536251 | -0.989679133035365 | -0.169609921563112 | 0.795410174399195 |
1.25962718383174 | 0.134364427459239 | 0.787553912561685 | 1.62396243939836 |
SQL Call
SELECT * FROM LinReg ( ON LinRegInternal ( ON pcr2 ) AS dt1 PARTITION BY 1 ) AS dt2 ORDER BY 1;
Output
coefficient_name | value |
---|---|
Intercept | 9.14880298701766e-18 |
principal_component_1 | -0.0297188652090044 |
principal_component_2 | -0.469895976698121 |
principal_component_3 | -0.258717923640999 |