This step uses the reduced data set, pcr1, to show how PCR works. The example regresses the response variable on the three score variables identified by PCA. (The number three is arbitrary. With a different data set, PCA might identify a different number of score variables. PCA identifies the number of score variables based on the percentage of total variance that they capture.)
For PCR, you must decide before PCA whether to normalize the input data set. For the data set that this example uses, it was appropriate.
Input
The following statement creates the input table for the linear regression function. The linear regression function requires the response variable to be in the final column; therefore, strokes must be last in the SELECT list.
CREATE MULTISET TABLE pcr2 AS ( SELECT principal_component_1, principal_component_2, principal_component_3, strokes FROM pcr1 ) WITH DATA;
SELECT * FROM pcr2;
principal_component_1 principal_component_2 principal_component_3 strokes --------------------- --------------------- --------------------- -------------------- 0.21556654992945887 0.9896792588736425 0.16960994178618327 0.7954101743991948 -1.25962718269285 -0.134364463438275 -0.787553876504215 1.623962439398356 -0.7234002078356577 3.732273405445003 -0.5747403021339288 0.7954101743991948 1.0088424962771645 0.5979300211866878 -0.3010599902582418 -0.3093261789330202 1.0250517880596144 1.2376000000708163 -0.2578742934281 -0.03314209059996648 1.132064831951978 0.9464519509600738 -0.15345908277200326 0.5192260860661411 -1.5095349283904946 0.6438620762403761 0.23567412167313243 -0.03314209059996648 1.6165859414652024 -0.7122626624930654 -0.957505408695698 -0.585510267266074 1.9384047832840248 0.06238010612979778 0.9362381037343512 0.5192260860661411 -3.7798880566862354 -0.6281322811758286 -0.7444575867324343 -1.1378784439321814 -0.14474905815110067 -1.5690501143413216 -0.5248402419459675 -0.585510267266074 0.6587903759681895 -0.12039236390782342 -0.9157718072510054 1.0715942627322486 -0.3085429248143898 -2.70938988789977 3.045945263507859 -1.4140625322652352 -0.12120024801596593 -2.0808637455284833 -0.7860602762155829 -1.4140625322652352 -3.080934228603367 -0.7944955261525505 -1.2894483164958985 -0.8616943555991278 1.9824092216112708 -0.1590079604576678 0.49489095430160657 1.3477783510653023 -0.8302952240600173 -0.20489255944412654 0.2055902723956421 0.24304199773308727 -1.091766952079274 1.1646078973826008 2.1766214799585786 1.3477783510653023 1.579244511221674 -0.1226835000514607 -1.0803045155907212 -1.1378784439321814 0.4472292099018688 -0.6699289045508605 -0.3064838365512166 -1.1378784439321814 1.149245710787051 -1.7169286406817175 -0.5885438218828289 -1.1378784439321814 -0.956038581012938 0.6493285131865779 0.8250399361604551 0.5192260860661411 1.2708650767655933 0.047928167252614645 -0.7322870352737051 -1.1378784439321814 -0.023212637125212883 0.37209892218772994 0.867874351047742 1.3477783510653023 -0.19511026775558732 1.178252291207031 1.0429059671659968 0.7954101743991948
SQL Call
SELECT * FROM LinReg ( ON LinRegInternal ( ON pcr2 ) AS dt1 PARTITION BY 1 ) AS dt2 ORDER BY 1;
Output
coefficient_name value_col --------------------- ----------------------- Intercept -2.2029073310707358E-17 principal_component_1 0.02971887194081586 principal_component_2 0.4698959686929861 principal_component_3 0.25871794248366853