SQL to Create Table of Principal Components - Aster Analytics

Teradata Aster Analytics Foundation User Guide

Product
Aster Analytics
Release Number
6.21
Published
November 2016
Language
English (United States)
Last Update
2018-04-14
dita:mapPath
kiu1466024880662.ditamap
dita:ditavalPath
AA-notempfilter_pdf_output.ditaval
dita:id
B700-1021
lifecycle
previous
Product Category
Software

This SQL statement creates a table that contains the principal components that correspond to the three top-ranked eigenvectors—that is, the principal components that explain ~98% of the total variance. The statement calculates these components by linearly combining the input attributes for each patient (from Input) with the three top-ranked eigenvectors (shown in the table pca_health_ev).

This method of generating the principal components uses uncentered input, whereas the function PCAPlot_stub uses centered input that it computes internally. The two methods generate principal components that are different but equivalent. PCAPlot, which is more compact, is recommended when the data has many dimensions and requires many principal components.
CREATE TABLE pca_health_pc DISTRIBUTE BY REPLICATION AS
  SELECT pid, pca_1, pca_2, pca_3
    FROM
    (SELECT pid,
      (a.age * b.age) +
      (a.bmi * b.bmi) +
      (a.bloodpressure * b.bloodpressure) +
      (a.glucose * b.glucose) +
      (a.strokes * b.strokes) +
      (a.cigarettes * b.cigarettes) +
      (a.insulin * b.insulin) +
      (a.hdl * b.hdl) as pca_1
      FROM pca_health_ev a,
           patient_pca_input b
      WHERE a.component_rank = 1) a
  JOIN
  (SELECT pid,
    (a.age * b.age) +
    (a.bmi * b.bmi) +
    (a.bloodpressure * b.bloodpressure) +
    (a.glucose * b.glucose) +
    (a.strokes * b.strokes) +
    (a.cigarettes * b.cigarettes) +
    (a.insulin * b.insulin) +
    (a.hdl * b.hdl) as pca_2
    FROM pca_health_ev a,
         patient_pca_input b
    WHERE a.component_rank = 2) b
    USING(pid)
  JOIN
  (SELECT pid,
    (a.age * b.age) +
    (a.bmi * b.bmi) +
    (a.bloodpressure * b.bloodpressure) +
    (a.glucose * b.glucose) +
    (a.strokes * b.strokes) +
    (a.cigarettes * b.cigarettes) +
    (a.insulin * b.insulin) +
    (a.hdl * b.hdl) as pca_3
    FROM pca_health_ev a,
         patient_pca_input b
    WHERE a.component_rank = 3) c
    USING(pid) order by pid;