LAR Example 1: FitMethod ('lar')

Teradata® Vantage Machine Learning Engine Analytic Function Reference

Product: Teradata Vantage
Release Number: 1.0, 8.00
Release Date: May 2019
Content Type: Programming Reference
Publication ID: B700-4003-098K
Language: English (United States)

Input

The input is the diabetes data set from "Least Angle Regression," by Bradley Efron and others.

The InputTable, diabetes, has one response (vector y) and ten baseline predictors measured on 442 diabetes patients. The baseline predictors are age, sex, body mass index (bmi), mean arterial pressure (map) and six blood serum measurements (tc, ldl, hdl, tch, ltg, glu).

The column id is the row identifier, y is the response, and the other columns are predictors.

This data set is atypical in that each predictor has mean 0 and norm 1, which means the following:
  • The value of the Normalize argument (written as L2Normalization in the SQL call below) is irrelevant, because the predictors are already normalized.
  • If the value of the Intercept argument is 'true', then the intercept is constant along the entire path (152.133 at every step in the output below), which is typically not the case for data that is not centered.
InputTable: diabetes
id   age          sex         bmi         map          tc           ldl         hdl         tch          ltg         glu          y
1    0.0380759    0.0506801   0.0616962   0.0218724    -0.0442235   -0.0348208  -0.0434008  -0.00259226  0.0199084   -0.0176461   151
2    -0.00188202  -0.0446416  -0.0514741  -0.0263278   -0.00844872  -0.0191633  0.0744116   -0.0394934   -0.0683297  -0.092204    75
3    0.0852989    0.0506801   0.0444512   -0.00567061  -0.0455994   -0.0341945  -0.0323559  -0.00259226  0.00286377  -0.0259303   141
4    -0.0890629   -0.0446416  -0.011595   -0.0366564   0.0121906    0.0249906   -0.0360376  0.0343089    0.022692    -0.00936191  206
5    0.00538306   -0.0446416  -0.0363847  0.0218724    0.00393485   0.0155961   0.00814208  -0.00259226  -0.0319914  -0.0466409   135
...  ...          ...         ...         ...          ...          ...         ...         ...          ...         ...          ...
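
The "mean 0 and norm 1" property described above can be checked directly against the input table. The following query is a minimal sketch, not part of the original example: it inspects only the bmi column (the other predictors can be checked the same way), and the output column aliases are illustrative names chosen here. It assumes that "norm" means the Euclidean norm, that is, the square root of the sum of squares.

```sql
-- Sketch: check that one predictor is centered and scaled to unit norm.
-- Aliases (bmi_mean, bmi_norm, y_mean, n_rows) are illustrative only.
SELECT
  AVG(bmi)             AS bmi_mean,  -- expected to be close to 0
  SQRT(SUM(bmi * bmi)) AS bmi_norm,  -- expected to be close to 1 (Euclidean norm assumed)
  AVG(y)               AS y_mean,    -- mean of the response
  COUNT(*)             AS n_rows     -- the text states 442 patients
FROM diabetes;
```

Because the predictors are centered, y_mean should agree with the constant value in the intercept column of the output table when Intercept is 'true'.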

SQL Call

SELECT * FROM LAR (
  ON diabetes AS InputTable
  OUT TABLE OutputTable (diabetes_lars)
  USING
  TargetColumns ('y', 'age', 'sex', 'bmi', 'map', 'tc', 'ldl', 'hdl',
                'tch', 'ltg', 'glu')
  FitMethod ('lar')
  Intercept ('true')
  L2Normalization ('true')
  MaxIterNum (20)
) AS dt;

Output

message
Successful.
Result has been stored in the table specified in the argument OutputTable.

This query returns the following table:

SELECT * FROM diabetes_lars WHERE steps <> 0 ORDER BY steps;
diabetes_lars
steps  var_id  var_name  max_abs_corr  step_length  intercept  age       sex       bmi      map      tc        ldl      hdl       tch      ltg      glu
1      3       bmi       949.435       60.1193      152.133    0         0         60.1193  0        0         0        0         0        0        0
2      9       ltg       889.316       513.224      152.133    0         0         361.895  0        0         0        0         0        301.775  0
3      4       map       452.901       175.553      152.133    0         0         434.758  79.2364  0         0        0         0        374.916  0
4      7       hdl       316.074       259.367      152.133    0         0         505.66   191.27   0         0        -114.101  0        439.665  0
5      2       sex       130.131       88.6592      152.133    0         -74.9165  511.348  234.155  0         0        -169.711  0        450.667  0
6      10      glu       88.7824       43.6779      152.133    0         -111.979  512.044  252.527  0         0        -196.045  0        452.393  12.0781
7      5       tc        68.9652       135.984      152.133    0         -197.757  522.265  297.16   -103.946  0        -223.926  0        514.75   54.7677
8      8       tch       19.9813       54.0156      152.133    0         -226.134  526.885  314.389  -195.106  0        -152.477  106.343  529.916  64.4874
9      6       ldl       5.47747       5.56726      152.133    0         -227.176  526.391  314.95   -237.341  33.6284  -134.599  111.384  545.483  64.6067
10     1       age       5.08918       73.5291      152.133    -10.0122  -239.819  519.84   324.39   -792.184  476.746  101.045   177.064  751.279  67.6254
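
Each row of diabetes_lars holds the intercept and one coefficient per predictor at that step, so the fit at each step can be examined by applying the row to the input data. The following query is a sketch under stated assumptions, not part of the original example: it assumes one row per step in diabetes_lars, and it applies the coefficients directly to the predictor columns, which is reasonable here only because the predictors are already centered and normalized. The aliases d, c, p, y_pred, and rss are illustrative. The column name map is double-quoted as a precaution, because it may be a reserved word on some systems.

```sql
-- Sketch: residual sum of squares (RSS) of the fitted model at each step.
SELECT p.steps,
       SUM((p.y - p.y_pred) * (p.y - p.y_pred)) AS rss
FROM (
  -- Pair every patient row with every step's coefficient row
  -- and compute the prediction for that patient at that step.
  SELECT c.steps,
         d.y,
         c.intercept
           + c.age   * d.age
           + c.sex   * d.sex
           + c.bmi   * d.bmi
           + c."map" * d."map"
           + c.tc    * d.tc
           + c.ldl   * d.ldl
           + c.hdl   * d.hdl
           + c.tch   * d.tch
           + c.ltg   * d.ltg
           + c.glu   * d.glu AS y_pred
  FROM diabetes d
  CROSS JOIN diabetes_lars c
) p
GROUP BY p.steps
ORDER BY p.steps;
```

The RSS should not increase from one step to the next, because each step moves the coefficients further toward the least-squares fit on the active predictors.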

The following figure represents the results and shows how the standardized coefficients evolved during the model-building process. The x-axis represents the ratio of the norm of the current coefficient vector (beta) to the norm of the full-model beta. The y-axis represents the standardized coefficients, that is, the coefficients estimated when standardized predictors are used. The numbers at the top of the graph are the steps of the model-building process. The numbers on the right are the predictor IDs (var_id in the output table).
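
The x-axis values of such a plot can be derived from the output table. The following query is a sketch, not part of the original example. The text does not say which norm is used; this sketch assumes the L1 norm (the sum of absolute coefficient values), and it takes the row with the largest steps value as the full beta. The aliases s, f, full_l1, and beta_ratio are illustrative, and the column name map is double-quoted as a precaution in case it is a reserved word on your system.

```sql
-- Sketch: ratio of the L1 norm of beta at each step to the L1 norm of the
-- final (full) beta. The choice of the L1 norm is an assumption.
SELECT s.steps,
       s.var_name,
       ( ABS(s.age) + ABS(s.sex) + ABS(s.bmi) + ABS(s."map") + ABS(s.tc)
       + ABS(s.ldl) + ABS(s.hdl) + ABS(s.tch) + ABS(s.ltg)   + ABS(s.glu)
       ) / f.full_l1 AS beta_ratio
FROM diabetes_lars s
CROSS JOIN (
  -- L1 norm of the coefficients at the last step of the path.
  SELECT ABS(age) + ABS(sex) + ABS(bmi) + ABS("map") + ABS(tc)
       + ABS(ldl) + ABS(hdl) + ABS(tch) + ABS(ltg)   + ABS(glu) AS full_l1
  FROM diabetes_lars
  WHERE steps = (SELECT MAX(steps) FROM diabetes_lars)
) f
ORDER BY s.steps;
```

By construction, beta_ratio is 1 at the final step. Plotting each coefficient column against beta_ratio gives one curve per predictor.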