LAR Example: FitMethod ('lar') - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
8.10
1.1
Published
October 2019
Language
English (United States)
Last Update
2019-12-31
dita:id
B700-4003
lifecycle
previous
Product Category
Teradata Vantage™

Input

This input is diabetes data from "Least Angle Regression," by Bradley Efron and others.

The InputTable, diabetes, has one response (vector y) and ten baseline predictors measured on 442 diabetes patients. The baseline predictors are age, sex, body mass index (bmi), mean arterial pressure (map) and six blood serum measurements (tc, ldl, hdl, tch, ltg, glu).

The column id is the row identifier, y is the response, and the other columns are predictors.

This data set is atypical in that each predictor has mean 0 and norm 1, which means the following:
  • The value of the Normalize syntax element is irrelevant.
  • If the value of the Intercept syntax element is 'true', then the intercept is considered to be constant along the entire path (which is typically not true).
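
Both points follow from the centering and scaling, which can be illustrated with a minimal sketch in plain Python (it is not part of the LAR function). The predictor values and the helper names center_and_scale and intercept are hypothetical; only the five y values come from the diabetes rows shown on this page. The sketch assumes that normalization means shifting a column to mean 0 and scaling it to L2 norm 1, and that the intercept is chosen so that the fitted values average to the mean of y.

```python
import math

def center_and_scale(column):
    """Shift a column to mean 0, then scale it to Euclidean (L2) norm 1."""
    mean = sum(column) / len(column)
    centered = [value - mean for value in column]
    norm = math.sqrt(sum(value * value for value in centered))
    return [value / norm for value in centered]

def intercept(y, columns, coefficients):
    """Intercept that makes the fitted values average to mean(y)."""
    mean_y = sum(y) / len(y)
    column_means = [sum(col) / len(col) for col in columns]
    return mean_y - sum(b * m for b, m in zip(coefficients, column_means))

# Hypothetical raw predictor values (NOT taken from the diabetes table).
raw_bmi = [23.0, 31.5, 28.1, 35.2, 26.4]
raw_ltg = [4.1, 5.3, 4.8, 5.9, 4.4]
# Responses of the first five diabetes rows shown on this page.
y = [151.0, 75.0, 141.0, 206.0, 135.0]

bmi = center_and_scale(raw_bmi)
ltg = center_and_scale(raw_ltg)

# Normalizing an already normalized column changes nothing (up to rounding),
# which is why the normalization setting does not matter for such data.
bmi_again = center_and_scale(bmi)
max_change = max(abs(a - b) for a, b in zip(bmi, bmi_again))

# With centered predictors the intercept equals mean(y) for any coefficients,
# so it stays constant along the whole path.
early = intercept(y, [bmi, ltg], [60.0, 0.0])
late = intercept(y, [bmi, ltg], [360.0, 300.0])
print(max_change < 1e-12, abs(early - late) < 1e-9)
```

For predictors that are not centered, the column means are nonzero, so the intercept computed this way changes whenever the coefficients change.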

InputTable: diabetes
 id  age         sex        bmi        map         tc          ldl        hdl        tch         ltg        glu         y
 --- ----------- ---------- ---------- ----------- ----------- ---------- ---------- ----------- ---------- ----------- ---
   1   0.0380759  0.0506801  0.0616962   0.0218724  -0.0442235 -0.0348208 -0.0434008 -0.00259226  0.0199084  -0.0176461 151
   2 -0.00188202 -0.0446416 -0.0514741  -0.0263278 -0.00844872 -0.0191633  0.0744116  -0.0394934 -0.0683297   -0.092204  75
   3   0.0852989  0.0506801  0.0444512 -0.00567061  -0.0455994 -0.0341945 -0.0323559 -0.00259226 0.00286377  -0.0259303 141
   4  -0.0890629 -0.0446416  -0.011595  -0.0366564   0.0121906  0.0249906 -0.0360376   0.0343089   0.022692 -0.00936191 206
   5  0.00538306 -0.0446416 -0.0363847   0.0218724  0.00393485  0.0155961 0.00814208 -0.00259226 -0.0319914  -0.0466409 135
 ...         ...        ...        ...         ...         ...        ...        ...         ...        ...         ... ...

SQL Call

SELECT * FROM LAR (
  ON diabetes AS InputTable
  OUT TABLE OutputTable (diabetes_lars)
  USING
  TargetColumns ('y', 'age', 'sex', 'bmi', 'map', 'tc', 'ldl', 'hdl',
                'tch', 'ltg', 'glu')
  FitMethod ('lar')
  Intercept ('true')
  L2Normalization ('true')
  MaxIterNum (20)
) AS dt;

Output

 message
 --------------------------------------------------------------------------
 Successful.
 Result has been stored in the table specified in the argument OutputTable.

SELECT * FROM diabetes_lars WHERE steps <> 0 ORDER BY steps;
 steps var_id var_name max_abs_corr       step_length        intercept          age                 sex                 bmi                map1               tc                  ldl                hdl                 tch                ltg                glu                
 ----- ------ -------- ------------------ ------------------ ------------------ ------------------- ------------------- ------------------ ------------------ ------------------- ------------------ ------------------- ------------------ ------------------ ------------------ 
     1      3 bmi       949.4352416992188  60.11927032470703 152.13348388671875                 0.0                 0.0  60.11927032470703                0.0                 0.0                0.0                 0.0                0.0                0.0                0.0
     2      9 ltg       889.3159790039062  513.2236938476562 152.13348388671875                 0.0                 0.0  361.8946228027344                0.0                 0.0                0.0                 0.0                0.0 301.77532958984375                0.0
     3      4 map1      452.9009704589844    175.55322265625 152.13348388671875                 0.0                 0.0  434.7579650878906   79.2364501953125                 0.0                0.0                 0.0                0.0 374.91583251953125                0.0
     4      7 hdl       316.0740661621094  259.3674621582031 152.13348388671875                 0.0                 0.0  505.6595458984375 191.26988220214844                 0.0                0.0 -114.10098266601562                0.0  439.6649475097656                0.0
     5      2 sex      130.13084411621094   88.6591567993164 152.13348388671875                 0.0  -74.91651153564453 511.34808349609375  234.1546173095703                 0.0                0.0 -169.71139526367188                0.0  450.6674499511719                0.0
     6     10 glu       88.78243255615234  43.67793273925781 152.13348388671875                 0.0 -111.97855377197266  512.0440673828125  252.5270233154297                 0.0                0.0 -196.04544067382812                0.0  452.3927307128906 12.078152656555176
     7      5 tc        68.96521759033203  135.9840850830078 152.13348388671875                 0.0 -197.75650024414062  522.2648315429688 297.15972900390625 -103.94625091552734                0.0 -223.92604064941406                0.0  514.7494506835938  54.76768112182617
     8      8 tch       19.98125457763672 54.015602111816406 152.13348388671875                 0.0  -226.1336669921875  526.8854370117188  314.3892822265625  -195.1058349609375                0.0 -152.47726440429688 106.34280395507812      529.916015625  64.48741912841797
     9      6 ldl        5.47747278213501   5.56723165512085 152.13348388671875                 0.0 -227.17579650878906  526.3905639648438  314.9504699707031 -237.34097290039062 33.628273010253906 -134.59934997558594  111.3841323852539  545.4826049804688   64.6066665649414
    10      1 age       5.089179039001465   73.5290756225586 152.13348388671875 -10.012197494506836   -239.819091796875  519.8397827148438 324.39044189453125  -792.1841430664062   476.745849609375  101.04457092285156 177.06417846679688      751.279296875   67.6253890991211

The following figure shows how the standardized coefficients evolved during the model-building process. The x-axis represents the ratio of the norm of the current coefficient vector (beta) to the norm of the full coefficient vector. The y-axis represents the standardized coefficients, which are the coefficients estimated when standardized predictors are used. The numbers at the top of the graph are the steps of the model-building process. The numbers on the right are the predictor IDs.
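
The x-axis values can be approximated from the diabetes_lars output with a short Python sketch. It rests on two assumptions that this page does not state: that the norm is the L1 norm (the sum of absolute coefficient values), and that the "full beta" is the coefficient vector at step 10, where all ten predictors are in the model. The coefficients are rounded to four decimal places from the output table, and the names path, l1_norm and ratios are illustrative only.

```python
# Coefficients per step, rounded from the diabetes_lars output table.
# Column order: age, sex, bmi, map1, tc, ldl, hdl, tch, ltg, glu.
path = [
    [0, 0, 60.1193, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 361.8946, 0, 0, 0, 0, 0, 301.7753, 0],
    [0, 0, 434.7580, 79.2365, 0, 0, 0, 0, 374.9158, 0],
    [0, 0, 505.6595, 191.2699, 0, 0, -114.1010, 0, 439.6649, 0],
    [0, -74.9165, 511.3481, 234.1546, 0, 0, -169.7114, 0, 450.6674, 0],
    [0, -111.9786, 512.0441, 252.5270, 0, 0, -196.0454, 0, 452.3927, 12.0782],
    [0, -197.7565, 522.2648, 297.1597, -103.9463, 0, -223.9260, 0,
     514.7495, 54.7677],
    [0, -226.1337, 526.8854, 314.3893, -195.1058, 0, -152.4773, 106.3428,
     529.9160, 64.4874],
    [0, -227.1758, 526.3906, 314.9505, -237.3410, 33.6283, -134.5993,
     111.3841, 545.4826, 64.6067],
    [-10.0122, -239.8191, 519.8398, 324.3904, -792.1841, 476.7458, 101.0446,
     177.0642, 751.2793, 67.6254],
]

def l1_norm(beta):
    """Sum of absolute coefficient values (assumed norm, see lead-in)."""
    return sum(abs(b) for b in beta)

# Assumption: the last step holds the full coefficient vector.
full = l1_norm(path[-1])
ratios = [l1_norm(beta) / full for beta in path]

for step, ratio in enumerate(ratios, start=1):
    print(f"step {step:2d}: |beta| / |full beta| = {ratio:.3f}")
```

Under these assumptions the ratio grows with every step and reaches 1 at the last step, so the steps read from left to right along the x-axis.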


