TD_GLM Examples | GLM | Teradata Vantage - Examples: How to Use TD_GLM - Analytics Database

Database Analytic Functions

Deployment
VantageCloud
VantageCore
Edition
Enterprise
IntelliFlex
VMware
Product
Analytics Database
Release Number
17.20
Published
June 2022
Language
English (United States)
Last Update
2024-04-06
dita:mapPath
gjn1627595495337.ditamap
dita:ditavalPath
ayr1485454803741.ditaval
dita:id
jmh1512506877710
Product Category
Teradata Vantage™

Input Table for train_dataset

encoded ROW_I attribute_1 attribute_2 attribute_3 …. attribute_49 sample_id
0 99 -0.0664 -0.0999 -0.0949 …. -0.0942 2
0 101 -0.0603 -0.0938 -0.0900 …. -0.0935 2
1 114 0.0000 0.0001 0.0001 …. 0.0001 2
1 115 0.0001 0.0001 0.0001 …. 0.0001 2
…. …. …. …. …. …. …. ….

Example: TD_GLM Using Credit Data Set

The following credit data set is used in this example:

ID A1 A2 A7 A10 A13 A14 A0_b A0_a A3_Y A3_u A4_p A4_g A5_k A5_cc A5_d A5_c A5_aa A5_m A5_q A5_w A5_e A5_ff A5_j A5_x A5_i A6_v A6_h A6_bb A6_z A6_ff A6_j A8_t A8_f A9_t A9_f A11_t A11_f A12_g A12_s Outcome
61 0.218228 2.17724 0.142986 1.25309 0.238997 -0.130455 1 0 0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0 1 1 0 1
297 -0.453325 -0.0247979 -0.404925 -0.77231 0.886307 -0.225685 0 1 0 1 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 1 0 1 1 0 1 0 0
631 -1.13175 -0.77234 -0.199458 -0.547266 -0.761392-0.643699 -0.225817 0 1 0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 1 0 1 0 1 0 0
122 -0.657541 1.20223 0.00600835 0.802998 -0.54366 -0.120063 1 0 0 1 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 1 0 1 0 1 0 1
... ... ... ... ... ... ... ... ... ... ... .. ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...

TD_GLM Call for Credit Data

CREATE VOLATILE TABLE td_glm_output_credit_ex AS (
    SELECT * FROM td_glm (
        ON credit_ex_merged AS InputTable
        USING
        InputColumns('a1', 'a2', 'a7', 'a10', 'a13', 'a14', 'a0_b', 'a0_a', 'a3_y', 'a3_u', 'a4_p', 'a4_g', 'a5_k', 'a5_cc', 'a5_d', 'a5_c', 'a5_aa', 'a5_m', 'a5_q', 'a5_w', 'a5_e', 'a5_ff', 'a5_j', 'a5_x', 'a5_i', 'a6_v', 'a6_h', 'a6_bb', 'a6_z', 'a6_ff', 'a6_j', 'a8_t', 'a8_f', 'a9_t', 'a9_f', 'a11_t', 'a11_f', 'a12_g', 'a12_s')
        ResponseColumn('Outcome')
        Family('Binomial')
        BatchSize(10)
        MaxIterNum(300)
        RegularizationLambda(0.02)
        Alpha(0.15)
        IterNumNoChange(50)
        Tolerance(0.001)
        Intercept('true')
        LearningRate('optimal')
        InitialEta(0.001)
        Momentum(0.0)
        LocalSGDIterations(0)
    ) AS dt
) WITH DATA
ON COMMIT PRESERVE ROWS
;

TD_GLM Output for Credit Data

Attribute Predictor Estimate Value
-13 LocaleSGD Iterations 0  
-12 Nesterov   FALSE
-11 Momentum 0  
-10 Learning Rate (Final) 0.287682  
-9 Learning Rate (Initial) 0.001  
-8 Number of Iterations 156 CONVERGED
-7 Alpha 0.15 Elasticnet
-6 Regularization 0.02 ENABLED
-5 BIC 151.787  
-4 AIC 80.4189  
-3 Number of Observations 44  
-2 Loglik -0.209461  
-1 Loss Function   LOG
0 (Intercept) 0.146566  
1 A1 0.732289  
2 A2 0  
3 A7 0.717899  
4 A10 0.682358  
5 A13 0.822302  
6 A14 0.176791  
7 A0_b 0.172178  
8 A0_a -0.12165  
9 A3_y -0.285135  
10 A3_u 0.335663  
11 A4_p -0.285135  
12 A4_g 0.335663  
13 A5_k 0.0358046  
14 A5_cc 0  
15 A5_d 0.0480538  
16 A5_c 0.430725  
17 A5_aa 0  
18 A5_m -0.332389  
19 A5_q 0.524153  
20 A5_w -0.599829  
21 A5_e 0.0257252  
22 A5_ff -0.359011  
23 A5_j -0.119432  
24 A5_x 0.0494795  
25 A5_i 0  
26 A6_v -0.387493  
27 A6_h 0.0415697  
28 A6_bb 0.0604108  
29 A6_z 0  
30 A6_ff -0.372217  
31 A6_j 0.538139  
32 A8_t 0.9259  
33 A8-f -0.875372  
34 A9_t 0  
35 A9_f 0  
36 A11_t 0.197957  
37 A11_f -0.146928  
38 A12_g -0.221414  
39 A12_s 0.272506  

Example: TD_GLM Using Housing Data

This example takes raw housing data, and does the following:
  1. Uses TD_ScaleFit to standardize the data.
  2. Uses TD_ScaleTransform to transform the data.
  3. Uses TD_GLM to get a model.

Raw Housing Data

ID MecInc HouseAge AveRoom AveBedrms Population AveOccup Latitude Longitude MedHouseVal
2833 1.3527 30 2.24754 0.742574 169 1.67327 35.39 -119.02 0.6
5328 2.7679 23 3.03868 1.06446 2031 1.63658 34.04 -118.45 2.775
5300 1.583 19 3.14815 1.04548 3751 2.4373 34.07 -118.45 3.5
12433 1.7344 24 3.2984 1.05856 4042 4.4663 33.51 -116.01 0.664
... ... ... ... ... ... ... ... ... ...

TD_ScaleFit Call for Housing Data

SELECT * FROM TD_ScaleFit(
    ON cal_housing_ex_raw AS InputTable
    OUT VOLATILE TABLE OutputTable(scaleFitOut_cal_ex)
    USING
    TargetColumns('medinc', 'houseage', 'averooms', 'avebedrms', 'population', 'aveoccup', 'latitude', 'longitude')
    ScaleMethod('STD')
) AS dt2;

TD_ScaleFit Output for Housing Data

TD_STATTYPE_SCLFIT MedInc HouseAge AveRooms AveBedrms Population AveOccup Latitude Longitude
min 1.0472 7 2.24752 0.742574 47 1.63658 32.64 -123.35
max 10.7721 52 8.89305 2.56522 4145 5.45536 40.99 -116.01
sum 261.401 2134 349.412 74.0563 91566 200.592 2459.39 -8256.05
count 69 69 69 69 69 69 69 69
null 0 0 0 0 0 0 0 0
avg 3.78842 30.9275 1.07328 1.07328 1327.04 2.90714 2.90714 -119.653
variance 5.40242 142.803 1.55253 0.0458083 861633 0.618171 0.618171 4.73921
std 2.30741 11.8631 1.23694 0.212472 921.491 0.780521 0.780521 2.16114
ustd 2.32431 11.95 1.246 0.214029 928.242 0.786239 0.786239 2.17697
multiplier 1 1 1 1 1 1 1 1
intercept 0 0 0 0 0 0 0 0
location 3.78842 30.9275 5.06394 1.07328 1327.04 2.90714 2.90714 -119.653
scale 2.30741 11.8631 1.23694 0.212472 921.491 0.780521 0.780521 2.16114
globalscale_false nan nan nan nan nan nan nan nan
ScaleMethodNumberMapping: [0:mean, 1:sum, 2:ustd, 3:std, 4:range, 5:midrange,6:maxabs, 7:rescale] 3 3 3 3 3 3 3 3
missvalue_keep nan nan nan nan nan nan nan nan

TD_ScaleTransform Call for Housing Data

CREATE MULTISET TABLE cal_housing_ex_scaled AS (
    SELECT * FROM TD_ScaleTransform(
        ON cal_housing_ex_raw AS InputTable
        ON scaleFitOut_cal_ex AS FitTable DIMENSION
        USING
        accumulate('id', 'MedHouseVal')
    ) AS dt1
) WITH data;

TD_ScaleTransform Output for Housing Data

ID MedHouseVal MedInc HouseAge AveRooms AveBedrms Population AveOccup Latitude Longitude
244 1.117 -0.605796 1.10194 -0.160367 0.426688 1.02221 1.04102 0.946201 -1.18785
670 1.922 -0.00308521 0.427582 -0.129699 -0.530672 -0.761856 -0.21256 0.906345 -1.16008
686 1.578 -0.152084 -0.0781865 -0.625426 -0.513581 -0.685892 -0.533101 1.01705 -1.14157
1754 1.651 -0.0263147 0.596172 0.454207 -0.0272726 0.0683203 -0.0827654 1.01705 -1.23412
... ... ... ... ... ... ... ... ... ...

TD_GLM Call for Housing Data

CREATE VOLATILE TABLE td_glm_cal_ex AS (
    SELECT * from TD_GLM (
        ON cal_housing_ex_scaled AS InputTable
        USING
        InputColumns('medinc', 'houseage', 'averooms', 'avebedrms', 'population', 'aveoccup', 'latitude', 'longitude')
        ResponseColumn('MedHouseVal')
        Family('Gaussian')
        BatchSize(10)
        MaxIterNum(300)
        RegularizationLambda(0.02)
        Alpha(0.15)
        IterNumNoChange(50)
        Intercept('true')
        LearningRate('invtime')
        InitialEta(0.05)
        Momentum(0)
        Nesterov('false')
        LocalSGDIterations(0)
    ) AS dt
) WITH DATA
ON COMMIT PRESERVE ROWS;

TD_GLM Output for Housing Data

Attribute Predictor Estimate Value
-13 LocalSGD Iterations 0  
-12 Nesterov    
-11 Momentum 0  
-10 Learning Rate (Final) 0.0133974  
-9 Learning Rate (Initial) 0.05  
-8 Number of Iterations 194 CONVERGED
-7 Alpha 0.15 Elasticnet
-6 Regularization 0.02 ENABLE
-5 BIC -67.6236  
-4 AIC -87.7305  
-3 Number of Observations 69  
-2 MSE 0.216033  
-1 Loss Function   SQUARED_ERROR
0 (Intercept) 2.07174  
1 MedInc 0.782883  
2 HouseAge 0.231914  
3 AveRooms 0.0619822  
4 AveBedrms -0.113656  
5 Population 0.211336  
6 AveOccup -0.388201  
7 Latitude -0.195511  
8 Longitude -0.193884  

Example: TD_GLM Call of Housing Data Using Forward Selection

SELECT * FROM TD_GLM (
ON cal_housing_ex_raw AS InputTable
USING
  InputColumns('[1:4]','[7:10]')
  ResponseColumn('price')
  Family('Gaussian')
  LearningRate('optimal')
  RegularizationLambda(0.02)
  Alpha(0.33)
  BatchSize(10)
  MaxIterNum (36)
  IterNumNoChange(100)
  Tolerance(0.0001)
  InitialEta(0.02)
  StepwiseDirection('forward')
  MaxStepsNum(10)
) as dt

TD_GLM Output of Housing Data Using Forward Selection

Attribute Predictor Estimate Value
1 RAD -3.778  
0 (Intercept) 23.16449  
-1 Loss Function SQUARED_ERROR  
-2 MSE 47.409  
-3 Number of Observations 360  
-4 AIC 1393.174  
-5 BIC 1400.946  
-6 Regularization 0.02 ENABLED
-7 Alpha 0.33 Elasticnet
-8 Number of Iterations 36 NOT CONVERGED
-9 Learning Rate (Initial) 0.02  
-10 Learning Rate (Final) 0.929316  
-11 Momentum 0  
-12 Nesterov   FALSE
-13 LocalSGD Iteratons 0  

Example: TD_GLM Call of Housing Data Using Backward Deletion

SELECT * FROM TD_GLM (
ON cal_housing_ex_raw AS InputTable
OUT TABLE MetaInformationTable(logtable)
USING
  InputColumns('[3:6]')
  ResponseColumn('price')
  Family('Gaussian')
  LearningRate('optimal')
  RegularizationLambda(0.02)
  Alpha(0.33)
  BatchSize(10)
  MaxIterNum (36)
  IterNumNoChange(100)
  Tolerance(0.0001)
  InitialEta(0.02)
  StepwiseDirection('backward')
  MaxStepsNum(10)
) as dt;

TD_GLM Output of Housing Data Using Backward Deletion

Attribute Predictor Estimate Value
0 (Intercept) 23.14352  
1 NOX -1.85727  
2 RM 8.010339  
-1 Loss Function SQUARED_ERROR  
-2 MSE 11.67791  
-3 Number of Observations 360  
-4 AIC 890.7715  
-5 BIC 902.4298  
-6 Regularization 0.02 ENABLED
-7 Alpha 0.33 Elasticnet
-8 Number of Iterations 36 NOT CONVERGED
-9 Learning Rate (Initial) 0.02  
-10 Learning Rate (Final) 0.929316  
-11 Momentum 0  
-12 Nesterov   FALSE
-13 LocalSGD Iteratons 0  

Example: Regression Using One Table

The following examples use this input table. Table name is housing_train_segment.
sn price lotsize bedrooms bathrms stories driveway recroom fullbase gashw airco garagepl prefarea homestyle partition_id
1 42000 5850 3 1 2 1 0 1 0 0 1 0 0 34
2 38500 4000 2 1 1 1 0 0 0 0 0 0 0 34
3 49500 3060 3 1 1 1 0 0 0 0 0 0 0 34
4 60500 6650 3 1 2 1 1 0 0 0 0 0 1 34
... ... ... ... ... ... ... ... ... ... ... ... ... ...

TD_GLM Call with Family: Gaussian for Input Table

SELECT * FROM TD_GLM (
  ON housing_train_segment AS InputTable PARTITION BY partition_id ORDER BY sn
  USING
    Family('Gaussian')
    InputColumns('[3:10]')
    ResponseColumn('price')
    BatchSize(10)
    MaxIterNum (1000)
    IsDebug('true')
) AS dt;

TD_GLM Output with Family: Gaussian for Input Table

partition_id attribute predictor estimate value
31 0 (intercept) 4777.491933 -
31 1 bedrooms 11243.33906 -
31 2 bathrms 7426.162028 -
31 3 stories 6897.217893 -
... ... ... ... ...

Example: Using an Attribute Table

Attribute Table
partition_id Attribute Column
5 bedrooms1
31 gashw
37 bathrms1
37 recroom1

TD_GLM Call with Family: Gaussian for Input Table and Attribute Table

SELECT * FROM TD_GLM (
  ON housing_train_segment AS InputTable PARTITION BY partition_id 
  ORDER BY sn
  ON housing_train_attribute AS AttributeTable PARTITION BY partition_id
  USING
    Family('Gaussian')
    InputColumns('[3:10]')
    ResponseColumn('price')
    MaxIterNum (100)
) AS dt;

TD_GLM Output for Attribute Table

parttion_id31 Attribute Predictor Estimate Value
31 0 (intercept) 48060.8685 -
31 1 gashw 2532.68146 -
31 -1 Loss Function - SQUARED
31 -3 Number of Observations 115 -
31 -2 MSE 39290745.1 -
31 -4 AIC 2014.94745 -
31 -5 BIC 2020.43731 -
31 -6 Regularization 0.02 ENABLED
31 -7 Alpha 0.15 Elasticnet
31 -8 Number of Iterations 100 NOT CONVERGED
31 --9 Learning Rate (initial) 0.05 -
31 -10 Learning Rate (final) 0.01581139 -
31 -11 Momentum 0 -
31 -12 Nesterov - FALSE

Example: Using a Parameter Table

Parameter Table
Partition_id Parameter_column Value_column
31 Alpha1 0.5

TD_GLM Call with Family: Gaussian for Input Table and Parameter Table

SELECT * FROM TD_GLM (
  ON housing_train_segment AS InputTable PARTITION BY partition_id
  ORDER BY sn
  ON housing_train_parameter AS ParameterTable PARTITION BY partition_id
  USING
    Family('Gaussian')
    InputColumns('[3:10]')
    ResponseColumn('price')
    MaxIterNum (100)
) AS dt;

TD_GLM Output for Parameter Table

Partition_id Attribute Predictor Estimate Value
31 - - - Invalid parameter in parameter table. Found: Alpha1 in partition(31)
32 0 (intercept) 4830.20162 -
32 1 bedrooms 8167.57498 -
... ... ... ... ...

Example: Classification Using TD_GLM Call with Family: Binomial for Input Table and Attribute Table

SELECT * FROM TD_GLM (
  ON housing_train_segment AS InputTable PARTITION BY partition_id
  ON housing_train_attribute AS AttributeTable PARTITION BY partition_id
  USING
    Family('Binomial')
    InputColumns('[3:10]')
    ResponseColumn('homestyle')
    MaxIterNum (100)
) AS dt;

TD_GLM Output with Family: Binomial for Input Table and Attribute Table

partition_id attribute predictor estimate value
31 0 (intercept) -0.308286246 -
31 1 gashw 0.07012385 -
31 -1 Loss Function - LOG
31 -3 Number of Observations 134 -
... ... ... ... ...