Using GLM Model with teradataml Package

Teradata® Package for Python User Guide

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Teradata Package for Python
Release Number: 20.00
Published: December 2024
Language: English (United States)
Last Update: 2024-12-18
Product Category: Teradata Vantage
This example shows how to build a Generalized Linear Model (GLM) and then apply the model to new admissions test data. The data set contains two classes: one represents a successful admission and the other represents no admission.
This example uses Vantage Analytic Library (VAL) functions. Make sure VAL is installed in Vantage before running it.
  1. Import required libraries.
    from teradataml import GLM
    from teradataml import TDGLMPredict
    from teradataml import load_example_data
    from teradataml.dataframe.dataframe import DataFrame
  2. Create training data.
    1. If the input table (admissions_train) does not exist already, create the table and load the dataset into the table.
      load_example_data("dataframe", "admissions_train")
    2. Create a teradataml DataFrame for the training dataset from "admissions_train" table.
      admissions_train = DataFrame.from_table("admissions_train")
  3. Convert the categorical columns in the input table into numeric columns using the OneHotEncoder() and Transform() functions from VAL, because the GLM() function supports only numeric columns.
    1. Import required libraries.
      from teradataml import configure, valib, OneHotEncoder, Retain
    2. Configure VAL install location.
      configure.val_install_location = "VAL"
    3. Define encoders for categorical columns.
      masters_code = OneHotEncoder(values=["yes", "no"],
                                   columns="masters",
                                   out_columns="masters")
      stats_code = OneHotEncoder(values=["Advanced", "Novice"],
                                 columns="stats",
                                 out_columns="stats")
      programming_code = OneHotEncoder(values=["Advanced", "Novice", "Beginner"],
                                       columns="programming",
                                       out_columns="programming")
    4. Retain numerical columns.
      retain = Retain(columns=["admitted", "gpa"])
    5. Transform categorical columns to numeric columns.
      all_numeric_admissions_train = valib.Transform(data=admissions_train,
                                                     one_hot_encode=[masters_code, stats_code, programming_code],
                                                     retain=retain)
  4. Train a Generalized Linear Model (GLM) on the transformed training DataFrame using the GLM() function from the teradataml package.
    glm_train = GLM(formula="admitted ~ gpa + yes_masters + no_masters + Advanced_stats + Novice_stats + Advanced_programming + Novice_programming + Beginner_programming",
                    data=all_numeric_admissions_train.result,
                    learning_rate="INVTIME",
                    momentum=0.80)
    Next, apply the model to the test data using TDGLMPredict() function.
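The formula in step 4 lists every encoded column by hand, which is easy to get out of sync with the encoder definitions. The following is a minimal sketch, in plain Python, of deriving the same formula string from the category values. The helper `build_formula` is hypothetical and not part of teradataml, and the `<value>_<out_column>` naming is an assumption inferred from the column names used in the formula above.

```python
# Hypothetical helper for illustration; it is not part of teradataml.
def build_formula(response, numeric_columns, encoded_columns):
    """Return a formula string of the form "response ~ term1 + term2 + ...".

    encoded_columns maps each out_columns name to its list of category
    values, mirroring the values/out_columns pairs given to OneHotEncoder().
    """
    terms = list(numeric_columns)
    for out_column, values in encoded_columns.items():
        # Assumption: encoded columns are named "<value>_<out_column>",
        # as in the formula used in this example.
        terms.extend(f"{value}_{out_column}" for value in values)
    return f"{response} ~ " + " + ".join(terms)


formula = build_formula(
    response="admitted",
    numeric_columns=["gpa"],
    encoded_columns={
        "masters": ["yes", "no"],
        "stats": ["Advanced", "Novice"],
        "programming": ["Advanced", "Novice", "Beginner"],
    },
)
print(formula)
```

The resulting string could then be passed as the formula argument of GLM() in place of the handwritten one.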
  5. Create test data.
    1. If the input table (admissions_test) does not exist already, create the table and load the dataset into the table.
      load_example_data("GLMPredict", "admissions_test")
    2. Create a teradataml DataFrame for the test dataset from "admissions_test" table.
      admissions_test = DataFrame.from_table("admissions_test")
  6. Convert the categorical columns in the test input table into numeric columns using the OneHotEncoder() and Transform() functions from VAL, because the GLM() function built the model on the transformed training data, which contains only numeric columns.
    1. Import required libraries.
      from teradataml import configure, valib, OneHotEncoder, Retain
    2. Configure VAL install location.
      configure.val_install_location = "VAL"
    3. Define encoders for categorical columns.
      masters_code = OneHotEncoder(values=["yes", "no"],
                                   columns="masters",
                                   out_columns="masters")
      stats_code = OneHotEncoder(values=["Advanced", "Novice"],
                                 columns="stats",
                                 out_columns="stats")
      programming_code = OneHotEncoder(values=["Advanced", "Novice", "Beginner"],
                                       columns="programming",
                                       out_columns="programming")
    4. Retain numerical columns.
      retain = Retain(columns=["admitted", "gpa"])
    5. Transform categorical columns to numeric columns.
      all_numeric_admissions_test = valib.Transform(data=admissions_test,
                                                    one_hot_encode=[masters_code, stats_code, programming_code],
                                                    retain=retain)
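To make the effect of the encoding steps concrete, here is a small local illustration in plain Python of what one-hot encoding does to a single row. It does not call VAL or connect to Vantage. The sample row is hypothetical, the `<value>_<out_column>` column naming is an assumption inferred from the GLM formula in this example, and carrying the id column through is assumed because step 7 passes id_column="id".

```python
# Local illustration only; this does not call VAL or connect to Vantage.
ENCODERS = {
    "masters": ["yes", "no"],
    "stats": ["Advanced", "Novice"],
    "programming": ["Advanced", "Novice", "Beginner"],
}
RETAINED = ["admitted", "gpa"]
KEY = "id"  # assumed to be carried through so rows stay identifiable


def encode_row(row):
    """Return a numeric-only copy of row with one 0/1 column per category."""
    encoded = {KEY: row[KEY]}
    encoded.update({name: row[name] for name in RETAINED})
    for column, values in ENCODERS.items():
        for value in values:
            # Assumed naming: "<value>_<out_column>".
            encoded[f"{value}_{column}"] = 1 if row[column] == value else 0
    return encoded


# Hypothetical sample row, not taken from admissions_test.
sample = {
    "id": 1,
    "admitted": 1,
    "gpa": 3.5,
    "masters": "yes",
    "stats": "Novice",
    "programming": "Beginner",
}
print(encode_row(sample))
```

Each categorical column becomes one indicator column per listed value, so the training and test data must be encoded with the same value lists for the model's column names to match.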
  7. Predict the admission status by applying the trained model to the transformed test DataFrame, using the TDGLMPredict() function and the output of the GLM() function.
    tdglmpredict_out = TDGLMPredict(object=glm_train.result,
                                    newdata=all_numeric_admissions_test.result,
                                    id_column="id")
  8. Inspect the results.
    tdglmpredict_out.result
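Once the results are available, one simple way to inspect them is to compare predicted labels against the known admission status. The sketch below is plain Python and assumes you have already fetched the actual and predicted values into two lists of 0/1 class labels. The `accuracy` helper and the sample values are hypothetical, and how the values are extracted from the prediction output is not shown here.

```python
# Hypothetical helper for illustration; it is not part of teradataml.
def accuracy(actual, predicted):
    """Return the fraction of positions where predicted equals actual."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one label is required")
    matches = sum(1 for a, p in zip(actual, predicted) if a == p)
    return matches / len(actual)


# Hypothetical labels, not real output from this example.
actual_labels = [1, 0, 1, 1]
predicted_labels = [1, 0, 0, 1]
print(accuracy(actual_labels, predicted_labels))  # 0.75
```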