Using GLM Model with teradataml Package

Teradata® VantageCloud Lake

Deployment: VantageCloud
Edition: Lake
Product: Teradata Vantage
Published: January 2023
Last edition: 2024-12-11
This example shows how to build a generalized linear model (GLM) and then apply the model to new admissions test data. The data set contains two classes: one represents a successful admission and the other represents no admission.
This example uses Vantage Analytic Library (VAL) functions. Make sure VAL is installed in Vantage before you run the example.
  1. Import required libraries.
    from teradataml import GLM
    from teradataml import TDGLMPredict
    from teradataml import load_example_data
    from teradataml.dataframe.dataframe import DataFrame
  2. Create training data.
    1. If the input table (admissions_train) does not exist already, create the table and load the dataset into the table.
      load_example_data("dataframe", "admissions_train")
    2. Create a teradataml DataFrame for the training dataset from the "admissions_train" table.
      admissions_train = DataFrame.from_table("admissions_train")
  3. Convert the categorical columns in the input table into numeric columns using the OneHotEncoder() and Transform() functions from VAL, because the GLM() function supports only numeric columns.
    1. Import required libraries.
      from teradataml import configure, valib, OneHotEncoder, Retain
    2. Configure VAL install location.
      configure.val_install_location = "VAL"
    3. Define encoders for categorical columns.
      masters_code = OneHotEncoder(values=["yes", "no"],
                                   columns="masters",
                                   out_columns="masters")
      stats_code = OneHotEncoder(values=["Advanced", "Novice"],
                                 columns="stats",
                                 out_columns="stats")
      programming_code = OneHotEncoder(values=["Advanced", "Novice", "Beginner"],
                                       columns="programming",
                                       out_columns="programming")
    4. Retain numerical columns.
      retain = Retain(columns=["admitted", "gpa"])
    5. Transform categorical columns to numeric columns.
      all_numeric_admissions_train = valib.Transform(data=admissions_train,
                                                     one_hot_encode=[masters_code, stats_code, programming_code],
                                                     retain=retain)
  4. Train a new generalized linear model (GLM) on the encoded training DataFrame using the GLM() function from the teradataml package.
    glm_train = GLM(formula="admitted ~ gpa + yes_masters + no_masters + Advanced_stats + Novice_stats + Advanced_programming + Novice_programming + Beginner_programming",
                    data=all_numeric_admissions_train.result,
                    learning_rate="INVTIME",
                    momentum=0.80)
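As a rough illustration of what the encoding step produces, the following plain-Python sketch expands one categorical value into 0/1 indicator columns. It does not call VAL. The `one_hot` helper and the sample row are hypothetical, and the `<value>_<out_column>` naming is an assumption inferred from the column names used in the GLM formula in step 4.

```python
def one_hot(row, column, values, out_column):
    """Return a copy of `row` with `column` replaced by one 0/1 column per value.

    Output names follow "<value>_<out_column>", an assumption inferred from
    the formula in step 4 (for example "yes_masters").
    """
    encoded = {key: val for key, val in row.items() if key != column}
    for value in values:
        encoded[f"{value}_{out_column}"] = 1 if row[column] == value else 0
    return encoded


# Hypothetical sample row, not taken from the admissions_train table.
sample = {"id": 1, "masters": "yes", "stats": "Novice", "gpa": 3.5, "admitted": 1}

sample = one_hot(sample, "masters", ["yes", "no"], "masters")
sample = one_hot(sample, "stats", ["Advanced", "Novice"], "stats")
print(sample)
```

Each categorical column becomes as many numeric columns as there are listed values, and exactly one of them is 1 for a row whose value appears in the list.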
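The formula string above lists every encoded column by hand, which is easy to mistype. As one possible alternative, the sketch below assembles the same string from the encoder values. The `encoded_columns` helper is hypothetical (it is not part of teradataml), and it assumes the `<value>_<out_column>` naming seen in the formula.

```python
def encoded_columns(values, out_column):
    """Return the assumed output names for one encoded column."""
    return [f"{value}_{out_column}" for value in values]


# Numeric predictor retained as-is, followed by the encoded predictors.
predictors = ["gpa"]
predictors += encoded_columns(["yes", "no"], "masters")
predictors += encoded_columns(["Advanced", "Novice"], "stats")
predictors += encoded_columns(["Advanced", "Novice", "Beginner"], "programming")

# "admitted" is the response column; predictors are joined with " + ".
formula = "admitted ~ " + " + ".join(predictors)
print(formula)
```

The resulting string can be passed as the `formula` argument in place of the literal.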
    Next, apply the model to the test data using the TDGLMPredict() function.
  5. Create test data.
    1. If the input table (admissions_test) does not exist already, create the table and load the dataset into the table.
      load_example_data("GLMPredict", "admissions_test")
    2. Create a teradataml DataFrame for the test dataset from the "admissions_test" table.
      admissions_test = DataFrame.from_table("admissions_test")
  6. Convert the categorical columns in the test table into numeric columns using the same OneHotEncoder() and Transform() functions from VAL, because GLM() built the model on the "all_numeric_admissions_train" data, which contains only numeric columns.
    1. Import required libraries.
      from teradataml import configure, valib, OneHotEncoder, Retain
    2. Configure VAL install location.
      configure.val_install_location = "VAL"
    3. Define encoders for categorical columns.
      masters_code = OneHotEncoder(values=["yes", "no"],
                                   columns="masters",
                                   out_columns="masters")
      stats_code = OneHotEncoder(values=["Advanced", "Novice"],
                                 columns="stats",
                                 out_columns="stats")
      programming_code = OneHotEncoder(values=["Advanced", "Novice", "Beginner"],
                                       columns="programming",
                                       out_columns="programming")
    4. Retain numerical columns.
      retain = Retain(columns=["admitted", "gpa"])
    5. Transform categorical columns to numeric columns.
      all_numeric_admissions_test = valib.Transform(data=admissions_test,
                                                    one_hot_encode=[masters_code, stats_code, programming_code],
                                                    retain=retain)
  7. Predict the admission status by applying the trained GLM to the encoded test DataFrame, using the TDGLMPredict() function and the output of the GLM() function.
    tdglmpredict_out = TDGLMPredict(object=glm_train.result,
                                    newdata=all_numeric_admissions_test.result,
                                    id_column="id")
  8. Inspect the results.
    tdglmpredict_out.result
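Once the predictions are available, a natural follow-up is to compare them with the known admission labels. The sketch below is a minimal plain-Python accuracy check. It assumes you already have pairs of actual and predicted class labels on the client side; the `accuracy` helper and the sample pairs are hypothetical and are not output from TDGLMPredict().

```python
def accuracy(pairs):
    """Return the fraction of (actual, predicted) pairs whose labels match."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one (actual, predicted) pair is required")
    matches = sum(1 for actual, predicted in pairs if actual == predicted)
    return matches / len(pairs)


# Hypothetical (actual, predicted) labels; 1 = admitted, 0 = not admitted.
sample_pairs = [(1, 1), (0, 1), (1, 1), (0, 0)]
print(accuracy(sample_pairs))
```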