This example shows how to build a Generalized Linear Model (GLM) and then apply the model to new admissions test data. The data set contains two classes: one represents a successful admission and the other represents no admission.
This example uses Vantage Analytic Library (VAL) functions. Make sure the VAL functions are installed in Vantage before you use them.
- Import required libraries.
from teradataml import GLM
from teradataml import TDGLMPredict
from teradataml import load_example_data
from teradataml.dataframe.dataframe import DataFrame
- Create training data.
- If the input table (admissions_train) does not exist already, create the table and load the dataset into the table.
load_example_data("dataframe", "admissions_train")
- Create a teradataml DataFrame for the training dataset from the "admissions_train" table.
admissions_train = DataFrame.from_table("admissions_train")
- Convert the categorical columns in the input data table into numeric columns using the OneHotEncoder() and Transform() functions from VAL, because the GLM() function supports only numeric columns.
- Import required libraries.
from teradataml import configure, valib, OneHotEncoder, Retain
- Configure VAL install location.
configure.val_install_location = "VAL"
- Define encoders for categorical columns.
masters_code = OneHotEncoder(values=["yes", "no"], columns="masters", out_columns="masters")
stats_code = OneHotEncoder(values=["Advanced", "Novice"], columns="stats", out_columns="stats")
programming_code = OneHotEncoder(values=["Advanced", "Novice", "Beginner"], columns="programming", out_columns="programming")
- Retain numerical columns.
retain = Retain(columns=["admitted", "gpa"])
- Transform categorical columns to numeric columns.
all_numeric_admissions_train = valib.Transform(data=admissions_train, one_hot_encode=[masters_code, stats_code, programming_code], retain=retain)
- Train a new Generalized Linear Model (GLM) on the transformed training DataFrame using the GLM() function from the teradataml package.
glm_train = GLM(formula="admitted ~ gpa + yes_masters + no_masters + Advanced_stats + Novice_stats + Advanced_programming + Novice_programming + Beginner_programming", data=all_numeric_admissions_train.result, learning_rate="INVTIME", momentum=0.80)
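The predictor names in the formula follow from the encoder definitions: each encoded value appears as a column named after the value and the output column, such as yes_masters. The following plain-Python sketch shows one way to assemble the same formula string from those definitions instead of typing it by hand. It does not call teradataml or VAL; the helpers one_hot_column_names and build_formula are hypothetical, and the "<value>_<out_column>" naming is an assumption inferred from the formula used in this example.

```python
# Illustrative sketch only: plain Python, no Vantage connection required.
# Assumption: encoded columns are named "<value>_<out_column>", as the
# column names in this example's formula suggest.

# Categorical columns and their encoded values, as defined for the encoders.
encoders = {
    "masters": ["yes", "no"],
    "stats": ["Advanced", "Novice"],
    "programming": ["Advanced", "Novice", "Beginner"],
}


def one_hot_column_names(out_column, values):
    """Return the expected column names for one encoded categorical column."""
    return [f"{value}_{out_column}" for value in values]


def build_formula(response, numeric_columns, encoders):
    """Build a 'response ~ x1 + x2 + ...' formula string."""
    predictors = list(numeric_columns)
    for out_column, values in encoders.items():
        predictors.extend(one_hot_column_names(out_column, values))
    return f"{response} ~ " + " + ".join(predictors)


formula = build_formula("admitted", ["gpa"], encoders)
print(formula)
```

The resulting string could then be passed as the formula argument of GLM(). Generating it from the encoder definitions keeps the formula in step with the encoders if a value is added or removed.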
Next, apply the model to the test data using the TDGLMPredict() function.
- Create test data.
- If the input table (admissions_test) does not exist already, create the table and load the dataset into the table.
load_example_data("GLMPredict", "admissions_test")
- Create a teradataml DataFrame for the test dataset from the "admissions_test" table.
admissions_test = DataFrame.from_table("admissions_test")
- Convert the categorical columns in the test input data table into numeric columns using the OneHotEncoder() and Transform() functions from VAL, because the GLM() function built the model on the transformed training data, which contains only numeric columns.
- Import required libraries.
from teradataml import configure, valib, OneHotEncoder, Retain
- Configure VAL install location.
configure.val_install_location = "VAL"
- Define encoders for categorical columns.
masters_code = OneHotEncoder(values=["yes", "no"], columns="masters", out_columns="masters")
stats_code = OneHotEncoder(values=["Advanced", "Novice"], columns="stats", out_columns="stats")
programming_code = OneHotEncoder(values=["Advanced", "Novice", "Beginner"], columns="programming", out_columns="programming")
- Retain numerical columns.
retain = Retain(columns=["admitted", "gpa"])
- Transform categorical columns to numeric columns.
all_numeric_admissions_test = valib.Transform(data=admissions_test, one_hot_encode=[masters_code, stats_code, programming_code], retain=retain)
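To make the effect of this transformation concrete, the following plain-Python sketch encodes a single record in the way the steps above describe: retained columns pass through unchanged, and each categorical column expands into one 0/1 column per encoded value. It does not call VAL and is not VAL's implementation. The helper one_hot_encode_row and the sample record are hypothetical, and the "<value>_<column>" naming is an assumption based on the column names used in the GLM formula.

```python
# Illustrative sketch only: plain Python, no Vantage connection required.
# Assumption: encoded columns are named "<value>_<column>" and hold 1 or 0.

encoders = {
    "masters": ["yes", "no"],
    "stats": ["Advanced", "Novice"],
    "programming": ["Advanced", "Novice", "Beginner"],
}


def one_hot_encode_row(row, encoders, retain):
    """Keep the `retain` columns and expand each categorical column to 0/1 flags."""
    encoded = {name: row[name] for name in retain}
    for column, values in encoders.items():
        for value in values:
            encoded[f"{value}_{column}"] = 1 if row[column] == value else 0
    return encoded


# Hypothetical record, made up for illustration (not taken from the dataset).
sample = {
    "admitted": 1,
    "gpa": 3.5,
    "masters": "yes",
    "stats": "Novice",
    "programming": "Beginner",
}

encoded = one_hot_encode_row(sample, encoders, retain=["admitted", "gpa"])
print(encoded)
```

Because the model was trained on these column names, the test data needs the same encoder definitions as the training data; otherwise the predictor columns named in the formula would be missing.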
- Predict the admission status by applying the Generalized Linear Model (GLM) to the transformed test DataFrame, using the TDGLMPredict() function and the output of the GLM() function.
tdglmpredict_out = TDGLMPredict(object=glm_train.result, newdata=all_numeric_admissions_test.result, id_column="id")
- Inspect the results.
tdglmpredict_out.result