Using SageMaker Linear Learner with tdapiclient | API Integration - Using SageMaker Linear Learner with tdapiclient

This use case shows the steps to use SageMaker Linear Learner with tdapiclient.

You can download the aws-usecases.zip file in the attachment as a reference. The linearlearner folder in the zip file includes a Jupyter notebook file (ipynb) and a data file (csv) containing the dataset required to run this use case.

Import necessary libraries.

import getpass
import sagemaker
from tdapiclient import create_tdapi_context, TDApiClient
from teradataml import create_context, DataFrame, copy_to_sql,load_example_data
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder
from teradatasqlalchemy.types import *

Create the connection.

host = input("Host: ")
username = input("Username: ")
password = getpass.getpass("Password: ")

td_context = create_context(host=host, username=username, password=password)

Create TDAPI context and TDApiClient object.

s3_bucket = input("S3 Bucket(Please provide just the bucket name): ")
access_id = input("Access ID:")
access_key = getpass.getpass("Acess Key: ")
region = input("AWS Region: ")

os.environ["AWS_ACCESS_KEY_ID"] = access_id
os.environ["AWS_SECRET_ACCESS_KEY"] = access_key
os.environ["AWS_REGION"] = region

tdapi_context = create_tdapi_context("aws", bucket_name=s3_bucket)

td_apiclient = TDApiClient(tdapi_context)

Set bucket locations.

# Bucket location where your custom code will be saved in the tar.gz format.
custom_code_upload_location = "s3://{}/LinearLearner/code".format(s3_bucket)

# Bucket location where results of model training are saved.
model_artifacts_location = "s3://{}/LinearLearner/artifacts".format(s3_bucket)

Set up data to be used for this workflow.

Read the wine quality dataset.

df = pd.read_csv ("winequality-red.csv")

df.head()

The output:

	fixed acidity	volatile acidity	citric acid	residual sugar	chlorides	free sulfur dioxide	total sulfur dioxide	density	pH	sulphates	alcohol	quality
0	7.4	0.70	0.00	1.9	0.076	11.0	34.0	0.9978	3.51	0.56	9.4	5
1	7.8	0.88	0.00	2.6	0.098	25.0	67.0	0.9968	3.20	0.68	9.8	5
2	7.8	0.76	0.04	2.3	0.092	15.0	54.0	0.9970	3.26	0.65	9.8	5
3	11.2	0.28	0.56	1.9	0.075	17.0	60.0	0.9980	3.16	0.58	9.8	6
4	7.4	0.70	0.00	1.9	0.076	11.0	34.0	0.9978	3.51	0.56	9.4	5

Rename columns for creating teradataml DataFrame.

df.rename(columns={'fixed acidity':'fixed_acidity',
                         'citric acid':'citric_acid' ,
                         'residual sugar':'residual_sugar',
                         'volatile acidity': 'acidity_volatile',
                         'free sulfur dioxide': 'free_sulfur_dioxide',
                         'total sulfur dioxide':'total_sulfur_dioxide'}, 
                inplace=True)

Insert the dataframe in the table.

data_table="wine_quality_data"

column_types ={'fixed_acidity': FLOAT, 
               'acidity_volatile':FLOAT, 
               'citric_acid':FLOAT,
               'residual_sugar':FLOAT, 
               'chlorides':FLOAT, 
               'free_sulfur_dioxide':FLOAT,
               'total_sulfur_dioxide':FLOAT, 
               'density':FLOAT, 
               'pH':FLOAT, 
               'sulphates':FLOAT, 
               'alcohol':FLOAT,
               'quality':INTEGER}

copy_to_sql(df=df, table_name=data_table, if_exists="replace", types=column_types)

Create a teradataml DataFrame using the table.
```
data = DataFrame(table_name=data_table)
```

Data cleaning.

Categorize target variable that is 'quality' into 'good' or 'bad'.
If the quality value is greater than 6.5, then the quality is considered 'good' (1); if quality value is less or equal to 6.5, it is represented as 'bad' (0).
```
from teradataml.dataframe.sql_functions import case
```
```
quality = data['quality']
```
```
data_updated = data.assign(grade = case([(quality > 6.5, 1),
                                         (quality<6.5, 0)]))
```

Drop unnecessary columns.

data_updated=data_updated.drop("quality",axis=1)

data

The output:

fixed_acidity	acidity_volatile	citric_acid	residual_sugar	chlorides	free_sulfur_dioxide	total_sulfur_dioxide	density	pH	sulphates	alcohol	grade
7.8	0.76	0.04	2.3	0.092	15.0	54.0	0.997	3.26	0.65	9.8	0
7.4	0.7	0.0	1.9	0.076	11.0	34.0	0.9978	3.51	0.56	9.4	0
7.4	0.66	0.0	1.8	0.075	13.0	40.0	0.9978	3.51	0.56	9.4	0
7.9	0.6	0.06	1.6	0.069	15.0	59.0	0.9964	3.3	0.46	9.4	0
7.8	0.58	0.02	2.0	0.073	9.0	18.0	0.9968	3.36	0.57	9.5	1
7.5	0.5	0.36	6.1	0.071	17.0	102.0	0.9978	3.35	0.8	10.5	0
7.3	0.65	0.0	1.2	0.065	15.0	21.0	0.9946	3.39	0.47	10.0	1
11.2	0.28	0.56	1.9	0.075	17.0	60.0	0.998	3.16	0.58	9.8	0
7.8	0.88	0.0	2.6	0.098	25.0	67.0	0.9968	3.2	0.68	9.8	0
7.4	0.7	0.0	1.9	0.076	11.0	34.0	0.9978	3.51	0.56	9.4	0

Create two samples of input data: sample 1 has 80% of total rows and sample 2 has 20% of total rows.
```
wine_sample = data_updated.sample(frac=[0.8, 0.2])
```

Create train dataset from sample 1 by filtering on "sampleid" and drop "sampleid" column as it is not required for training model.

wine_train = wine_sample[wine_sample.sampleid == "1"].drop("sampleid", axis = 1)

wine_train

The output:

fixed_acidity	acidity_volatile	citric_acid	residual_sugar	chlorides	free_sulfur_dioxide	total_sulfur_dioxide	density	pH	sulphates	alcohol	grade
7.4	0.7	0.0	1.9	0.076	11.0	34.0	0.9978	3.51	0.56	9.4	0
7.9	0.6	0.06	1.6	0.069	15.0	59.0	0.9964	3.3	0.46	9.4	0
7.3	0.65	0.0	1.2	0.065	15.0	21.0	0.9946	3.39	0.47	10.0	1
7.8	0.58	0.02	2.0	0.073	9.0	18.0	0.9968	3.36	0.57	9.5	1
5.6	0.615	0.0	1.6	0.089	16.0	59.0	0.9943	3.58	0.52	9.9	0
7.8	0.61	0.29	1.6	0.114	9.0	29.0	0.9974	3.26	1.56	9.1	0
6.7	0.58	0.08	1.8	0.0969999999999999	15.0	65.0	0.9959	3.28	0.54	9.2	0
7.4	0.66	0.0	1.8	0.075	13.0	40.0	0.9978	3.51	0.56	9.4	0
11.2	0.28	0.56	1.9	0.075	17.0	60.0	0.998	3.16	0.58	9.8	0
7.8	0.76	0.04	2.3	0.092	15.0	54.0	0.997	3.26	0.65	9.8	0

Create test dataset from sample 2 by filtering on "sampleid" and drop "sampleid" column as it is not required for scoring.

wine_test = wine_sample[wine_sample.sampleid == "2"].drop("sampleid", axis = 1)

wine_test

The output:

fixed_acidity	acidity_volatile	citric_acid	residual_sugar	chlorides	free_sulfur_dioxide	total_sulfur_dioxide	density	pH	sulphates	alcohol	grade
7.4	0.66	0.0	1.8	0.075	13.0	40.0	0.9978	3.51	0.56	9.4	0
7.5	0.5	0.36	6.1	0.071	17.0	102.0	0.9978	3.35	0.8	10.5	0
7.5	0.5	0.36	6.1	0.071	17.0	102.0	0.9978	3.35	0.8	10.5	0
8.9	0.62	0.19	3.9	0.17	51.0	148.0	0.9986	3.17	0.93	9.2	0
6.3	0.39	0.16	1.4	0.08	11.0	23.0	0.9955	3.34	0.56	9.3	0
7.8	0.645	0.0	2.0	0.0819999999999999	8.0	16.0	0.9964	3.38	0.59	9.8	0
8.1	0.56	0.28	1.7	0.368	16.0	56.0	0.9968	3.11	1.28	9.3	0
7.9	0.6	0.06	1.6	0.069	15.0	59.0	0.9964	3.3	0.46	9.4	0
7.4	0.7	0.0	1.9	0.076	11.0	34.0	0.9978	3.51	0.56	9.4	0
11.2	0.28	0.56	1.9	0.075	17.0	60.0	0.998	3.16	0.58	9.8	0

Convert teradataml DataFrame to NumPy ndarray.

train_data=wine_train.drop('grade',axis=1).get_values()
label_train=wine_train.get_values()
label_train=label_train[:,11]

label_train=label_train.astype('float32') 
train_data=train_data.astype('float32')

Create Linear Learner SageMaker estimator instance through tdapiclient.

exec_role_arn = "arn:aws:iam::076782961461:role/service-role/AmazonSageMaker-ExecutionRole-20210112T215668"

linear_learner = td_apiclient.LinearLearner(
          role=exec_role_arn,
          instance_count=1,
          instance_type="ml.m5.large",
          predictor_type='binary_classifier',
          output_path=model_artifacts_location,
          code_location=custom_code_upload_location,
          epochs=5
)

Convert NumPy ndarray to RecordSet object which will be passed to fit method.

training_data_recordset = linear_learner.record_set(train=train_data,
                                                    labels=label_train)

Start training using RecordSet objects.

linear_learner.fit(training_data_recordset)

Create Serializer and Deserializer, so predictor can handle CSV input and output.

from sagemaker.serializers import CSVSerializer
from sagemaker.deserializers import CSVDeserializer
csv_ser = CSVSerializer()
csv_dser = CSVDeserializer()

predictor = linear_learner.deploy("aws-endpoint",
                                  sagemaker_kw_args={"instance_type": "ml.m5.large", "initial_instance_count": 1, "serializer": csv_ser, "deserializer": csv_dser})

Try prediction integration using the predictor object created in previous step.

Confirm that predictor is correctly configured for accepting csv input.
```
print(predictor.cloudObj.accept)
```
The output:
```
('text/csv',)
```
Prepare test dataset by dropping target variable 'grade'.
```
wine_test=wine_test.drop("grade",axis=1)
```

Try prediction with UDF and Client options.

Prediction with UDF option:

output = predictor.predict(wine_test, mode="UDF",content_type='csv')

output

The output:

fixed_acidity	acidity_volatile	citric_acid	residual_sugar	chlorides	free_sulfur_dioxide	total_sulfur_dioxide	density	pH	sulphates	alcohol	Output
7.8	0.61	0.29	1.6	0.114	9.0	29.0	0.9974	3.26	1.56	9.1	{"predictions": [{"score": 0.06059710681438446, "predicted_label": 0}]}
7.4	0.59	0.08	4.4	0.086	6.0	29.0	0.9974	3.38	0.5	9.0	{"predictions": [{"score": 0.061050355434417725, "predicted_label": 0}]}
7.9	0.43	0.21	1.6	0.106	10.0	37.0	0.9966	3.17	0.91	9.5	{"predictions": [{"score": 0.07426965236663818, "predicted_label": 0}]}
7.8	0.645	0.0	2.0	0.0819999999999999	8.0	16.0	0.9964	3.38	0.59	9.8	{"predictions": [{"score": 0.07694132626056671, "predicted_label": 0}]}
7.8	0.59	0.18	2.3	0.076	17.0	54.0	0.9975	3.43	0.59	10.0	{"predictions": [{"score": 0.056873418390750885, "predicted_label": 0}]}
8.8	0.4	0.4	2.2	0.079	19.0	52.0	0.998	3.44	0.64	9.2	{"predictions": [{"score": 0.05504496023058891, "predicted_label": 0}]}
7.8	0.645	0.0	5.5	0.086	5.0	18.0	0.9986	3.4	0.55	9.6	{"predictions": [{"score": 0.062305182218551636, "predicted_label": 0}]}
8.9	0.62	0.19	3.9	0.17	51.0	148.0	0.9986	3.17	0.93	9.2	{"predictions": [{"score": 0.021687734872102737, "predicted_label": 0}]}
6.7	0.58	0.08	1.8	0.0969999999999999	15.0	65.0	0.9959	3.28	0.54	9.2	{"predictions": [{"score": 0.05492998659610748, "predicted_label": 0}]}
11.2	0.28	0.56	1.9	0.075	17.0	60.0	0.998	3.16	0.58	9.8	{"predictions": [{"score": 0.07007729262113571, "predicted_label": 0}]}

Prediction with Client option:

item=wine_test.tail(1)
output = predictor.predict(item, mode="client",content_type='csv')

output

The output:

[['0']]

Clean up.

predictor.cloudObj.delete_model()
predictor.cloudObj.delete_endpoint()
remove_tdapi_context(tdapi_context)

Using SageMaker Linear Learner with tdapiclient | API Integration - Using SageMaker Linear Learner with tdapiclient - Teradata Vantage

Teradata Vantage™ - API Integration Guide for Cloud Machine Learning