Using SageMaker TensorFlow Estimator with tdapiclient

Teradata Vantage™ - API Integration Guide for Cloud Machine Learning

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Teradata Vantage
Release Number: 1.4
Published: September 2023
Last Update: 2023-09-28
Language: English (United States)

This use case shows the steps to use SageMaker TensorFlow Estimator with tdapiclient.

You can download the aws-usecases.zip file attached to this page as a reference. The tensorflow folder in the zip file includes the Jupyter notebook (.ipynb) and the Python script (.py) required to run the notebook.

  1. Import required libraries.
    import getpass
    import os
    from tdapiclient import create_tdapi_context, remove_tdapi_context, TDApiClient
    from teradataml import create_context, DataFrame, copy_to_sql
    import pandas as pd
    from teradatasqlalchemy.types import *
  2. Create the connection.
    host = input("Host: ")
    username = input("Username: ")
    password = getpass.getpass("Password: ")
    td_context = create_context(host=host, username=username, password=password)
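    To verify the connection, you can run a quick sanity check; db_list_tables is part of teradataml and lists the tables visible to the logged-in user:
    from teradataml import db_list_tables
    # List tables in the default database for the connected user
    print(db_list_tables())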
  3. Create TDAPI context and TDApiClient object.
    s3_bucket = input("S3 Bucket (provide just the bucket name, for example: test-bucket): ")
    access_id = input("Access ID: ")
    access_key = getpass.getpass("Access Key: ")
    region = input("AWS Region: ")
    os.environ["AWS_ACCESS_KEY_ID"] = access_id
    os.environ["AWS_SECRET_ACCESS_KEY"] = access_key
    os.environ["AWS_REGION"] = region
    tdapi_context = create_tdapi_context("aws", bucket_name=s3_bucket)
    td_apiclient = TDApiClient(tdapi_context)
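    Optionally, confirm that the credentials can reach the bucket before moving on. This check uses boto3, which is installed alongside the SageMaker SDK; head_bucket raises an error if the bucket is missing or inaccessible:
    import boto3
    # Succeeds silently when the bucket exists and the credentials can access it
    boto3.client("s3", region_name=region).head_bucket(Bucket=s3_bucket)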
  4. Set up data.
    1. Access California Housing dataset.
      from sklearn.model_selection import train_test_split
      from sklearn.datasets import fetch_california_housing
      california_housing = fetch_california_housing(as_frame=True)
      data = california_housing.frame
    2. Insert the DataFrame in the tables.
      data_table = "housing_data"
      
      column_types = {"MedInc": FLOAT, "HouseAge": FLOAT,
                      "AveRooms": FLOAT, "AveBedrms": FLOAT, "Population": FLOAT,
                      "AveOccup": FLOAT, "Latitude": FLOAT, "Longitude": FLOAT,
                      "MedHouseVal" : FLOAT}
      copy_to_sql(df=data, table_name=data_table, if_exists="replace", types=column_types)
      data = DataFrame(table_name=data_table)
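      As an optional check, the shape property reports the row and column counts of the teradataml DataFrame; the full California Housing dataset has 20640 rows and 9 columns:
      print(data.shape)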
    3. Create two samples of the input data: sample 1 contains 80% of the rows and sample 2 contains the remaining 20%. The sample function tags each row with a "sampleid" column identifying which sample it belongs to.
      housing_sample = data.sample(frac=[0.8, 0.2])
    4. Create the training dataset from sample 1 by filtering on "sampleid", then drop the "sampleid" column as it is not required for training the model.
      housing_train = housing_sample[housing_sample.sampleid == "1"].drop("sampleid", axis = 1)
      housing_train
      The output:
      MedInc	HouseAge	AveRooms	AveBedrms	  Population	AveOccup	Latitude	Longitude	MedHouseVal
      7.2574	52.0	8.288135593220339	1.073446327683616	496.0	2.8022598870056497	37.85	-122.24	3.521
      4.0368	52.0	4.761658031088083	1.1036269430051813	413.0	2.139896373056995	37.85	-122.25	2.697
      3.6591	52.0	4.9319066147859925	0.9513618677042801	1094.0	2.1284046692607004	37.84	-122.25	2.992
      3.12	52.0	4.797527047913447	1.061823802163833	1157.0	1.7882534775888717	37.84	-122.25	2.414
      3.2031	52.0	5.477611940298507	1.0796019900497513	910.0	2.263681592039801	37.85	-122.26	2.815
      3.2705	52.0	4.772479564032698	1.0245231607629428	1504.0	2.0490463215258856	37.85	-122.26	2.418
      3.6912	52.0	4.970588235294118	0.9901960784313726	1551.0	2.172268907563025	37.84	-122.25	2.611
      3.8462	52.0	6.281853281853282	1.0810810810810811	565.0	2.1814671814671813	37.85	-122.25	3.422
      8.3014	21.0	6.238137082601054	0.9718804920913884	2401.0	2.109841827768014	37.86	-122.22	3.585
      8.3252	41.0	6.984126984126984	1.0238095238095237	322.0	2.5555555555555554	37.88	-122.23	4.526
    5. Create the test dataset from sample 2 by filtering on "sampleid", then drop the "sampleid" column as it is not required for scoring.
      housing_test = housing_sample[housing_sample.sampleid == "2"].drop("sampleid", axis = 1)
      housing_test
      The output:
      MedInc	HouseAge	AveRooms	AveBedrms	Population	AveOccup	Latitude	Longitude	MedHouseVal
      2.6736	52.0	4.0	1.0977011494252873	345.0	1.9827586206896552	37.84	-122.26	1.913
      2.6	52.0	5.270142180094787	1.0355450236966826	1006.0	2.3838862559241707	37.84	-122.27	1.326
      1.7969	48.0	5.737313432835821	1.2208955223880598	1026.0	3.062686567164179	37.84	-122.27	1.104
      1.375	49.0	5.0303951367781155	1.1124620060790273	754.0	2.291793313069909	37.83	-122.27	1.049
      1.4861	49.0	4.6022727272727275	1.0681818181818181	570.0	2.159090909090909	37.83	-122.27	0.972
      1.0972	48.0	4.807486631016043	1.1550802139037433	987.0	2.6390374331550803	37.83	-122.27	1.045
      2.7303	51.0	4.972014925373134	1.0708955223880596	1258.0	2.3470149253731343	37.83	-122.27	1.097
      1.7135	42.0	4.478142076502732	1.0027322404371584	929.0	2.5382513661202184	37.85	-122.27	1.598
      3.12	52.0	4.797527047913447	1.061823802163833	1157.0	1.7882534775888717	37.84	-122.25	2.414
      4.0368	52.0	4.761658031088083	1.1036269430051813	413.0	2.139896373056995	37.85	-122.25	2.697
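      Optionally, confirm the split before training; the exact counts vary per run because the sampling is randomized:
      # Expect roughly an 80/20 split of the 20640 total rows
      print(housing_train.shape[0], housing_test.shape[0])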
  5. Create the TensorFlow SageMaker estimator through tdapiclient. Replace the execution role ARN with a SageMaker execution role from your own account.
    exec_role_arn = "arn:aws:iam::076782961461:role/service-role/AmazonSageMaker-ExecutionRole-20210112T215668"
    
    tensorflow_est = td_apiclient.TensorFlow(
        entry_point="script.py",
        role=exec_role_arn,
        instance_count=1,
        instance_type="ml.m5.xlarge",
        framework_version="2.2.0",
        py_version="py37",
    )
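    The estimator runs script.py (included in the tensorflow folder of aws-usecases.zip) on the training instance. Purely as an illustration of what a SageMaker script-mode training script for this regression task can look like, the sketch below reads the CSV channels, trains a small Keras network, and saves a SavedModel; the architecture, hyperparameters, and CSV handling are assumptions, not the shipped script:
    # Hypothetical stand-in for script.py; see aws-usecases.zip for the real file.
    import argparse
    import os

    import pandas as pd
    import tensorflow as tf

    if __name__ == "__main__":
        parser = argparse.ArgumentParser()
        # SageMaker exposes the model directory and channel paths via environment variables
        parser.add_argument("--model_dir", type=str,
                            default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"))
        parser.add_argument("--train", type=str,
                            default=os.environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train"))
        parser.add_argument("--validation", type=str,
                            default=os.environ.get("SM_CHANNEL_VALIDATION", "/opt/ml/input/data/validation"))
        parser.add_argument("--epochs", type=int, default=10)
        args, _ = parser.parse_known_args()

        def load_channel(path):
            # Assumes the channel holds CSV files with a header row; adjust to the
            # layout tdapiclient actually writes to your bucket
            fname = os.path.join(path, sorted(os.listdir(path))[0])
            df = pd.read_csv(fname)
            return df.drop("MedHouseVal", axis=1).values, df["MedHouseVal"].values

        x_train, y_train = load_channel(args.train)
        x_val, y_val = load_channel(args.validation)

        # Small fully connected regression network; the architecture is illustrative
        model = tf.keras.Sequential([
            tf.keras.layers.Dense(64, activation="relu", input_shape=(x_train.shape[1],)),
            tf.keras.layers.Dense(32, activation="relu"),
            tf.keras.layers.Dense(1),
        ])
        model.compile(optimizer="adam", loss="mse")
        model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=args.epochs)

        # Save as a numbered SavedModel directory so TensorFlow Serving can load it
        model.save(os.path.join(args.model_dir, "1"))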
  6. Start training using the teradataml DataFrame objects; tdapiclient exports them to the S3 bucket as the training and validation channels.
    tensorflow_est.fit({"train": housing_train, "validation": housing_test})
  7. Create a serializer and deserializer so the predictor can handle CSV input and output.
    from sagemaker.serializers import CSVSerializer
    from sagemaker.deserializers import CSVDeserializer
    csv_ser = CSVSerializer()
    csv_dser = CSVDeserializer()
    predictor = tensorflow_est.deploy(
        "aws-endpoint",
        sagemaker_kw_args={
            "instance_type": "ml.m5.large",
            "initial_instance_count": 1,
            "serializer": csv_ser,
            "deserializer": csv_dser,
        },
    )
  8. Try the prediction integration using a teradataml DataFrame and the predictor object created in the previous step.
    1. Confirm that the predictor is correctly configured to accept CSV input.
      print(predictor.cloudObj.accept)
      The output:
      ('text/csv',)
    2. Prepare the test dataset by dropping the target column.
      test = housing_test.drop("MedHouseVal", axis=1)
      item1 = test.head()
      item2 = test.tail()
    3. Try prediction with the UDF and Client options. In UDF mode, the prediction runs from within Vantage and the result comes back as a teradataml DataFrame with the model output in an additional column; in client mode, the rows are pulled to the client, which calls the endpoint directly and returns the raw response.
      Prediction with the UDF option:
      output = predictor.predict(item1, mode="UDF", content_type="csv")
      output
      The output:
      MedInc	HouseAge	AveRooms	AveBedrms	Population	AveOccup	Latitude	Longitude	Output
      0.536	16.0	2.111111111111111	2.111111111111111	166.0	18.444444444444443	37.67	-121.04	{ "predictions": [[-1.15928912] ] }
      0.6825	17.0	2.372549019607843	0.9901960784313726	198.0	0.9705882352941176	37.65	-121.0	{ "predictions": [[1.20473754] ] }
      0.7025	19.0	2.425196850393701	1.125984251968504	1799.0	2.8330708661417323	35.3	-120.67	{ "predictions": [[1.5769695] ] }
      0.7235	19.0	2.839622641509434	0.9551886792452831	844.0	1.990566037735849	34.1	-117.29	{ "predictions": [[1.33129275] ] }
      0.75	52.0	2.823529411764706	0.9117647058823529	191.0	5.617647058823529	37.8	-122.28	{ "predictions": [[0.700169265] ] }
      0.7714	16.0	2.698581560283688	1.0851063829787233	438.0	1.553191489361702	37.95	-121.29	{ "predictions": [[1.20444167] ] }
      0.7473	22.0	3.116650987770461	1.1618062088428975	2381.0	2.239887111947319	34.03	-118.29	{ "predictions": [[1.86170757] ] }
      0.536	4.0	14.0	3.3333333333333335	9.0	3.0	34.14	-116.76	{ "predictions": [[0.37429595] ] }
      0.536	26.0	7.846153846153846	1.3076923076923077	43.0	3.3076923076923075	38.7	-122.52	{ "predictions": [[0.582466722] ] }
      0.4999	16.0	21.63157894736842	6.0	26.0	1.368421052631579	39.42	-122.89	{ "predictions": [[0.406623662] ] }
      
      Prediction with the Client option:
      output = predictor.predict(item2, mode="client", content_type="csv")
      output
      The output:
      [['{'],
       ['    "predictions": [[7.58765793]',
        ' [6.9203043]',
        ' [7.83686399]',
        ' [7.64691257]',
        ' [7.59357548]',
        ' [7.44582224]',
        ' [7.47468281]',
        ' [7.68216419]',
        ' [7.43133402]',
        ' [7.75137663]'],
       ['    ]'],
       ['}']]
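      To get numeric values out of the client-mode response, you can reassemble the CSV-deserialized rows into JSON. A small parsing sketch, assuming the response shape shown above:
      import json
      # Rejoin the cells (split on commas) and the rows (split on lines),
      # then parse the reconstructed JSON document
      raw = "\n".join(",".join(cells) for cells in output)
      predictions = [row[0] for row in json.loads(raw)["predictions"]]
      print(predictions)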
  9. Clean up.
    predictor.cloudObj.delete_model()
    predictor.cloudObj.delete_endpoint()
    remove_tdapi_context(tdapi_context)