This use case shows the steps to use SageMaker TensorFlow Estimator with tdapiclient.
You can download the aws-usecases.zip file in the attachment as a reference. The tensorflow folder in the zip file includes the Jupyter notebook (ipynb) and the Python script (py) required to run this use case.
- Import required libraries.
import os
import getpass

import pandas as pd

from tdapiclient import create_tdapi_context, remove_tdapi_context, TDApiClient
from teradataml import create_context, DataFrame, copy_to_sql
from teradatasqlalchemy.types import *
- Create the connection.
host = input("Host: ")
username = input("Username: ")
password = getpass.getpass("Password: ")
td_context = create_context(host=host, username=username, password=password)
- Create TDAPI context and TDApiClient object.
s3_bucket = input("S3 Bucket (please provide just the bucket name, for example: test-bucket): ")
access_id = input("Access ID: ")
access_key = getpass.getpass("Access Key: ")
region = input("AWS Region: ")
os.environ["AWS_ACCESS_KEY_ID"] = access_id
os.environ["AWS_SECRET_ACCESS_KEY"] = access_key
os.environ["AWS_REGION"] = region
tdapi_context = create_tdapi_context("aws", bucket_name=s3_bucket)
td_apiclient = TDApiClient(tdapi_context)
- Set up data.
- Access California Housing dataset.
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_california_housing
california_housing = fetch_california_housing(as_frame=True)
data = california_housing.frame
- Insert the DataFrame into a table.
data_table = "housing_data"

column_types = {"MedInc": FLOAT, "HouseAge": FLOAT, "AveRooms": FLOAT,
                "AveBedrms": FLOAT, "Population": FLOAT, "AveOccup": FLOAT,
                "Latitude": FLOAT, "Longitude": FLOAT, "MedHouseVal": FLOAT}
copy_to_sql(df=data, table_name=data_table, if_exists="replace", types=column_types)
data = DataFrame(table_name=data_table)
- Create two samples of input data: sample 1 has 80% of total rows and sample 2 has 20% of total rows.
housing_sample = data.sample(frac=[0.8, 0.2])
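The sampling above runs in-database and adds a "sampleid" column identifying which fraction each row fell into. Conceptually, `sample(frac=[0.8, 0.2])` behaves like tagging each row with sampleid "1" (80% probability) or "2" (20% probability); a plain-Python sketch of that idea, using a hypothetical `rows` list rather than the actual table, is:

```python
import random

random.seed(42)

# Hypothetical stand-in for the table rows; in the use case this
# split happens in-database via data.sample(frac=[0.8, 0.2]).
rows = list(range(100))

# Tag each row with sampleid "1" (80% chance) or "2" (20% chance),
# mirroring the two requested sample fractions.
tagged = [(row, "1" if random.random() < 0.8 else "2") for row in rows]

train = [row for row, sid in tagged if sid == "1"]
test = [row for row, sid in tagged if sid == "2"]

print(len(train), len(test))
```

Every row lands in exactly one sample, so the two subsets partition the input; the actual in-database implementation may differ in how it assigns rows.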
- Create the train dataset from sample 1 by filtering on "sampleid", and drop the "sampleid" column as it is not required for training the model.

housing_train = housing_sample[housing_sample.sampleid == "1"].drop("sampleid", axis=1)
housing_train
The output:

MedInc  HouseAge  AveRooms  AveBedrms  Population  AveOccup  Latitude  Longitude  MedHouseVal
7.2574  52.0  8.288135593220339  1.073446327683616  496.0  2.8022598870056497  37.85  -122.24  3.521
4.0368  52.0  4.761658031088083  1.1036269430051813  413.0  2.139896373056995  37.85  -122.25  2.697
3.6591  52.0  4.9319066147859925  0.9513618677042801  1094.0  2.1284046692607004  37.84  -122.25  2.992
3.12  52.0  4.797527047913447  1.061823802163833  1157.0  1.7882534775888717  37.84  -122.25  2.414
3.2031  52.0  5.477611940298507  1.0796019900497513  910.0  2.263681592039801  37.85  -122.26  2.815
3.2705  52.0  4.772479564032698  1.0245231607629428  1504.0  2.0490463215258856  37.85  -122.26  2.418
3.6912  52.0  4.970588235294118  0.9901960784313726  1551.0  2.172268907563025  37.84  -122.25  2.611
3.8462  52.0  6.281853281853282  1.0810810810810811  565.0  2.1814671814671813  37.85  -122.25  3.422
8.3014  21.0  6.238137082601054  0.9718804920913884  2401.0  2.109841827768014  37.86  -122.22  3.585
8.3252  41.0  6.984126984126984  1.0238095238095237  322.0  2.5555555555555554  37.88  -122.23  4.526
- Create the test dataset from sample 2 by filtering on "sampleid", and drop the "sampleid" column as it is not required for scoring.

housing_test = housing_sample[housing_sample.sampleid == "2"].drop("sampleid", axis=1)
housing_test
The output:

MedInc  HouseAge  AveRooms  AveBedrms  Population  AveOccup  Latitude  Longitude  MedHouseVal
2.6736  52.0  4.0  1.0977011494252873  345.0  1.9827586206896552  37.84  -122.26  1.913
2.6  52.0  5.270142180094787  1.0355450236966826  1006.0  2.3838862559241707  37.84  -122.27  1.326
1.7969  48.0  5.737313432835821  1.2208955223880598  1026.0  3.062686567164179  37.84  -122.27  1.104
1.375  49.0  5.0303951367781155  1.1124620060790273  754.0  2.291793313069909  37.83  -122.27  1.049
1.4861  49.0  4.6022727272727275  1.0681818181818181  570.0  2.159090909090909  37.83  -122.27  0.972
1.0972  48.0  4.807486631016043  1.1550802139037433  987.0  2.6390374331550803  37.83  -122.27  1.045
2.7303  51.0  4.972014925373134  1.0708955223880596  1258.0  2.3470149253731343  37.83  -122.27  1.097
1.7135  42.0  4.478142076502732  1.0027322404371584  929.0  2.5382513661202184  37.85  -122.27  1.598
3.12  52.0  4.797527047913447  1.061823802163833  1157.0  1.7882534775888717  37.84  -122.25  2.414
4.0368  52.0  4.761658031088083  1.1036269430051813  413.0  2.139896373056995  37.85  -122.25  2.697
- Create TensorFlow SageMaker instance through tdapiclient.
exec_role_arn = "arn:aws:iam::076782961461:role/service-role/AmazonSageMaker-ExecutionRole-20210112T215668"

TensorFlow = td_apiclient.TensorFlow(
    entry_point="script.py",
    role=exec_role_arn,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    framework_version="2.2.0",
    py_version="py37",
)
- Start training TensorFlow using teradataml DataFrame objects.
TensorFlow.fit({'train': housing_train, 'validation': housing_test})
- Create a serializer and a deserializer so the predictor can handle CSV input and output.
from sagemaker.serializers import CSVSerializer
from sagemaker.deserializers import CSVDeserializer

csv_ser = CSVSerializer()
csv_dser = CSVDeserializer()
predictor = TensorFlow.deploy("aws-endpoint",
                              sagemaker_kw_args={"instance_type": "ml.m5.large",
                                                 "initial_instance_count": 1,
                                                 "serializer": csv_ser,
                                                 "deserializer": csv_dser})
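The serializer and deserializer control how rows travel to and from the endpoint: CSVSerializer turns each input row into one comma-separated line, and CSVDeserializer splits the response body back into lines and fields. A minimal stand-in for that round trip (plain Python, not the actual SageMaker classes) looks like this:

```python
def csv_serialize(rows):
    # One comma-separated line per row, as CSVSerializer
    # produces for tabular input.
    return "\n".join(",".join(str(v) for v in row) for row in rows)


def csv_deserialize(body):
    # Split the response body into lines, then each line on commas,
    # mirroring what CSVDeserializer does.
    return [line.split(",") for line in body.strip().split("\n")]


payload = csv_serialize([[7.2574, 52.0], [4.0368, 52.0]])
print(payload)  # 7.2574,52.0 <newline> 4.0368,52.0
print(csv_deserialize(payload))
```

This is only a sketch of the wire format; the real classes also set the matching `text/csv` content type on the request.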
- Try the prediction integration using a teradataml DataFrame and the predictor object created in the previous step.
- Confirm that the predictor is correctly configured to accept CSV input.
print(predictor.cloudObj.accept)
The output:('text/csv',)
- Prepare the test dataset.
test = housing_test.drop("MedHouseVal", axis=1)

item1 = test.head()
item2 = test.tail()
- Try prediction with the UDF and Client options.

Prediction with the UDF option:

output = predictor.predict(item1, mode="UDF", content_type='csv')
output
The output:

MedInc  HouseAge  AveRooms  AveBedrms  Population  AveOccup  Latitude  Longitude  Output
0.536  16.0  2.111111111111111  2.111111111111111  166.0  18.444444444444443  37.67  -121.04  { "predictions": [[-1.15928912] ] }
0.6825  17.0  2.372549019607843  0.9901960784313726  198.0  0.9705882352941176  37.65  -121.0  { "predictions": [[1.20473754] ] }
0.7025  19.0  2.425196850393701  1.125984251968504  1799.0  2.8330708661417323  35.3  -120.67  { "predictions": [[1.5769695] ] }
0.7235  19.0  2.839622641509434  0.9551886792452831  844.0  1.990566037735849  34.1  -117.29  { "predictions": [[1.33129275] ] }
0.75  52.0  2.823529411764706  0.9117647058823529  191.0  5.617647058823529  37.8  -122.28  { "predictions": [[0.700169265] ] }
0.7714  16.0  2.698581560283688  1.0851063829787233  438.0  1.553191489361702  37.95  -121.29  { "predictions": [[1.20444167] ] }
0.7473  22.0  3.116650987770461  1.1618062088428975  2381.0  2.239887111947319  34.03  -118.29  { "predictions": [[1.86170757] ] }
0.536  4.0  14.0  3.3333333333333335  9.0  3.0  34.14  -116.76  { "predictions": [[0.37429595] ] }
0.536  26.0  7.846153846153846  1.3076923076923077  43.0  3.3076923076923075  38.7  -122.52  { "predictions": [[0.582466722] ] }
0.4999  16.0  21.63157894736842  6.0  26.0  1.368421052631579  39.42  -122.89  { "predictions": [[0.406623662] ] }
Prediction with the Client option:

output = predictor.predict(item2, mode="client", content_type='csv')
output
The output:[['{'], [' "predictions": [[7.58765793]', ' [6.9203043]', ' [7.83686399]', ' [7.64691257]', ' [7.59357548]', ' [7.44582224]', ' [7.47468281]', ' [7.68216419]', ' [7.43133402]', ' [7.75137663]'], [' ]'], ['}']]
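In client mode the endpoint returns a JSON body that CSVDeserializer has split on newlines and commas, which produces the nested lists shown above. Assuming the output has that shape, the fragments can be stitched back together and parsed with the standard json module; a sketch of that post-processing (using a shortened stand-in for the real output) is:

```python
import json

# Shortened stand-in for the deserialized client-mode output shown above.
output = [['{'],
          [' "predictions": [[7.58765793]', ' [6.9203043]'],
          [' ]'],
          ['}']]

# Re-join the fragments that CSVDeserializer split apart,
# restoring the original JSON body.
body = "\n".join(",".join(parts) for parts in output)

# Each prediction is a one-element list; flatten to plain floats.
predictions = [p[0] for p in json.loads(body)["predictions"]]

print(predictions)  # [7.58765793, 6.9203043]
```
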
- Clean up.
predictor.cloudObj.delete_model()
predictor.cloudObj.delete_endpoint()

remove_tdapi_context(tdapi_context)