Using SageMaker TensorFlow Estimator with tdapiclient

Teradata Vantage™ - API Integration Guide for Cloud Machine Learning

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Teradata Vantage
Release Number: 1.4
Published: September 2023
Last Update: 2023-09-28
Language: English (United States)

This use case shows the steps to use SageMaker TensorFlow Estimator with tdapiclient.

You can download the aws-usecases.zip file attached to this page as a reference. The tensorflow folder in the zip file includes the Jupyter notebook (.ipynb) and the Python script (.py) required to run the notebook.

  1. Import required libraries.
    import getpass
    import os
    from tdapiclient import create_tdapi_context, remove_tdapi_context, TDApiClient
    from teradataml import create_context, DataFrame, copy_to_sql
    import pandas as pd
    from teradatasqlalchemy.types import *
  2. Create the connection.
    host = input("Host: ")
    username = input("Username: ")
    password = getpass.getpass("Password: ")
    td_context = create_context(host=host, username=username, password=password)
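    To verify the connection, you can run a quick sanity check; db_list_tables is part of teradataml and lists the tables visible to the logged-in user:
    from teradataml import db_list_tables
    # List tables in the default database for the connected user
    print(db_list_tables())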
  3. Create TDAPI context and TDApiClient object.
    s3_bucket = input("S3 Bucket (provide just the bucket name, for example: test-bucket): ")
    access_id = input("Access ID: ")
    access_key = getpass.getpass("Access Key: ")
    region = input("AWS Region: ")
    os.environ["AWS_ACCESS_KEY_ID"] = access_id
    os.environ["AWS_SECRET_ACCESS_KEY"] = access_key
    os.environ["AWS_REGION"] = region
    tdapi_context = create_tdapi_context("aws", bucket_name=s3_bucket)
    td_apiclient = TDApiClient(tdapi_context)
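    Optionally, confirm that the credentials can reach the bucket before moving on. This check uses boto3, which is installed alongside the SageMaker SDK; head_bucket raises an error if the bucket is missing or inaccessible:
    import boto3
    # Succeeds silently when the bucket exists and the credentials can access it
    boto3.client("s3", region_name=region).head_bucket(Bucket=s3_bucket)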
  4. Set up data.
    1. Access California Housing dataset.
      from sklearn.model_selection import train_test_split
      from sklearn.datasets import fetch_california_housing
      california_housing = fetch_california_housing(as_frame=True)
      data = california_housing.frame
    2. Insert the DataFrame in the tables.
      data_table = "housing_data"
      
      column_types = {"MedInc": FLOAT, "HouseAge": FLOAT,
                      "AveRooms": FLOAT, "AveBedrms": FLOAT, "Population": FLOAT,
                      "AveOccup": FLOAT, "Latitude": FLOAT, "Longitude": FLOAT,
                      "MedHouseVal" : FLOAT}
      copy_to_sql(df=data, table_name=data_table, if_exists="replace", types=column_types)
      data = DataFrame(table_name=data_table)
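      As an optional check, the shape property reports the row and column counts of the teradataml DataFrame; the full California Housing dataset has 20640 rows and 9 columns:
      print(data.shape)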
    3. Create two samples of the input data: sample 1 contains 80% of the rows and sample 2 contains the remaining 20%. The sample function tags each row with a "sampleid" column identifying which sample it belongs to.
      housing_sample = data.sample(frac=[0.8, 0.2])
    4. Create the training dataset from sample 1 by filtering on "sampleid", then drop the "sampleid" column as it is not required for training the model.
      housing_train = housing_sample[housing_sample.sampleid == "1"].drop("sampleid", axis = 1)
      housing_train
      The output:
      MedInc	HouseAge	AveRooms	AveBedrms	  Population	AveOccup	Latitude	Longitude	MedHouseVal
      7.2574	52.0	8.288135593220339	1.073446327683616	496.0	2.8022598870056497	37.85	-122.24	3.521
      4.0368	52.0	4.761658031088083	1.1036269430051813	413.0	2.139896373056995	37.85	-122.25	2.697
      3.6591	52.0	4.9319066147859925	0.9513618677042801	1094.0	2.1284046692607004	37.84	-122.25	2.992
      3.12	52.0	4.797527047913447	1.061823802163833	1157.0	1.7882534775888717	37.84	-122.25	2.414
      3.2031	52.0	5.477611940298507	1.0796019900497513	910.0	2.263681592039801	37.85	-122.26	2.815
      3.2705	52.0	4.772479564032698	1.0245231607629428	1504.0	2.0490463215258856	37.85	-122.26	2.418
      3.6912	52.0	4.970588235294118	0.9901960784313726	1551.0	2.172268907563025	37.84	-122.25	2.611
      3.8462	52.0	6.281853281853282	1.0810810810810811	565.0	2.1814671814671813	37.85	-122.25	3.422
      8.3014	21.0	6.238137082601054	0.9718804920913884	2401.0	2.109841827768014	37.86	-122.22	3.585
      8.3252	41.0	6.984126984126984	1.0238095238095237	322.0	2.5555555555555554	37.88	-122.23	4.526
    5. Create the test dataset from sample 2 by filtering on "sampleid", then drop the "sampleid" column as it is not required for scoring.
      housing_test = housing_sample[housing_sample.sampleid == "2"].drop("sampleid", axis = 1)
      housing_test
      The output:
      MedInc	HouseAge	AveRooms	AveBedrms	Population	AveOccup	Latitude	Longitude	MedHouseVal
      2.6736	52.0	4.0	1.0977011494252873	345.0	1.9827586206896552	37.84	-122.26	1.913
      2.6	52.0	5.270142180094787	1.0355450236966826	1006.0	2.3838862559241707	37.84	-122.27	1.326
      1.7969	48.0	5.737313432835821	1.2208955223880598	1026.0	3.062686567164179	37.84	-122.27	1.104
      1.375	49.0	5.0303951367781155	1.1124620060790273	754.0	2.291793313069909	37.83	-122.27	1.049
      1.4861	49.0	4.6022727272727275	1.0681818181818181	570.0	2.159090909090909	37.83	-122.27	0.972
      1.0972	48.0	4.807486631016043	1.1550802139037433	987.0	2.6390374331550803	37.83	-122.27	1.045
      2.7303	51.0	4.972014925373134	1.0708955223880596	1258.0	2.3470149253731343	37.83	-122.27	1.097
      1.7135	42.0	4.478142076502732	1.0027322404371584	929.0	2.5382513661202184	37.85	-122.27	1.598
      3.12	52.0	4.797527047913447	1.061823802163833	1157.0	1.7882534775888717	37.84	-122.25	2.414
      4.0368	52.0	4.761658031088083	1.1036269430051813	413.0	2.139896373056995	37.85	-122.25	2.697
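      Optionally, confirm the split before training; the exact counts vary per run because the sampling is randomized:
      # Expect roughly an 80/20 split of the 20640 total rows
      print(housing_train.shape[0], housing_test.shape[0])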
  5. Create the TensorFlow SageMaker estimator through tdapiclient. Replace the execution role ARN with a SageMaker execution role from your own account.
    exec_role_arn = "arn:aws:iam::076782961461:role/service-role/AmazonSageMaker-ExecutionRole-20210112T215668"
    
    tensorflow_est = td_apiclient.TensorFlow(
        entry_point="script.py",
        role=exec_role_arn,
        instance_count=1,
        instance_type="ml.m5.xlarge",
        framework_version="2.2.0",
        py_version="py37",
    )
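    The estimator runs script.py (included in the tensorflow folder of aws-usecases.zip) on the training instance. Purely as an illustration of what a SageMaker script-mode training script for this regression task can look like, the sketch below reads the CSV channels, trains a small Keras network, and saves a SavedModel; the architecture, hyperparameters, and CSV handling are assumptions, not the shipped script:
    # Hypothetical stand-in for script.py; see aws-usecases.zip for the real file.
    import argparse
    import os

    import pandas as pd
    import tensorflow as tf

    if __name__ == "__main__":
        parser = argparse.ArgumentParser()
        # SageMaker exposes the model directory and channel paths via environment variables
        parser.add_argument("--model_dir", type=str,
                            default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"))
        parser.add_argument("--train", type=str,
                            default=os.environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train"))
        parser.add_argument("--validation", type=str,
                            default=os.environ.get("SM_CHANNEL_VALIDATION", "/opt/ml/input/data/validation"))
        parser.add_argument("--epochs", type=int, default=10)
        args, _ = parser.parse_known_args()

        def load_channel(path):
            # Assumes the channel holds CSV files with a header row; adjust to the
            # layout tdapiclient actually writes to your bucket
            fname = os.path.join(path, sorted(os.listdir(path))[0])
            df = pd.read_csv(fname)
            return df.drop("MedHouseVal", axis=1).values, df["MedHouseVal"].values

        x_train, y_train = load_channel(args.train)
        x_val, y_val = load_channel(args.validation)

        # Small fully connected regression network; the architecture is illustrative
        model = tf.keras.Sequential([
            tf.keras.layers.Dense(64, activation="relu", input_shape=(x_train.shape[1],)),
            tf.keras.layers.Dense(32, activation="relu"),
            tf.keras.layers.Dense(1),
        ])
        model.compile(optimizer="adam", loss="mse")
        model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=args.epochs)

        # Save as a numbered SavedModel directory so TensorFlow Serving can load it
        model.save(os.path.join(args.model_dir, "1"))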
  6. Start training using the teradataml DataFrame objects; tdapiclient exports them to the S3 bucket as the training and validation channels.
    tensorflow_est.fit({"train": housing_train, "validation": housing_test})
  7. Create a serializer and deserializer so the predictor can handle CSV input and output.
    from sagemaker.serializers import CSVSerializer
    from sagemaker.deserializers import CSVDeserializer
    csv_ser = CSVSerializer()
    csv_dser = CSVDeserializer()
    predictor = tensorflow_est.deploy(
        "aws-endpoint",
        sagemaker_kw_args={
            "instance_type": "ml.m5.large",
            "initial_instance_count": 1,
            "serializer": csv_ser,
            "deserializer": csv_dser,
        },
    )
  8. Try the prediction integration using a teradataml DataFrame and the predictor object created in the previous step.
    1. Confirm that the predictor is correctly configured to accept CSV input.
      print(predictor.cloudObj.accept)
      The output:
      ('text/csv',)
    2. Prepare the test dataset by dropping the target column.
      test = housing_test.drop("MedHouseVal", axis=1)
      item1 = test.head()
      item2 = test.tail()
    3. Try prediction with the UDF and Client options. In UDF mode, the prediction runs from within Vantage and the result comes back as a teradataml DataFrame with the model output in an additional column; in client mode, the rows are pulled to the client, which calls the endpoint directly and returns the raw response.
      Prediction with the UDF option:
      output = predictor.predict(item1, mode="UDF", content_type="csv")
      output
      The output:
      MedInc	HouseAge	AveRooms	AveBedrms	Population	AveOccup	Latitude	Longitude	Output
      0.536	16.0	2.111111111111111	2.111111111111111	166.0	18.444444444444443	37.67	-121.04	{ "predictions": [[-1.15928912] ] }
      0.6825	17.0	2.372549019607843	0.9901960784313726	198.0	0.9705882352941176	37.65	-121.0	{ "predictions": [[1.20473754] ] }
      0.7025	19.0	2.425196850393701	1.125984251968504	1799.0	2.8330708661417323	35.3	-120.67	{ "predictions": [[1.5769695] ] }
      0.7235	19.0	2.839622641509434	0.9551886792452831	844.0	1.990566037735849	34.1	-117.29	{ "predictions": [[1.33129275] ] }
      0.75	52.0	2.823529411764706	0.9117647058823529	191.0	5.617647058823529	37.8	-122.28	{ "predictions": [[0.700169265] ] }
      0.7714	16.0	2.698581560283688	1.0851063829787233	438.0	1.553191489361702	37.95	-121.29	{ "predictions": [[1.20444167] ] }
      0.7473	22.0	3.116650987770461	1.1618062088428975	2381.0	2.239887111947319	34.03	-118.29	{ "predictions": [[1.86170757] ] }
      0.536	4.0	14.0	3.3333333333333335	9.0	3.0	34.14	-116.76	{ "predictions": [[0.37429595] ] }
      0.536	26.0	7.846153846153846	1.3076923076923077	43.0	3.3076923076923075	38.7	-122.52	{ "predictions": [[0.582466722] ] }
      0.4999	16.0	21.63157894736842	6.0	26.0	1.368421052631579	39.42	-122.89	{ "predictions": [[0.406623662] ] }
      
      Prediction with the Client option:
      output = predictor.predict(item2, mode="client", content_type="csv")
      output
      The output:
      [['{'],
       ['    "predictions": [[7.58765793]',
        ' [6.9203043]',
        ' [7.83686399]',
        ' [7.64691257]',
        ' [7.59357548]',
        ' [7.44582224]',
        ' [7.47468281]',
        ' [7.68216419]',
        ' [7.43133402]',
        ' [7.75137663]'],
       ['    ]'],
       ['}']]
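      To get numeric values out of the client-mode response, you can reassemble the CSV-deserialized rows into JSON. A small parsing sketch, assuming the response shape shown above:
      import json
      # Rejoin the cells (split on commas) and the rows (split on lines),
      # then parse the reconstructed JSON document
      raw = "\n".join(",".join(cells) for cells in output)
      predictions = [row[0] for row in json.loads(raw)["predictions"]]
      print(predictions)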
  9. Clean up.
    predictor.cloudObj.delete_model()
    predictor.cloudObj.delete_endpoint()
    remove_tdapi_context(tdapi_context)