Using SageMaker Chainer Estimator with tdapiclient | API Integration - Teradata Vantage

Teradata Vantage™ - API Integration Guide for Cloud Machine Learning

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Teradata Vantage
Release Number: 1.4
Published: September 2023
Language: English (United States)
Last Update: 2023-09-28

This use case shows the steps to use SageMaker Chainer Estimator with tdapiclient.

You can download the attached aws-usecases.zip file as a reference. The chainer folder in the zip file contains the Jupyter notebook (.ipynb) and the Python script (.py) required to run this example.

  1. Import necessary packages.
    import os
    import getpass
    import pandas as pd
    from tdapiclient import create_tdapi_context, remove_tdapi_context, TDApiClient
    from teradataml import create_context, DataFrame, load_example_data
    from teradatasqlalchemy.types import *
  2. Create the connection.
    host = input("Host: ")
    username = input("Username: ")
    password = getpass.getpass("Password: ")
    td_context = create_context(host=host, username=username, password=password)
  3. Create TDAPI context and TDApiClient object.
    s3_bucket = input("S3 Bucket (provide just the bucket name, for example: test-bucket): ")
    access_id = input("Access ID: ")
    access_key = getpass.getpass("Access Key: ")
    region = input("AWS Region: ")
    os.environ["AWS_ACCESS_KEY_ID"] = access_id
    os.environ["AWS_SECRET_ACCESS_KEY"] = access_key
    os.environ["AWS_REGION"] = region
    tdapi_context = create_tdapi_context("aws", bucket_name=s3_bucket)
    td_apiclient = TDApiClient(tdapi_context)
  4. Set up data.
    1. Set feature and target.
      feature = ["sepal_length","sepal_width","petal_length","petal_width"]
      target = "species"
    2. Load the data to run the example.
      load_example_data("byom", "iris_input")
    3. Create teradataml DataFrame.
      iris = DataFrame.from_table("iris_input")
    4. Check the teradataml DataFrame.
      iris.info()
      <class 'teradataml.dataframe.dataframe.DataFrame'>
      Data columns (total 6 columns):
      id                int
      sepal_length    float
      sepal_width     float
      petal_length    float
      petal_width     float
      species           int
      dtypes: float(4), int(2)
    5. Create two samples of input data: sample 1 has 80% of total rows and sample 2 has 20% of total rows.
      iris_sample = iris.sample(frac=[0.8, 0.2])
      iris_sample
      The output:
      id	sepal_length	sepal_width	petal_length	petal_width	species	sampleid
      141	6.7	3.1	5.6	2.4	3	1
      99	5.1	2.5	3.0	1.1	2	1
      17	5.4	3.9	1.3	0.4	1	1
      61	5.0	2.0	3.5	1.0	2	1
      19	5.7	3.8	1.7	0.3	1	1
      80	5.7	2.6	3.5	1.0	2	2
      59	6.6	2.9	4.6	1.3	2	2
      38	4.9	3.6	1.4	0.1	1	1
      40	5.1	3.4	1.5	0.2	1	2
      120	6.0	2.2	5.0	1.5	3	1
      
    6. Create the training dataset from sample 1 by filtering on "sampleid", then drop the "id" and "sampleid" columns as they are not required for training the model.
      train_df = iris_sample[iris_sample.sampleid == "1"].drop(["id","sampleid"], axis = 1)
      train_df
      The output:
      sepal_length	sepal_width	petal_length	petal_width	species
      6.7	3.1	5.6	2.4	3
      5.1	2.5	3.0	1.1	2
      6.0	3.0	4.8	1.8	3
      5.0	2.0	3.5	1.0	2
      5.7	3.8	1.7	0.3	1
      5.7	2.6	3.5	1.0	2
      4.9	3.6	1.4	0.1	1
      6.7	3.0	5.0	1.7	2
      5.1	3.4	1.5	0.2	1
      6.0	2.2	5.0	1.5	3
      
    7. Create the test dataset from sample 2 by filtering on "sampleid", then drop the "id" and "sampleid" columns as they are not required for scoring.
      test_df = iris_sample[iris_sample.sampleid == "2"].drop(["id","sampleid"], axis = 1)
      test_df
      The output:
      sepal_length	sepal_width	petal_length	petal_width	species
      5.6	3.0	4.1	1.3	2
      5.8	2.7	5.1	1.9	3
      5.5	2.4	3.8	1.1	2
      6.6	2.9	4.6	1.3	2
      5.7	2.5	5.0	2.0	3
      7.2	3.6	6.1	2.5	3
      7.3	2.9	6.3	1.8	3
      5.0	3.0	1.6	0.2	1
      6.4	3.2	5.3	2.3	3
      5.1	3.8	1.6	0.2	1
      
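      teradataml's sample(frac=[0.8, 0.2]) assigns each row to one of two samples and records the assignment in the "sampleid" column. The idea can be illustrated locally with a plain-Python sketch (an analogy only, not the teradataml implementation; the 150 row ids below stand in for the iris_input table):

      ```python
      import random

      # Toy stand-in for the iris rows: 150 row ids, like iris_input.
      rows = list(range(1, 151))

      random.seed(42)
      # Pick ~80% of the rows for sample 1; the rest fall into sample 2.
      sample1_ids = set(random.sample(rows, k=round(0.8 * len(rows))))

      # Mimic the "sampleid" column: map each row to its sample number.
      sampleid = {r: (1 if r in sample1_ids else 2) for r in rows}

      train_ids = [r for r in rows if sampleid[r] == 1]
      test_ids = [r for r in rows if sampleid[r] == 2]
      print(len(train_ids), len(test_ids))  # 120 30
      ```

      As in the teradataml version, every row lands in exactly one sample, so the later filter on sampleid yields disjoint train and test sets.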
  5. Set bucket locations.
    # Bucket location where your custom code will be saved in the tar.gz format.
    custom_code_upload_location = "s3://{}/Chainer/code".format(s3_bucket)
    # Bucket location where results of model training are saved.
    model_artifacts_location = "s3://{}/Chainer/artifacts".format(s3_bucket)
  6. Create a Chainer SageMaker estimator instance through tdapiclient.
    exec_role_arn = "arn:aws:iam::076782961461:role/service-role/AmazonSageMaker-ExecutionRole-20210112T215668"
    FRAMEWORK_VERSION = "4.1.0"
    # Create an estimator object based on the SageMaker Chainer class
    Chainer_estimator = td_apiclient.Chainer(
        entry_point="chainer-script.py",
        role=exec_role_arn,
        instance_count=1,
        instance_type="ml.m5.large",
        output_path=model_artifacts_location,
        code_location=custom_code_upload_location,
        py_version='py3',
        framework_version=FRAMEWORK_VERSION,
        metric_definitions=[{"Name": "median-AE", "Regex": "AE-at-50th-percentile: ([0-9.]+).*$"}],
        hyperparameters={
            "epochs": 50,
            "batch_size": 12,
            "features": feature,
            "target": target,
            "units" : 4,
        },
    )
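    The entry_point script (chainer-script.py, included in the chainer folder of the zip file) receives the hyperparameters above as command-line arguments and reads the data and model locations from SageMaker environment variables. A minimal sketch of that argument-handling skeleton, following SageMaker's script-mode conventions (the actual training code lives in the downloaded script):

    ```python
    import argparse
    import json
    import os

    def parse_args(argv=None):
        parser = argparse.ArgumentParser()
        # Hyperparameters passed by the estimator (see the dict in step 6).
        # List-valued hyperparameters arrive as JSON strings.
        parser.add_argument("--epochs", type=int, default=50)
        parser.add_argument("--batch_size", type=int, default=12)
        parser.add_argument("--units", type=int, default=4)
        parser.add_argument("--features", type=json.loads, default="[]")
        parser.add_argument("--target", type=str, default="species")
        # Locations injected by SageMaker via environment variables.
        parser.add_argument("--model-dir", type=str,
                            default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"))
        parser.add_argument("--train", type=str,
                            default=os.environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train"))
        parser.add_argument("--test", type=str,
                            default=os.environ.get("SM_CHANNEL_TEST", "/opt/ml/input/data/test"))
        return parser.parse_args(argv)

    # In the container, SageMaker invokes the script roughly as:
    #   python chainer-script.py --epochs 50 --batch_size 12 ...
    args = parse_args(["--epochs", "50", "--batch_size", "12", "--units", "4",
                       "--features", '["sepal_length","sepal_width","petal_length","petal_width"]',
                       "--target", "species"])
    print(args.epochs, args.units, args.features[0])  # 50 4 sepal_length
    ```

    The script then loads the CSV channels from args.train and args.test, builds and trains the Chainer network, and saves the trained model under args.model_dir so SageMaker can package it as a model artifact.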
  7. Start training using the train and test DataFrames.
    1. Show the train DataFrame.
      train_df
      The output:
      sepal_length	sepal_width	petal_length	petal_width	species
      5.7	2.6	3.5	1.0	2
      4.9	3.6	1.4	0.1	1
      6.7	3.0	5.0	1.7	2
      5.6	2.8	4.9	2.0	3
      6.7	3.1	5.6	2.4	3
      6.0	2.2	5.0	1.5	3
      5.1	2.5	3.0	1.1	2
      5.4	3.9	1.3	0.4	1
      6.3	3.3	6.0	2.5	3
      6.6	2.9	4.6	1.3	2
      
    2. Show the test DataFrame.
      test_df
      The output:
      sepal_length	sepal_width	petal_length	petal_width	species
      7.0	3.2	4.7	1.4	2
      6.7	3.1	4.7	1.5	2
      5.1	3.8	1.9	0.4	1
      5.7	3.8	1.7	0.3	1
      6.1	2.8	4.7	1.2	2
      5.7	2.5	5.0	2.0	3
      6.4	2.7	5.3	1.9	3
      5.9	3.0	5.1	1.8	3
      6.6	3.0	4.4	1.4	2
      5.6	2.5	3.9	1.1	2
      
    3. Start training using DataFrame objects.
      Chainer_estimator.fit({"train": train_df, "test": test_df}, content_type="csv", wait=True)
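      During the training job, SageMaker scrapes the median-AE metric out of the job's log output using the metric_definitions regex from step 6. Assuming the entry-point script prints a line such as "AE-at-50th-percentile: 3.25" (the value here is made up for illustration), the extraction works like this:

      ```python
      import re

      # Same regex as in the estimator's metric_definitions (step 6).
      pattern = re.compile(r"AE-at-50th-percentile: ([0-9.]+).*$")

      log_line = "AE-at-50th-percentile: 3.25"  # hypothetical log line
      match = pattern.search(log_line)
      median_ae = float(match.group(1))
      print(median_ae)  # 3.25
      ```

      Any line the script prints in this format therefore shows up as the "median-AE" metric in the SageMaker console.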
  8. Create a serializer and a deserializer so the predictor can handle CSV input and output.
    from sagemaker.serializers import CSVSerializer
    from sagemaker.deserializers import CSVDeserializer
    csv_ser = CSVSerializer()
    csv_dser = CSVDeserializer()
    predictor = Chainer_estimator.deploy("aws-endpoint",
                                         sagemaker_kw_args={"instance_type": "ml.m5.large", "initial_instance_count": 1, "serializer": csv_ser, "deserializer": csv_dser})
  9. Score the model using the teradataml DataFrame and the predictor object created in the previous step.
    1. Try the predictor with simple CSV data to see if it works as expected.
      item = '5.0,3.0,1.6,0.2'
      print(predictor.cloudObj.accept)
      print(predictor.cloudObj.predict(item))
      The output:
      ('text/csv',)
      [['1']]
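      CSVSerializer turns the input into a text/csv payload and CSVDeserializer parses the endpoint's CSV response back into rows of strings, which is why the prediction above comes back as [['1']]. A stdlib sketch of that round trip (this mimics, rather than calls, the sagemaker classes; the payloads are the ones shown above):

      ```python
      import csv
      import io

      # Roughly what CSVSerializer does: join the features into one CSV line.
      features = [5.0, 3.0, 1.6, 0.2]
      payload = ",".join(str(v) for v in features)
      print(payload)  # 5.0,3.0,1.6,0.2

      # Roughly what CSVDeserializer does: parse the CSV response body
      # into a list of rows of strings.
      response_body = "1"  # endpoint's CSV response for this item
      rows = list(csv.reader(io.StringIO(response_body)))
      print(rows)  # [['1']]
      ```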
    2. Try prediction with UDF and Client options.
      Input:
      input = test_df.sample(n=5).select(feature)
      input
      The output:
      sepal_length	sepal_width	petal_length	petal_width
      6.3	2.5	5.0	1.9
      5.3	3.7	1.5	0.2
      5.1	2.5	3.0	1.1
      5.5	3.5	1.3	0.2
      4.6	3.4	1.4	0.3
      
      Prediction with UDF option:
      output = predictor.predict(input, mode="UDF", content_type='csv')
      output
      The output:
      sepal_length	sepal_width	petal_length	petal_width	Output
      4.6	3.1	1.5	0.2	1
      5.7	4.4	1.5	0.4	1
      6.3	2.9	5.6	1.8	3
      5.0	2.3	3.3	1.0	2
      5.0	2.0	3.5	1.0	2
      
      Prediction with Client option:
      output = predictor.predict(input, mode="Client", content_type='csv')
      output
      The output:
      [['1', '1', '3', '3', '1']]
  10. Clean up.
    predictor.cloudObj.delete_model()
    predictor.cloudObj.delete_endpoint()
    remove_tdapi_context(tdapi_context)