This use case shows the steps to use the SageMaker Chainer Estimator with tdapiclient.
You can download the aws-usecases.zip file in the attachment as a reference. The chainer folder in the zip file includes the Jupyter notebook file (ipynb) and the Python file (py) required to run the notebook.
- Import necessary packages.
import os
import getpass

import pandas as pd

from tdapiclient import create_tdapi_context, remove_tdapi_context, TDApiClient
from teradataml import create_context, DataFrame, load_example_data
from teradatasqlalchemy.types import *
- Create the connection.
host = input("Host: ")
username = input("Username: ")
password = getpass.getpass("Password: ")
td_context = create_context(host=host, username=username, password=password)
- Create TDAPI context and TDApiClient object.
s3_bucket = input("S3 Bucket (please provide just the bucket name, for example: test-bucket): ")
access_id = input("Access ID: ")
access_key = getpass.getpass("Access Key: ")
region = input("AWS Region: ")
os.environ["AWS_ACCESS_KEY_ID"] = access_id
os.environ["AWS_SECRET_ACCESS_KEY"] = access_key
os.environ["AWS_REGION"] = region
tdapi_context = create_tdapi_context("aws", bucket_name=s3_bucket)
td_apiclient = TDApiClient(tdapi_context)
- Set up data.
- Set feature and target.
feature = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
target = "species"
- Load the data to run the example.
load_example_data("byom", "iris_input")
- Create teradataml DataFrame.
iris = DataFrame.from_table("iris_input")
- Check the teradataml DataFrame.
iris.info()
<class 'teradataml.dataframe.dataframe.DataFrame'>
Data columns (total 6 columns):
id              int
sepal_length    float
sepal_width     float
petal_length    float
petal_width     float
species         int
dtypes: float(4), int(2)
- Create two samples of input data: sample 1 has 80% of total rows and sample 2 has 20% of total rows.
iris_sample = iris.sample(frac=[0.8, 0.2])
iris_sample
The output:
id   sepal_length  sepal_width  petal_length  petal_width  species  sampleid
141  6.7           3.1          5.6           2.4          3        1
99   5.1           2.5          3.0           1.1          2        1
17   5.4           3.9          1.3           0.4          1        1
61   5.0           2.0          3.5           1.0          2        1
19   5.7           3.8          1.7           0.3          1        1
80   5.7           2.6          3.5           1.0          2        2
59   6.6           2.9          4.6           1.3          2        2
38   4.9           3.6          1.4           0.1          1        1
40   5.1           3.4          1.5           0.2          1        2
120  6.0           2.2          5.0           1.5          3        1
- Create the train dataset from sample 1 by filtering on "sampleid", and drop the "id" and "sampleid" columns as they are not required for training the model.
train_df = iris_sample[iris_sample.sampleid == "1"].drop(["id","sampleid"], axis = 1)
train_df
The output:
sepal_length  sepal_width  petal_length  petal_width  species
6.7           3.1          5.6           2.4          3
5.1           2.5          3.0           1.1          2
6.0           3.0          4.8           1.8          3
5.0           2.0          3.5           1.0          2
5.7           3.8          1.7           0.3          1
5.7           2.6          3.5           1.0          2
4.9           3.6          1.4           0.1          1
6.7           3.0          5.0           1.7          2
5.1           3.4          1.5           0.2          1
6.0           2.2          5.0           1.5          3
- Create the test dataset from sample 2 by filtering on "sampleid", and drop the "id" and "sampleid" columns as they are not required for scoring.
test_df = iris_sample[iris_sample.sampleid == "2"].drop(["id","sampleid"], axis = 1)
test_df
The output:
sepal_length  sepal_width  petal_length  petal_width  species
5.6           3.0          4.1           1.3          2
5.8           2.7          5.1           1.9          3
5.5           2.4          3.8           1.1          2
6.6           2.9          4.6           1.3          2
5.7           2.5          5.0           2.0          3
7.2           3.6          6.1           2.5          3
7.3           2.9          6.3           1.8          3
5.0           3.0          1.6           0.2          1
6.4           3.2          5.3           2.3          3
5.1           3.8          1.6           0.2          1
- Set bucket locations.
# Bucket location where your custom code will be saved in the tar.gz format.
custom_code_upload_location = "s3://{}/Chainer/code".format(s3_bucket)
# Bucket location where results of model training are saved.
model_artifacts_location = "s3://{}/Chainer/artifacts".format(s3_bucket)
- Create Chainer SageMaker estimator instance through tdapiclient.
exec_role_arn = "arn:aws:iam::076782961461:role/service-role/AmazonSageMaker-ExecutionRole-20210112T215668"
FRAMEWORK_VERSION = "4.1.0"
# Create an estimator object based on the Chainer SageMaker class.
Chainer_estimator = td_apiclient.Chainer(
    entry_point="chainer-script.py",
    role=exec_role_arn,
    instance_count=1,
    instance_type="ml.m5.large",
    output_path=model_artifacts_location,
    code_location=custom_code_upload_location,
    py_version="py3",
    framework_version=FRAMEWORK_VERSION,
    metric_definitions=[{"Name": "median-AE", "Regex": "AE-at-50th-percentile: ([0-9.]+).*$"}],
    hyperparameters={
        "epochs": 50,
        "batch_size": 12,
        "features": feature,
        "target": target,
        "units": 4,
    },
)
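The entry_point above refers to chainer-script.py, which is included in the chainer folder of the aws-usecases.zip attachment. The sketch below is only an illustration of what such a SageMaker Chainer entry-point script typically looks like (parse the hyperparameters shown above, read the training channel from the SageMaker environment variables, train a small Chainer network, save the artifact, and expose model_fn for serving); it is not the actual file from the attachment, and the MLP class and all other details are hypothetical.
# Illustrative sketch only; the real chainer-script.py in aws-usecases.zip may differ.
import argparse
import json
import os

import numpy as np
import pandas as pd
import chainer
import chainer.functions as F
import chainer.links as L
from chainer import optimizers, serializers


class MLP(chainer.Chain):
    # Hypothetical two-layer network for the three iris classes.
    def __init__(self, n_units, n_out):
        super(MLP, self).__init__()
        with self.init_scope():
            self.l1 = L.Linear(None, n_units)
            self.l2 = L.Linear(None, n_out)

    def __call__(self, x):
        return self.l2(F.relu(self.l1(x)))


def model_fn(model_dir):
    # Called by the SageMaker Chainer container when the endpoint loads the model.
    model = L.Classifier(MLP(4, 3))  # units assumed to match the "units" hyperparameter
    serializers.load_npz(os.path.join(model_dir, "model.npz"), model)
    return model.predictor


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    # Hyperparameters passed from the estimator in this notebook.
    parser.add_argument("--epochs", type=int, default=50)
    parser.add_argument("--batch_size", type=int, default=12)
    parser.add_argument("--units", type=int, default=4)
    parser.add_argument("--features", type=str)  # arrives as a JSON-encoded list of column names
    parser.add_argument("--target", type=str, default="species")
    # Directories provided by the SageMaker training environment.
    parser.add_argument("--model-dir", type=str, default=os.environ.get("SM_MODEL_DIR"))
    parser.add_argument("--train", type=str, default=os.environ.get("SM_CHANNEL_TRAIN"))
    args, _ = parser.parse_known_args()

    features = json.loads(args.features)
    target = args.target.strip('"')

    # The train channel holds the CSV data exported from the teradataml DataFrame by fit().
    files = [os.path.join(args.train, f) for f in os.listdir(args.train)]
    data = pd.concat(pd.read_csv(f) for f in files)
    x = data[features].values.astype(np.float32)
    y = (data[target].values - 1).astype(np.int32)  # species 1..3 -> class ids 0..2

    model = L.Classifier(MLP(args.units, 3))
    optimizer = optimizers.Adam()
    optimizer.setup(model)

    # Plain minibatch training loop.
    for epoch in range(args.epochs):
        perm = np.random.permutation(len(x))
        for i in range(0, len(x), args.batch_size):
            idx = perm[i:i + args.batch_size]
            model.cleargrads()
            loss = model(x[idx], y[idx])
            loss.backward()
            optimizer.update()

    # Persist the trained weights where SageMaker expects the model artifact.
    serializers.save_npz(os.path.join(args.model_dir, "model.npz"), model)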
- Start training using train and test DataFrame.
- Show the train DataFrame.
train_df
The output:
sepal_length  sepal_width  petal_length  petal_width  species
5.7           2.6          3.5           1.0          2
4.9           3.6          1.4           0.1          1
6.7           3.0          5.0           1.7          2
5.6           2.8          4.9           2.0          3
6.7           3.1          5.6           2.4          3
6.0           2.2          5.0           1.5          3
5.1           2.5          3.0           1.1          2
5.4           3.9          1.3           0.4          1
6.3           3.3          6.0           2.5          3
6.6           2.9          4.6           1.3          2
- Show the test DataFrame.
test_df
The output:
sepal_length  sepal_width  petal_length  petal_width  species
7.0           3.2          4.7           1.4          2
6.7           3.1          4.7           1.5          2
5.1           3.8          1.9           0.4          1
5.7           3.8          1.7           0.3          1
6.1           2.8          4.7           1.2          2
5.7           2.5          5.0           2.0          3
6.4           2.7          5.3           1.9          3
5.9           3.0          5.1           1.8          3
6.6           3.0          4.4           1.4          2
5.6           2.5          3.9           1.1          2
- Start training using DataFrame objects.
Chainer_estimator.fit({"train": train_df, "test": test_df}, content_type="csv", wait=True)
- Deploy the model to a SageMaker endpoint.
- Create a Serializer and Deserializer, so the predictor can handle CSV input and output.
from sagemaker.serializers import CSVSerializer
from sagemaker.deserializers import CSVDeserializer

csv_ser = CSVSerializer()
csv_dser = CSVDeserializer()
predictor = Chainer_estimator.deploy(
    "aws-endpoint",
    sagemaker_kw_args={
        "instance_type": "ml.m5.large",
        "initial_instance_count": 1,
        "serializer": csv_ser,
        "deserializer": csv_dser,
    },
)
- Score the model using a teradataml DataFrame and the predictor object created in the previous step.
- Try the predictor with simple CSV data to see if it works as expected.
item = '5.0,3.0,1.6,0.2'
print(predictor.cloudObj.accept)
print(predictor.cloudObj.predict(item))
The output:
('text/csv',)
[['1']]
- Try prediction with the UDF and Client options.
Input:
input = test_df.sample(n=5).select(feature)
input
The output:
sepal_length  sepal_width  petal_length  petal_width
6.3           2.5          5.0           1.9
5.3           3.7          1.5           0.2
5.1           2.5          3.0           1.1
5.5           3.5          1.3           0.2
4.6           3.4          1.4           0.3
Prediction with the UDF option, which scores the teradataml DataFrame through Vantage and returns a teradataml DataFrame with an Output column:
output = predictor.predict(input, mode="UDF", content_type='csv')
output
The output:
sepal_length  sepal_width  petal_length  petal_width  Output
4.6           3.1          1.5           0.2          1
5.7           4.4          1.5           0.4          1
6.3           2.9          5.6           1.8          3
5.0           2.3          3.3           1.0          2
5.0           2.0          3.5           1.0          2
Prediction with the Client option, which sends the rows from the client to the endpoint and returns the deserialized response directly:
output = predictor.predict(input, mode="Client", content_type='csv')
output
The output:
[['1', '1', '3', '3', '1']]
- Clean up.
predictor.cloudObj.delete_model()
predictor.cloudObj.delete_endpoint()
remove_tdapi_context(tdapi_context)
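If you also want to close the Vantage connection opened with create_context() at the beginning, teradataml provides remove_context(); this extra step is not part of the original notebook.
from teradataml import remove_context

# Close the teradataml connection created with create_context() above.
remove_context()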