Using SageMaker RandomCutForest with tdapiclient

Teradata Vantage™ - API Integration Guide for Cloud Machine Learning

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Teradata Vantage
Release Number: 1.4
Published: September 2023

This use case shows the steps to use the SageMaker RandomCutForest Estimator with tdapiclient.

You can download the aws-usecases.zip file attached to this page as a reference. The random-cut-forest folder in the zip file includes a Jupyter notebook for this use case.

  1. Import necessary libraries.
    import os
    import getpass
    import sagemaker
    import pandas as pd
    import numpy as np
    from tdapiclient import create_tdapi_context, remove_tdapi_context, TDApiClient
    from teradataml import create_context, DataFrame, copy_to_sql, load_example_data, configure, LabelEncoder, valib, Retain
    from teradatasqlalchemy.types import *
  2. Create the connection.
    host = input("Host: ")
    username = input("Username: ")
    password = getpass.getpass("Password: ")
    td_context = create_context(host=host, username=username, password=password)
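    To confirm the connection is usable before continuing, an optional check (not part of the original notebook) is to run a trivial query through teradataml:
    # Optional sanity check (hypothetical addition): any lightweight query works here.
    print(DataFrame.from_query("SELECT 1 AS ok"))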
  3. Create TDAPI context and TDApiClient object.
    s3_bucket = input("S3 Bucket (provide just the bucket name, for example: test-bucket): ")
    access_id = input("Access ID: ")
    access_key = getpass.getpass("Access Key: ")
    region = input("AWS Region: ")
    os.environ["AWS_ACCESS_KEY_ID"] = access_id
    os.environ["AWS_SECRET_ACCESS_KEY"] = access_key
    os.environ["AWS_REGION"] = region
    tdapi_context = create_tdapi_context("aws", bucket_name=s3_bucket)
    td_apiclient = TDApiClient(tdapi_context)
  4. Set up data.
    1. Load the example data.
      load_example_data("seriessplitter", "ibm_stock")
      data = DataFrame("ibm_stock")
    2. Drop unnecessary columns.
      data = data.drop(['period'], axis=1)
    3. Encode the 'name' column using LabelEncoder (imported in step 1).
      rc = LabelEncoder(values=("ibm", 1), columns=["name"])
      feature_columns_names = Retain(columns=["stockprice"])
      configure.val_install_location = "alice"
      data = valib.Transform(data=data, label_encode=rc, index_columns="id", unique_index=True, retain=feature_columns_names)
      data = data.result
    4. Drop the 'id' column, which is no longer needed.
      data = data.drop("id", axis=1)
    5. Create two samples of input data: sample 1 has 80% of total rows and sample 2 has 20% of total rows.
      data_sample = data.sample(frac=[0.8, 0.2])
      data_sample
      The output:
      stockprice	name	sampleid
      552	1	2
      352	1	1
      556	1	1
      370	1	1
      475	1	1
      385	1	1
      557	1	2
      350	1	1
      596	1	2
      474	1	1
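      To verify the split proportions (an optional check, not in the original notebook; teradataml DataFrames expose a shape property):
      # Hypothetical check: row counts of the two samples.
      print(data_sample[data_sample.sampleid == "1"].shape)
      print(data_sample[data_sample.sampleid == "2"].shape)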
      
    6. Create the train dataset from sample 1 by filtering on "sampleid", and drop the "sampleid" column as it is not required for training the model.
      data_train = data_sample[data_sample.sampleid == "1"].drop("sampleid", axis = 1)
      data_train
      The output:
      stockprice	name
      475	1
      557	1
      577	1
      497	1
      552	1
      556	1
      521	1
      487	1
      387	1
      385	1
      
    7. Create the test dataset from sample 2 by filtering on "sampleid", and drop the "sampleid" column as it is not required for scoring.
      data_test = data_sample[data_sample.sampleid == "2"].drop("sampleid", axis = 1)
      data_test
      The output:
      stockprice	name
      555	1
      496	1
      409	1
      551	1
      491	1
      587	1
      399	1
      542	1
      531	1
      387	1
      
  5. Create a RandomCutForest instance through tdapiclient.
    exec_role_arn = "arn:aws:iam::076782961461:role/service-role/AmazonSageMaker-ExecutionRole-20210112T215668"
    rcf = td_apiclient.RandomCutForest(role=exec_role_arn,
                                        instance_count=1,
                                        instance_type="ml.m5.large",
                                        num_samples_per_tree=512,
                                        num_trees=50)
  6. Convert the training data to a RecordSet object to be passed to the fit method.
    train_set = rcf.record_set(data_train.get_values().astype('float32'))
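    As a quick pre-flight check (an optional addition, not in the original notebook), you can confirm the array handed to record_set has the expected shape and dtype:
    # RCF expects a 2-D numeric array; verify before it is uploaded to S3.
    arr = data_train.get_values().astype('float32')
    print(arr.shape, arr.dtype)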
  7. Start training using the RecordSet object.
    rcf.fit(train_set)
  8. Create a serializer and a deserializer so the predictor can handle CSV input and output.
    from sagemaker.serializers import CSVSerializer
    from sagemaker.deserializers import CSVDeserializer
    csv_ser = CSVSerializer()
    csv_dser = CSVDeserializer()
    predictor = rcf.deploy("aws-endpoint",
                           sagemaker_kw_args={"instance_type": "ml.m5.large", "initial_instance_count": 1, "serializer": csv_ser, "deserializer": csv_dser})
  9. Try prediction integration using a teradataml DataFrame and the predictor object created in the previous step.
    1. Confirm that the predictor is correctly configured to accept CSV input.
      print(predictor.cloudObj.accept)
      The output:
      ('text/csv',)
    2. Try prediction with UDF and Client options.
      Prediction with Client option:
      output = predictor.predict(data_test, mode="client", content_type='csv')
      output
      The output:
      [['0.7504941002883164'],
       ['1.0207224796626084'],
       ['1.0308394635056672'],
       ['0.9211458191353374'],
       ['0.7386602728840181'],
       ['0.9023601909268845'],
       ['0.7716498762848135'],
       ['0.9157308017573146'],
       ['1.1074548624933822'],
       ['0.7695934123198193'],
       ['0.7240618827098061'],
       ['0.9301838696836991'],
       ['0.7543856434161177'],
       ['0.7758886919821008'],
       ['0.8995296519196438'],
       ['0.7900451450353364'],
       ['0.7543856434161177'],
       ['0.9845105265173143'],
       ['0.9041053286804649'],
       ['0.7517221420434166'],
       ['0.7543856434161177'],
       ['0.7755021272225036'],
       ['0.7312255026665386'],
       ['0.7846481495495119'],
       ['1.323345181866896'],
       ['0.7574469092696446'],
       ['0.9172574024288217'],
       ['0.8865849688703292'],
       ['0.9558031346431431'],
       ['1.1401907300479466'],
       ['0.7783665988095414'],
       ['0.8606176638947314'],
       ['0.8194513516498502'],
       ['1.0037295269063529'],
       ['0.7756173809067299'],
       ['0.7574469092696446'],
       ['0.8474858343051359'],
       ['0.8069028137727154'],
       ['1.1945449394990506'],
       ['0.7574469092696446'],
       ['0.7200622617886648'],
       ['0.907478795887426'],
       ['0.7200622617886648'],
       ['0.7760705068507692'],
       ['0.724224159464554'],
       ['0.9545242913829999'],
       ['0.9420880565470928'],
       ['0.9065678443198703'],
       ['0.7157960453933286'],
       ['0.960482824608952'],
       ['0.9318529852186351'],
       ['0.9023601909268845'],
       ['0.9630174703884888'],
       ['0.7517221420434166'],
       ['0.8606176638947314'],
       ['0.7517221420434166'],
       ['0.7823662290441952'],
       ['0.7376855731824589'],
       ['1.0636259692671353'],
       ['0.7480312078032537'],
       ['0.81220691722653'],
       ['0.8493853404084787'],
       ['0.8401529182505993'],
       ['0.7242812649681286'],
       ['0.7823662290441952'],
       ['1.0037295269063529'],
       ['0.9380221499147791'],
       ['0.8468019836106951'],
       ['0.7504941002883164'],
       ['0.8606176638947314'],
       ['0.8957845289058336'],
       ['0.9041053286804649'],
       ['0.9420880565470928'],
       ['0.7787192480469368']]
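      The client-mode output is a list of anomaly-score strings, one per input row. A common follow-up (a sketch, not part of the original notebook) is to convert the scores to floats and flag rows whose score exceeds the mean by three standard deviations:
      # Hypothetical post-processing: flag unusually high scores as anomalies.
      scores = np.array(output, dtype=float).ravel()
      threshold = scores.mean() + 3 * scores.std()
      print(f"threshold={threshold:.4f}, anomalies: {(scores > threshold).sum()}")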
      Prediction with UDF option:
      item = data_test.tail()
      output = predictor.predict(item, mode="UDF", content_type='csv')
      output
      The output:
      stockprice	name	Output
      590	1	{"scores":[{"score":0.9964993369}]}
      585	1	{"scores":[{"score":0.9545242914}]}
      584	1	{"scores":[{"score":0.9420421609}]}
      583	1	{"scores":[{"score":0.953565364}]}
      581	1	{"scores":[{"score":0.9211458191}]}
      578	1	{"scores":[{"score":0.9041053287}]}
      581	1	{"scores":[{"score":0.9211458191}]}
      588	1	{"scores":[{"score":0.9759149282}]}
      592	1	{"scores":[{"score":1.0207224797}]}
      596	1	{"scores":[{"score":1.0538021653}]}
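      The Output column holds a JSON string per row. To work with the scores numerically, one option (a sketch, assuming the UDF-mode predict call returns a teradataml DataFrame with the columns shown above) is to bring the result to the client and parse it with pandas:
      import json
      # Hypothetical post-processing of the UDF-mode result.
      pdf = output.to_pandas()
      pdf["score"] = pdf["Output"].apply(lambda s: json.loads(s)["scores"][0]["score"])
      print(pdf[["stockprice", "score"]].head())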
      
  10. Clean up.
    predictor.cloudObj.delete_model()
    predictor.cloudObj.delete_endpoint()
    remove_tdapi_context(tdapi_context)
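    Optionally, also close the teradataml connection (an addition beyond the original notebook; remove_context is a standard teradataml function):
    from teradataml import remove_context
    remove_context()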