Using SageMaker XGBoost Estimator with tdapiclient

Teradata Vantage™ - API Integration Guide for Cloud Machine Learning

Deployment: VantageCloud (Enterprise), VantageCore (IntelliFlex, VMware)
Product: Teradata Vantage
Release Number: 1.4
Published: September 2023

This use case shows the steps to use the SageMaker XGBoost Estimator with tdapiclient.

You can download the aws-usecases.zip file in the attachment as a reference. The xgboost folder in the zip file includes the Jupyter notebook file (ipynb), Python file (py), and data file (csv) required to run this notebook file.

  1. Import the necessary libraries.
    import getpass
    import os
    import sagemaker
    from tdapiclient import create_tdapi_context, remove_tdapi_context, TDApiClient
    from teradataml import create_context, DataFrame, copy_to_sql, load_example_data, configure, LabelEncoder, valib, Retain
    import pandas as pd
    import numpy as np
    from teradatasqlalchemy.types import *
  2. Create the connection.
    host = input("Host: ")
    username = input("Username: ")
    password = getpass.getpass("Password: ")
    td_context = create_context(host=host, username=username, password=password)
  3. Create TDAPI context and TDApiClient object.
    s3_bucket = input("S3 Bucket (provide just the bucket name, for example: test-bucket): ")
    access_id = input("Access ID: ")
    access_key = getpass.getpass("Access Key: ")
    region = input("AWS Region: ")
    os.environ["AWS_ACCESS_KEY_ID"] = access_id
    os.environ["AWS_SECRET_ACCESS_KEY"] = access_key
    os.environ["AWS_REGION"] = region
    tdapi_context = create_tdapi_context("aws", bucket_name=s3_bucket)
    td_apiclient = TDApiClient(tdapi_context)
  4. Set bucket locations.
    # Bucket location where your custom code will be saved in the tar.gz format.
    custom_code_upload_location = "s3://{}/xgboost/code".format(s3_bucket)
    # Bucket location where results of model training are saved.
    model_artifacts_location = "s3://{}/xgboost/artifacts".format(s3_bucket)
  5. Set up data.
    1. Read the breast cancer dataset.
      data = pd.read_csv("cancer_data.csv")
    2. Drop unnecessary columns.
      data = data.drop(['Unnamed: 32'], axis=1)
    3. Rename columns for creating a teradataml DataFrame.
      data.rename(columns={"concave points_mean": "concave_points_mean",
                           "concave points_se": "concave_points_se",
                           "concave points_worst": "concave_points_worst"},
                  inplace=True)
    4. Insert the DataFrame into a table.
      data_table = "cancer_data"
      column_types = {
          "id": INTEGER,
          "diagnosis": CHAR(1),
          "radius_mean": FLOAT,
          "texture_mean": FLOAT,
          "perimeter_mean": FLOAT,
          "area_mean": FLOAT,
          "smoothness_mean": FLOAT,
          "compactness_mean": FLOAT,
          "concavity_mean": FLOAT,
          "concave_points_mean": FLOAT,
          "symmetry_mean": FLOAT,
          "fractal_dimension_mean": FLOAT,
          "radius_se": FLOAT,
          "texture_se": FLOAT,
          "perimeter_se": FLOAT,
          "area_se": FLOAT,
          "smoothness_se": FLOAT,
          "compactness_se": FLOAT,
          "concavity_se": FLOAT,
          "concave_points_se": FLOAT,
          "symmetry_se": FLOAT,
          "fractal_dimension_se": FLOAT,
          "radius_worst": FLOAT,
          "texture_worst": FLOAT,
          "perimeter_worst": FLOAT,
          "area_worst": FLOAT,
          "smoothness_worst": FLOAT,
          "compactness_worst": FLOAT,
          "concavity_worst": FLOAT,
          "concave_points_worst": FLOAT,
          "symmetry_worst": FLOAT,
          "fractal_dimension_worst": FLOAT
      }
      copy_to_sql(df=data, table_name=data_table, if_exists="replace", types=column_types)
    5. Create a teradataml DataFrame using the table.
      df = DataFrame(table_name=data_table)
      df
      The output:
      id	diagnosis	radius_mean	texture_mean	perimeter_mean	area_mean	smoothness_mean	compactness_mean	concavity_mean	concave_points_mean	symmetry_mean	fractal_dimension_mean	radius_se	texture_se	perimeter_se	area_se	smoothness_se	compactness_se	concavity_se	concave_points_se	symmetry_se	fractal_dimension_se	radius_worst	texture_worst	perimeter_worst	area_worst	smoothness_worst	compactness_worst	concavity_worst	concave_points_worst	symmetry_worst	fractal_dimension_worst
      84300903	M	19.69	21.25	130.0	1203.0	0.1096	0.1599	0.1974	0.1279	0.2069	0.05999	0.7456	0.7869	4.585	94.03	0.00615	0.04006	0.03832	0.02058	0.0225	0.004571	23.57	25.53	152.5	1709.0	0.1444	0.4245	0.4504	0.243	0.3613	0.08758
      84358402	M	20.29	14.34	135.1	1297.0	0.1003	0.1328	0.198	0.1043	0.1809	0.05883	0.7572	0.7813	5.438	94.44	0.01149	0.02461	0.05688	0.01885	0.01756	0.005115	22.54	16.67	152.2	1575.0	0.1374	0.205	0.4	0.1625	0.2364	0.07678
      843786	M	12.45	15.7	82.57	477.1	0.1278	0.17	0.1578	0.08089	0.2087	0.07613	0.3345	0.8902	2.217	27.19	0.00751	0.03345	0.03672	0.01137	0.02165	0.005082	15.47	23.75	103.4	741.6	0.1791	0.5249	0.5355	0.1741	0.3985	0.1244
      844359	M	18.25	19.98	119.6	1040.0	0.09463	0.109	0.1127	0.074	0.1794	0.05742	0.4467	0.7732	3.18	53.91	0.004314	0.01382	0.02254	0.01039	0.01369	0.002179	22.88	27.66	153.2	1606.0	0.1442	0.2576	0.3784	0.1932	0.3063	0.08368
      844981	M	13.0	21.82	87.5	519.8	0.1273	0.1932	0.1859	0.09353	0.235	0.07389	0.3063	1.002	2.406	24.32	0.005731	0.03502	0.03553	0.01226	0.02143	0.003749	15.49	30.73	106.2	739.3	0.1703	0.5401	0.539	0.206	0.4378	0.1072
      84501001	M	12.46	24.04	83.97	475.9	0.1186	0.2396	0.2273	0.08543	0.203	0.08243	0.2976	1.599	2.039	23.94	0.007149	0.07217	0.07743	0.01432	0.01789	0.01008	15.09	40.68	97.65	711.4	0.1853	1.058	1.105	0.221	0.4366	0.2075
      84458202	M	13.71	20.83	90.2	577.9	0.1189	0.1645	0.09366	0.05985	0.2196	0.07451	0.5835	1.377	3.856	50.96	0.008805	0.03029	0.02488	0.01448	0.01486	0.005412	17.06	28.14	110.6	897.0	0.1654	0.3682	0.2678	0.1556	0.3196	0.1151
      84348301	M	11.42	20.38	77.58	386.1	0.1425	0.2839	0.2414	0.1052	0.2597	0.09744	0.4956	1.156	3.445	27.23	0.00911	0.07458	0.05661	0.01867	0.05963	0.009208	14.91	26.5	98.87	567.7	0.2098	0.8663	0.6869	0.2575	0.6638	0.173
      842517	M	20.57	17.77	132.9	1326.0	0.08474	0.07864	0.0869	0.07017	0.1812	0.05667	0.5435	0.7339	3.398	74.08	0.005225	0.01308	0.0186	0.0134	0.01389	0.003532	24.99	23.41	158.8	1956.0	0.1238	0.1866	0.2416	0.186	0.275	0.08902
      842302	M	17.99	10.38	122.8	1001.0	0.1184	0.2776	0.3001	0.1471	0.2419	0.07871	1.095	0.9053	8.589	153.4	0.006399	0.04904	0.05373	0.01587	0.03003	0.006193	25.38	17.33	184.6	2019.0	0.1622	0.6656	0.7119	0.2654	0.4601	0.1189
      
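The renaming in step 5.3 generalizes: teradataml DataFrame column names cannot contain spaces, so every space-containing column from the raw CSV is mapped to an underscore version. A minimal sketch in plain pandas, using only the three affected column names from the dataset (the empty frame is illustrative):

```python
import pandas as pd

# Illustrative frame with the space-containing names from the raw CSV.
data = pd.DataFrame(columns=["id", "concave points_mean",
                             "concave points_se", "concave points_worst"])

# Replace spaces with underscores in every column name so the frame
# can be copied to a Teradata table via copy_to_sql.
data = data.rename(columns=lambda c: c.replace(" ", "_"))
print(list(data.columns))
```

Applying the rename to all columns at once avoids listing each affected column by hand, and is a no-op for names that already have no spaces.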
  6. Prepare the dataset.
    1. Encode the target column using a label encoder.
      rc = LabelEncoder(values=("M", 1), columns=["diagnosis"], default=0)
      feature_columns_names = Retain(columns=[
          "radius_mean", "texture_mean", "perimeter_mean", "area_mean",
          "smoothness_mean", "compactness_mean", "concavity_mean",
          "concave_points_mean", "symmetry_mean", "fractal_dimension_mean",
          "radius_se", "texture_se", "perimeter_se", "area_se",
          "smoothness_se", "compactness_se", "concavity_se",
          "concave_points_se", "symmetry_se", "fractal_dimension_se",
          "radius_worst", "texture_worst", "perimeter_worst", "area_worst",
          "smoothness_worst", "compactness_worst", "concavity_worst",
          "concave_points_worst", "symmetry_worst", "fractal_dimension_worst"])
      configure.val_install_location = "alice"
      data = valib.Transform(data=df, label_encode=rc, index_columns="id",
                             unique_index=True, retain=feature_columns_names)
      df = data.result
    2. Rearrange the columns so that the target column is first, and make sure the dataset has no header.
      df = df.drop("id", axis=1)
      df = df.select(["diagnosis",
          "radius_mean", "texture_mean", "perimeter_mean", "area_mean",
          "smoothness_mean", "compactness_mean", "concavity_mean",
          "concave_points_mean", "symmetry_mean", "fractal_dimension_mean",
          "radius_se", "texture_se", "perimeter_se", "area_se",
          "smoothness_se", "compactness_se", "concavity_se",
          "concave_points_se", "symmetry_se", "fractal_dimension_se",
          "radius_worst", "texture_worst", "perimeter_worst", "area_worst",
          "smoothness_worst", "compactness_worst", "concavity_worst",
          "concave_points_worst", "symmetry_worst", "fractal_dimension_worst"])
      df
      The output:
      diagnosis	radius_mean	texture_mean	perimeter_mean	area_mean	smoothness_mean	compactness_mean	concavity_mean	concave_points_mean	symmetry_mean	fractal_dimension_mean	radius_se	texture_se	perimeter_se	area_se	smoothness_se	compactness_se	concavity_se	concave_points_se	symmetry_se	fractal_dimension_se	radius_worst	texture_worst	perimeter_worst	area_worst	smoothness_worst	compactness_worst	concavity_worst	concave_points_worst	symmetry_worst	fractal_dimension_worst
      1	18.08	21.84	117.4	1024.0	0.07371	0.08642	0.1103	0.05778	0.177	0.0534	0.6362	1.305	4.312	76.36	0.00553	0.05296	0.0611	0.01444	0.0214	0.005036	19.76	24.7	129.1	1228.0	0.08822	0.1963	0.2535	0.09181	0.2369	0.06558
      1	18.05	16.15	120.2	1006.0	0.1065	0.2146	0.1684	0.108	0.2152	0.06673	0.9806	0.5505	6.311	134.8	0.00794	0.05839	0.04658	0.0207	0.02591	0.007054	22.39	18.91	150.1	1610.0	0.1478	0.5634	0.3786	0.2102	0.3751	0.1108
      1	19.07	24.81	128.3	1104.0	0.09081	0.219	0.2107	0.09961	0.231	0.06343	0.9811	1.666	8.83	104.9	0.006548	0.1006	0.09723	0.02638	0.05333	0.007646	24.09	33.17	177.4	1651.0	0.1247	0.7444	0.7242	0.2493	0.467	0.1038
      0	12.21	14.09	78.78	462.0	0.08108	0.07823	0.06839	0.02534	0.1646	0.06154	0.2666	0.8309	2.097	19.96	0.004405	0.03026	0.04344	0.01087	0.01921	0.004622	13.13	19.29	87.65	529.9	0.1026	0.2431	0.3076	0.0914	0.2677	0.08824
      1	17.01	20.26	109.7	904.3	0.08772	0.07304	0.0695	0.0539	0.2026	0.05223	0.5858	0.8554	4.106	68.46	0.005038	0.01503	0.01946	0.01123	0.02294	0.002581	19.8	25.05	130.0	1210.0	0.1111	0.1486	0.1932	0.1096	0.3275	0.06469
      0	11.26	19.96	73.72	394.1	0.0802	0.1181	0.09274	0.05588	0.2595	0.06233	0.4866	1.905	2.877	34.68	0.01574	0.08262	0.08099	0.03487	0.03418	0.006517	11.86	22.33	78.27	437.6	0.1028	0.1843	0.1546	0.09314	0.2955	0.07009
      0	11.93	10.91	76.14	442.7	0.08872	0.05242	0.02606	0.01796	0.1601	0.05541	0.2522	1.045	1.649	18.95	0.006175	0.01204	0.01376	0.005832	0.01096	0.001857	13.8	20.14	87.64	589.5	0.1374	0.1575	0.1514	0.06876	0.246	0.07262
      0	9.042	18.9	60.07	244.5	0.09968	0.1972	0.1975	0.04908	0.233	0.08743	0.4653	1.911	3.769	24.2	0.009845	0.0659	0.1027	0.02527	0.03491	0.007877	10.06	23.4	68.62	297.1	0.1221	0.3748	0.4609	0.1145	0.3135	0.1055
      0	12.47	18.6	81.09	481.9	0.09965	0.1058	0.08005	0.03821	0.1925	0.06373	0.3961	1.044	2.497	30.29	0.006953	0.01911	0.02701	0.01037	0.01782	0.003586	14.97	24.64	96.05	677.9	0.1426	0.2378	0.2671	0.1015	0.3014	0.0875
      0	14.95	18.77	97.84	689.5	0.08138	0.1167	0.0905	0.03562	0.1744	0.06493	0.422	1.909	3.271	39.43	0.00579	0.04877	0.05303	0.01527	0.03356	0.009368	16.25	25.47	107.1	809.7	0.0997	0.2521	0.25	0.08405	0.2852	0.09218
      
    3. Create three samples of the input data: sample 1 has 60% of the total rows; samples 2 and 3 each have 20%.
      cancer_sample = df.sample(frac=[0.6, 0.2, 0.2])
      cancer_sample
      The output:
      diagnosis	radius_mean	texture_mean	perimeter_mean	area_mean	smoothness_mean	compactness_mean	concavity_mean	concave_points_mean	symmetry_mean	fractal_dimension_mean	radius_se	texture_se	perimeter_se	area_se	smoothness_se	compactness_se	concavity_se	concave_points_se	symmetry_se	fractal_dimension_se	radius_worst	texture_worst	perimeter_worst	area_worst	smoothness_worst	compactness_worst	concavity_worst	concave_points_worst	symmetry_worst	fractal_dimension_worst	sampleid
      1	18.08	21.84	117.4	1024.0	0.07371	0.08642	0.1103	0.05778	0.177	0.0534	0.6362	1.305	4.312	76.36	0.00553	0.05296	0.0611	0.01444	0.0214	0.005036	19.76	24.7	129.1	1228.0	0.08822	0.1963	0.2535	0.09181	0.2369	0.06558	1
      1	18.05	16.15	120.2	1006.0	0.1065	0.2146	0.1684	0.108	0.2152	0.06673	0.9806	0.5505	6.311	134.8	0.00794	0.05839	0.04658	0.0207	0.02591	0.007054	22.39	18.91	150.1	1610.0	0.1478	0.5634	0.3786	0.2102	0.3751	0.1108	3
      1	19.07	24.81	128.3	1104.0	0.09081	0.219	0.2107	0.09961	0.231	0.06343	0.9811	1.666	8.83	104.9	0.006548	0.1006	0.09723	0.02638	0.05333	0.007646	24.09	33.17	177.4	1651.0	0.1247	0.7444	0.7242	0.2493	0.467	0.1038	3
      0	16.17	16.07	106.3	788.5	0.0988	0.1438	0.06651	0.05397	0.199	0.06572	0.1745	0.489	1.349	14.91	0.00451	0.01812	0.01951	0.01196	0.01934	0.003696	16.97	19.14	113.1	861.5	0.1235	0.255	0.2114	0.1251	0.3153	0.0896	3
      1	16.26	21.88	107.5	826.8	0.1165	0.1283	0.1799	0.07981	0.1869	0.06532	0.5706	1.457	2.961	57.72	0.01056	0.03756	0.05839	0.01186	0.04022	0.006187	17.73	25.21	113.7	975.2	0.1426	0.2116	0.3344	0.1047	0.2736	0.07953	2
      1	15.3	25.27	102.4	732.4	0.1082	0.1697	0.1683	0.08751	0.1926	0.0654	0.439	1.012	3.498	43.5	0.005233	0.03057	0.03576	0.01083	0.01768	0.002967	20.27	36.71	149.3	1269.0	0.1641	0.611	0.6335	0.2024	0.4027	0.09876	1
      0	14.5	10.89	94.28	640.7	0.1101	0.1099	0.08842	0.05778	0.1856	0.06402	0.2929	0.857	1.928	24.19	0.003818	0.01276	0.02882	0.012	0.0191	0.002808	15.7	15.98	102.8	745.5	0.1313	0.1788	0.256	0.1221	0.2889	0.08006	1
      0	15.04	16.74	98.73	689.4	0.09883	0.1364	0.07721	0.06142	0.1668	0.06869	0.372	0.8423	2.304	34.84	0.004123	0.01819	0.01996	0.01004	0.01055	0.003237	16.76	20.43	109.7	856.9	0.1135	0.2176	0.1856	0.1018	0.2177	0.08549	1
      1	16.11	18.05	105.1	813.0	0.09721	0.1137	0.09447	0.05943	0.1861	0.06248	0.7049	1.332	4.533	74.08	0.00677	0.01938	0.03067	0.01167	0.01875	0.003434	19.92	25.27	129.0	1233.0	0.1314	0.2236	0.2802	0.1216	0.2792	0.08158	1
      0	9.042	18.9	60.07	244.5	0.09968	0.1972	0.1975	0.04908	0.233	0.08743	0.4653	1.911	3.769	24.2	0.009845	0.0659	0.1027	0.02527	0.03491	0.007877	10.06	23.4	68.62	297.1	0.1221	0.3748	0.4609	0.1145	0.3135	0.1055	3
      
    4. Create the train dataset from sample 1 by filtering on "sampleid", and drop the "sampleid" column because it is not required for training the model.
      train = cancer_sample[cancer_sample.sampleid == "1"].drop("sampleid", axis=1)
      train
      The output:
      diagnosis	radius_mean	texture_mean	perimeter_mean	area_mean	smoothness_mean	compactness_mean	concavity_mean	concave_points_mean	symmetry_mean	fractal_dimension_mean	radius_se	texture_se	perimeter_se	area_se	smoothness_se	compactness_se	concavity_se	concave_points_se	symmetry_se	fractal_dimension_se	radius_worst	texture_worst	perimeter_worst	area_worst	smoothness_worst	compactness_worst	concavity_worst	concave_points_worst	symmetry_worst	fractal_dimension_worst
      1	18.08	21.84	117.4	1024.0	0.07371	0.08642	0.1103	0.05778	0.177	0.0534	0.6362	1.305	4.312	76.36	0.00553	0.05296	0.0611	0.01444	0.0214	0.005036	19.76	24.7	129.1	1228.0	0.08822	0.1963	0.2535	0.09181	0.2369	0.06558
      0	11.93	10.91	76.14	442.7	0.08872	0.05242	0.02606	0.01796	0.1601	0.05541	0.2522	1.045	1.649	18.95	0.006175	0.01204	0.01376	0.005832	0.01096	0.001857	13.8	20.14	87.64	589.5	0.1374	0.1575	0.1514	0.06876	0.246	0.07262
      0	11.26	19.96	73.72	394.1	0.0802	0.1181	0.09274	0.05588	0.2595	0.06233	0.4866	1.905	2.877	34.68	0.01574	0.08262	0.08099	0.03487	0.03418	0.006517	11.86	22.33	78.27	437.6	0.1028	0.1843	0.1546	0.09314	0.2955	0.07009
      1	16.11	18.05	105.1	813.0	0.09721	0.1137	0.09447	0.05943	0.1861	0.06248	0.7049	1.332	4.533	74.08	0.00677	0.01938	0.03067	0.01167	0.01875	0.003434	19.92	25.27	129.0	1233.0	0.1314	0.2236	0.2802	0.1216	0.2792	0.08158
      0	14.5	10.89	94.28	640.7	0.1101	0.1099	0.08842	0.05778	0.1856	0.06402	0.2929	0.857	1.928	24.19	0.003818	0.01276	0.02882	0.012	0.0191	0.002808	15.7	15.98	102.8	745.5	0.1313	0.1788	0.256	0.1221	0.2889	0.08006
      1	17.06	21.0	111.8	918.6	0.1119	0.1056	0.1508	0.09934	0.1727	0.06071	0.8161	2.129	6.076	87.17	0.006455	0.01797	0.04502	0.01744	0.01829	0.003733	20.99	33.15	143.2	1362.0	0.1449	0.2053	0.392	0.1827	0.2623	0.07599
      1	14.99	25.2	95.54	698.8	0.09387	0.05131	0.02398	0.02899	0.1565	0.05504	1.214	2.188	8.077	106.0	0.006883	0.01094	0.01818	0.01917	0.007882	0.001754	14.99	25.2	95.54	698.8	0.09387	0.05131	0.02398	0.02899	0.1565	0.05504
      1	16.69	20.2	107.1	857.6	0.07497	0.07112	0.03649	0.02307	0.1846	0.05325	0.2473	0.5679	1.775	22.95	0.002667	0.01446	0.01423	0.005297	0.01961	0.0017	19.18	26.56	127.3	1084.0	0.1009	0.292	0.2477	0.08737	0.4677	0.07623
      1	15.3	25.27	102.4	732.4	0.1082	0.1697	0.1683	0.08751	0.1926	0.0654	0.439	1.012	3.498	43.5	0.005233	0.03057	0.03576	0.01083	0.01768	0.002967	20.27	36.71	149.3	1269.0	0.1641	0.611	0.6335	0.2024	0.4027	0.09876
      0	9.042	18.9	60.07	244.5	0.09968	0.1972	0.1975	0.04908	0.233	0.08743	0.4653	1.911	3.769	24.2	0.009845	0.0659	0.1027	0.02527	0.03491	0.007877	10.06	23.4	68.62	297.1	0.1221	0.3748	0.4609	0.1145	0.3135	0.1055
      
    5. Create the validate dataset from sample 2 by filtering on "sampleid", and drop the "sampleid" column because it is not required for training the model.
      validate = cancer_sample[cancer_sample.sampleid == "2"].drop("sampleid", axis=1)
      validate
      The output:
      diagnosis	radius_mean	texture_mean	perimeter_mean	area_mean	smoothness_mean	compactness_mean	concavity_mean	concave_points_mean	symmetry_mean	fractal_dimension_mean	radius_se	texture_se	perimeter_se	area_se	smoothness_se	compactness_se	concavity_se	concave_points_se	symmetry_se	fractal_dimension_se	radius_worst	texture_worst	perimeter_worst	area_worst	smoothness_worst	compactness_worst	concavity_worst	concave_points_worst	symmetry_worst	fractal_dimension_worst
      0	12.18	20.52	77.22	458.7	0.08013	0.04038	0.02383	0.0177	0.1739	0.05677	0.1924	1.571	1.183	14.68	0.00508	0.006098	0.01069	0.006797	0.01447	0.001532	13.34	32.84	84.58	547.8	0.1123	0.08862	0.1145	0.07431	0.2694	0.06878
      0	13.28	13.72	85.79	541.8	0.08363	0.08575	0.05077	0.02864	0.1617	0.05594	0.1833	0.5308	1.592	15.26	0.004271	0.02073	0.02828	0.008468	0.01461	0.002613	14.24	17.37	96.59	623.7	0.1166	0.2685	0.2866	0.09173	0.2736	0.0732
      0	12.54	18.07	79.42	491.9	0.07436	0.0265	0.001194	0.005449	0.1528	0.05185	0.3511	0.9527	2.329	28.3	0.005783	0.004693	0.0007929	0.003617	0.02043	0.001058	13.72	20.98	86.82	585.7	0.09293	0.04327	0.003581	0.01635	0.2233	0.05521
      1	20.09	23.86	134.7	1247.0	0.108	0.1838	0.2283	0.128	0.2249	0.07469	1.072	1.743	7.804	130.8	0.007964	0.04732	0.07649	0.01936	0.02736	0.005928	23.68	29.43	158.8	1696.0	0.1347	0.3391	0.4932	0.1923	0.3294	0.09469
      0	13.59	17.84	86.24	572.3	0.07948	0.04052	0.01997	0.01238	0.1573	0.0552	0.258	1.166	1.683	22.22	0.003741	0.005274	0.01065	0.005044	0.01344	0.001126	15.5	26.1	98.91	739.1	0.105	0.07622	0.106	0.05185	0.2335	0.06263
      0	9.397	21.68	59.75	268.8	0.07969	0.06053	0.03735	0.005128	0.1274	0.06724	0.1186	1.182	1.174	6.802	0.005515	0.02674	0.03735	0.005128	0.01951	0.004583	9.965	27.99	66.61	301.0	0.1086	0.1887	0.1868	0.02564	0.2376	0.09206
      0	13.64	16.34	87.21	571.8	0.07685	0.06059	0.01857	0.01723	0.1353	0.05953	0.1872	0.9234	1.449	14.55	0.004477	0.01177	0.01079	0.007956	0.01325	0.002551	14.67	23.19	96.08	656.7	0.1089	0.1582	0.105	0.08586	0.2346	0.08025
      0	13.34	15.86	86.49	520.0	0.1078	0.1535	0.1169	0.06987	0.1942	0.06902	0.286	1.016	1.535	12.96	0.006794	0.03575	0.0398	0.01383	0.02134	0.004603	15.53	23.19	96.66	614.9	0.1536	0.4791	0.4858	0.1708	0.3527	0.1016
      0	9.847	15.68	63.0	293.2	0.09492	0.08419	0.0233	0.02416	0.1387	0.06891	0.2498	1.216	1.976	15.24	0.008732	0.02042	0.01062	0.006801	0.01824	0.003494	11.24	22.99	74.32	376.5	0.1419	0.2243	0.08434	0.06528	0.2502	0.09209
      1	16.35	23.29	109.0	840.4	0.09742	0.1497	0.1811	0.08773	0.2175	0.06218	0.4312	1.022	2.972	45.5	0.005635	0.03917	0.06072	0.01656	0.03197	0.004085	19.38	31.03	129.3	1165.0	0.1415	0.4665	0.7087	0.2248	0.4824	0.09614
    6. Create the test dataset from sample 3 by filtering on "sampleid", and drop the "sampleid" column because it is not required for scoring.
      test = cancer_sample[cancer_sample.sampleid == "3"].drop("sampleid", axis=1)
      test
      The output:
      diagnosis	radius_mean	texture_mean	perimeter_mean	area_mean	smoothness_mean	compactness_mean	concavity_mean	concave_points_mean	symmetry_mean	fractal_dimension_mean	radius_se	texture_se	perimeter_se	area_se	smoothness_se	compactness_se	concavity_se	concave_points_se	symmetry_se	fractal_dimension_se	radius_worst	texture_worst	perimeter_worst	area_worst	smoothness_worst	compactness_worst	concavity_worst	concave_points_worst	symmetry_worst	fractal_dimension_worst
      0	10.26	12.22	65.75	321.6	0.09996	0.07542	0.01923	0.01968	0.18	0.06569	0.1911	0.5477	1.348	11.88	0.005682	0.01365	0.008496	0.006929	0.01938	0.002371	11.38	15.65	73.23	394.5	0.1343	0.165	0.08615	0.06696	0.2937	0.07722
      0	11.33	14.16	71.79	396.6	0.09379	0.03872	0.001487	0.003333	0.1954	0.05821	0.2375	1.28	1.565	17.09	0.008426	0.008998	0.001487	0.003333	0.02358	0.001627	12.2	18.99	77.37	458.0	0.1259	0.07348	0.004955	0.01111	0.2758	0.06386
      0	11.37	18.89	72.17	396.0	0.08713	0.05008	0.02399	0.02173	0.2013	0.05955	0.2656	1.974	1.954	17.49	0.006538	0.01395	0.01376	0.009924	0.03416	0.002928	12.36	26.14	79.29	459.3	0.1118	0.09708	0.07529	0.06203	0.3267	0.06994
      0	12.47	18.6	81.09	481.9	0.09965	0.1058	0.08005	0.03821	0.1925	0.06373	0.3961	1.044	2.497	30.29	0.006953	0.01911	0.02701	0.01037	0.01782	0.003586	14.97	24.64	96.05	677.9	0.1426	0.2378	0.2671	0.1015	0.3014	0.0875
      1	21.75	20.99	147.3	1491.0	0.09401	0.1961	0.2195	0.1088	0.1721	0.06194	1.167	1.352	8.867	156.8	0.005687	0.0496	0.06329	0.01561	0.01924	0.004614	28.19	28.18	195.9	2384.0	0.1272	0.4725	0.5807	0.1841	0.2833	0.08858
      0	8.618	11.79	54.34	224.5	0.09752	0.05272	0.02061	0.007799	0.1683	0.07187	0.1559	0.5796	1.046	8.322	0.01011	0.01055	0.01981	0.005742	0.0209	0.002788	9.507	15.4	59.9	274.9	0.1733	0.1239	0.1168	0.04419	0.322	0.09026
      0	13.28	13.72	85.79	541.8	0.08363	0.08575	0.05077	0.02864	0.1617	0.05594	0.1833	0.5308	1.592	15.26	0.004271	0.02073	0.02828	0.008468	0.01461	0.002613	14.24	17.37	96.59	623.7	0.1166	0.2685	0.2866	0.09173	0.2736	0.0732
      0	12.34	12.27	78.94	468.5	0.09003	0.06307	0.02958	0.02647	0.1689	0.05808	0.1166	0.4957	0.7714	8.955	0.003681	0.009169	0.008732	0.00574	0.01129	0.001366	13.61	19.27	87.22	564.9	0.1292	0.2074	0.1791	0.107	0.311	0.07592
      0	11.89	21.17	76.39	433.8	0.09773	0.0812	0.02555	0.02179	0.2019	0.0629	0.2747	1.203	1.93	19.53	0.009895	0.03053	0.0163	0.009276	0.02258	0.002272	13.05	27.21	85.09	522.9	0.1426	0.2187	0.1164	0.08263	0.3075	0.07351
      0	10.29	27.61	65.67	321.4	0.0903	0.07658	0.05999	0.02738	0.1593	0.06127	0.2199	2.239	1.437	14.46	0.01205	0.02736	0.04804	0.01721	0.01843	0.004938	10.84	34.91	69.57	357.6	0.1384	0.171	0.2	0.09127	0.2226	0.08283
      
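The split-and-filter flow in steps 6.3 through 6.6 can be sketched without a Vantage connection. The version below mimics teradataml's sample(frac=[0.6, 0.2, 0.2]) by assigning each row a sample ID with the requested weights; random.choices is only a stand-in for Vantage's sampling, so the per-sample counts are approximate rather than exact:

```python
import random

random.seed(0)  # reproducible illustration

rows = list(range(100))  # stand-ins for the DataFrame rows
# Assign sample IDs 1/2/3 with weights 60/20/20, mirroring
# df.sample(frac=[0.6, 0.2, 0.2]) adding a "sampleid" column.
sampleid = random.choices([1, 2, 3], weights=[0.6, 0.2, 0.2], k=len(rows))

# Filter on the sample ID, as done with cancer_sample[cancer_sample.sampleid == "1"].
train    = [r for r, s in zip(rows, sampleid) if s == 1]
validate = [r for r, s in zip(rows, sampleid) if s == 2]
test     = [r for r, s in zip(rows, sampleid) if s == 3]
print(len(train), len(validate), len(test))
```

Every row lands in exactly one of the three partitions, which is why dropping "sampleid" afterwards loses no information.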
  7. Create an XGBoost SageMaker estimator instance through tdapiclient.
    exec_role_arn = "arn:aws:iam::076782961461:role/service-role/AmazonSageMaker-ExecutionRole-20210112T215668"
    xgboost_estimator = td_apiclient.XGBoost(
        entry_point="script.py",
        role=exec_role_arn,
        output_path=model_artifacts_location,
        code_location=custom_code_upload_location,
        instance_count=1,
        instance_type="ml.m5.xlarge",
        framework_version="1.3-1",
        trainingSparkDataFormat="csv",
        trainingContentType="csv"
    )
    xgboost_estimator.set_hyperparameters(max_depth=5, 
                            eta=0.2, 
                            gamma=4,
                            min_child_weight=6, 
                            subsample=0.8, 
                            csv_weights=1,
                            num_round=30)
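The trainingContentType="csv" setting above ties back to the column rearrangement in step 6.2: SageMaker's built-in XGBoost algorithm expects CSV training data with the target in the first column and no header row. A minimal stdlib sketch of one training row in that layout (label plus four feature values taken from the sample output above, truncated for brevity):

```python
import csv
import io

# One illustrative row: encoded diagnosis label first, then features.
label = 1
features = [18.08, 21.84, 117.4, 1024.0]

# Write the row in the headerless, label-first CSV layout.
buf = io.StringIO()
csv.writer(buf).writerow([label] + features)
line = buf.getvalue().strip()
print(line)
```

This is the layout the tdapiclient transfer produces from the teradataml DataFrames when fit is called with content_type="csv".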
  8. Start training the XGBoost estimator using teradataml DataFrame objects.
    xgboost_estimator.fit({'train': train, 'validation': validate}, content_type="csv", wait=True)
    
  9. Create a serializer and a deserializer so the predictor can handle CSV input and output.
    from sagemaker.serializers import CSVSerializer
    from sagemaker.deserializers import CSVDeserializer
    csv_ser = CSVSerializer()
    csv_dser = CSVDeserializer()
    predictor = xgboost_estimator.deploy("aws-endpoint",
                                         sagemaker_kw_args={"instance_type": "ml.m5.large", "initial_instance_count": 1, "serializer": csv_ser, "deserializer": csv_dser})
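Conceptually, CSVSerializer turns one row of values into a comma-separated request payload, and CSVDeserializer parses the CSV response body back into a list of rows. A stdlib sketch of that round trip (these helper functions are stand-ins, not the sagemaker implementation; the values echo the prediction example below):

```python
def serialize(row):
    # Join the row's values into one comma-separated line,
    # as CSVSerializer does for the request payload.
    return ",".join(str(v) for v in row)

def deserialize(body):
    # Split the response body into rows of string fields,
    # as CSVDeserializer does for the endpoint's reply.
    return [line.split(",") for line in body.strip().split("\n")]

payload = serialize([8.219, 20.7, 53.27])
result = deserialize("0.03782\n")
print(payload, result)
```

This explains why the Client-mode prediction later returns a nested list of strings rather than floats.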
  10. Try the prediction integration using a teradataml DataFrame and the predictor object created in the previous step.
    1. Confirm that the predictor is correctly configured to accept CSV input.
      print(predictor.cloudObj.accept)
      The output:
      ('text/csv',)
    2. Prepare the test dataset.
      test = test.drop("diagnosis", axis=1)
      item = test.head(1)
      item
      The output:
      radius_mean	texture_mean	perimeter_mean	area_mean	smoothness_mean	compactness_mean	concavity_mean	concave_points_mean	symmetry_mean	fractal_dimension_mean	radius_se	texture_se	perimeter_se	area_se	smoothness_se	compactness_se	concavity_se	concave_points_se	symmetry_se	fractal_dimension_se	radius_worst	texture_worst	perimeter_worst	area_worst	smoothness_worst	compactness_worst	concavity_worst	concave_points_worst	symmetry_worst	fractal_dimension_worst
      8.219	20.7	53.27	203.9	0.09405	0.1305	0.1321	0.02168	0.2222	0.08261	0.1935	1.962	1.243	10.21	0.01243	0.05416	0.07753	0.01022	0.02309	0.01178	9.092	29.72	58.08	249.8	0.163	0.431	0.5381	0.07879	0.3322	0.1486
    3. Try prediction with the UDF and Client options.
      Prediction with UDF option:
      output = predictor.predict(item, mode="UDF", content_type='csv')
      output
      The output:
      radius_mean	texture_mean	perimeter_mean	area_mean	smoothness_mean	compactness_mean	concavity_mean	concave_points_mean	symmetry_mean	fractal_dimension_mean	radius_se	texture_se	perimeter_se	area_se	smoothness_se	compactness_se	concavity_se	concave_points_se	symmetry_se	fractal_dimension_se	radius_worst	texture_worst	perimeter_worst	area_worst	smoothness_worst	compactness_worst	concavity_worst	concave_points_worst	symmetry_worst	fractal_dimension_worst	Output
      8.219	20.7	53.27	203.9	0.09405	0.1305	0.1321	0.02168	0.2222	0.08261	0.1935	1.962	1.243	10.21	0.01243	0.05416	0.07753	0.01022	0.02309	0.01178	9.092	29.72	58.08	249.8	0.163	0.431	0.5381	0.07879	0.3322	0.1486	0.06437
      
      Prediction with Client option:
      output = predictor.predict(item, mode="client", content_type='csv')
      output
      The output:
      [['0.03782']]
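In both modes the endpoint returns a probability-like score (as a string in Client mode) rather than a class label. Turning the score into a diagnosis is a simple threshold; 0.5 below is an assumed cutoff, not something the notebook fixes:

```python
# Shape of the Client-mode response shown above.
output = [["0.03782"]]

score = float(output[0][0])
# 1 = malignant ("M") under the label encoding from step 6.1.
label = 1 if score >= 0.5 else 0
print(score, label)
```

Here the score 0.03782 falls well below the cutoff, so the row is classified as benign.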
  11. Clean up.
    predictor.cloudObj.delete_model()
    predictor.cloudObj.delete_endpoint()
    remove_tdapi_context(tdapi_context)