Open Analytics Workflows | teradataml | OpenAF on VantageCloud Lake

Teradata® VantageCloud Lake

Download teradataml_workflows_opaf.zip from the attachment in the left sidebar. The zip file includes Jupyter notebooks and supporting files for the sample Open Analytics workflows in its notebooks folder.

The following workflow demonstrates prediction of a flower type from a set of features, using a PMML model generated outside of VantageCloud Lake.

This workflow uses the following files, included in the Scoring_PMML folder of the attached teradataml_workflows_opaf.zip:
  • Iris.csv: Iris data set
  • single_iris_dectree.xml: PMML model file
  • pmml_test.py: Python script that scores the data in VantageCloud Lake (a sketch of what such a script can look like follows this list)
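The pmml_test.py script provided in the zip is the authoritative scoring logic. As a rough, hedged sketch only: a script of this kind typically reads the rows streamed by the APPLY table operator from standard input, scores each row against the installed PMML model with pypmml, and writes one delimited output row per input row. The delimiter, the input field names, and the keys returned by model.predict() below are assumptions and depend on the actual data and PMML file; refer to the script in the zip for the exact logic.

    import sys
    from pypmml import Model

    DELIMITER = ','  # assumption: the APPLY operator streams comma-delimited rows

    # The PMML model file is installed into the user environment with install_file()
    model = Model.load('single_iris_dectree.xml')

    for row in sys.stdin:
        row = row.strip()
        if not row:
            continue
        # assumption: five input columns, in the same order as the pmml_test_data table
        sepal_length, sepal_width, petal_length, petal_width, iris_class = row.split(DELIMITER)
        scores = model.predict({'sepal_length': float(sepal_length),
                                'sepal_width': float(sepal_width),
                                'petal_length': float(petal_length),
                                'petal_width': float(petal_width)})
        # One output row per input row; the columns and their order must match
        # the 'returns' clause of the Apply call in step 8
        print(DELIMITER.join(str(scores[key]) for key in scores))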
  1. Load required libraries.
    from teradataml import (create_context, remove_context, list_base_envs, list_user_envs,
                            create_env, remove_env, get_env, DataFrame, copy_to_sql, Apply,
                            configure, read_csv, set_config_params, load_example_data)
    from teradataml.options.display import display
    import pandas as pd, getpass, os
    from collections import OrderedDict
    from teradatasqlalchemy.types import BIGINT, VARCHAR, INTEGER, FLOAT
  2. Set the authentication token and base URL.
    set_config_params(base_url=getpass.getpass("Base URL: "),
                      auth_token=getpass.getpass("JWT Token: "))
  3. Create the connection.
    You can use the same JWT token instead of a password to create a context, as shown in the sketch after this example. See the create_context section for more details.
    con = create_context(host=getpass.getpass("Hostname: "),
                         username=getpass.getpass("Username: "),
                         password=getpass.getpass("Password: "))
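    The same JWT token can be used in place of a password. A minimal sketch, assuming the JWT logon mechanism of the underlying teradatasql driver (logmech and logdata are passed through by create_context; the 'token=' prefix in logdata follows the driver's JWT convention and is an assumption here):
     # Hedged alternative: connect with the same JWT used in set_config_params()
     jwt = getpass.getpass("JWT Token: ")
     con = create_context(host=getpass.getpass("Hostname: "),
                          logmech="JWT",
                          logdata="token=" + jwt)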
  4. Load the Iris data into VantageCloud Lake using the read_csv() function and create a DataFrame from it. An alternative loading path with copy_to_sql() is sketched after this example.
    types = OrderedDict(sepal_length = FLOAT(),
                        sepal_width = FLOAT(),
                        petal_length = FLOAT(),
                        petal_width = FLOAT())
    types['class'] = VARCHAR()
    pmml_test_data = read_csv('Iris.csv', table_name='pmml_test_data', types=types)
    pmml_test_data 
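    copy_to_sql(), imported in step 1, offers an alternative loading path through pandas. A hedged sketch, assuming Iris.csv has a header row matching the column names defined in types; read_csv() above is the path this workflow actually uses:
     # Hedged alternative: load Iris.csv through pandas and copy_to_sql()
     iris_pdf = pd.read_csv('Iris.csv')
     copy_to_sql(df=iris_pdf,
                 table_name='pmml_test_data',
                 types=types,
                 if_exists='replace')
     pmml_test_data = DataFrame('pmml_test_data')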
  5. Check the available base environments.
    list_base_envs()
    The following steps create a new Python user environment based on python_3.8.13.
  6. Create a new user environment and view the existing libraries in it.
    1. Create a new user environment using the create_env() function. It returns a UserEnv object.
      demo_env = create_env(env_name = 'oaf_usecase_2b_env',
                            base_env = 'python_3.8.13',
                            desc = 'OAF Demo Use Case 2b Environment')
    2. Verify the new environment has been created.
      list_user_envs()
    3. View existing libraries in the user environment.
      demo_env.libs
  7. Synchronously install into the user environment any Python add-ons needed by the script.
    1. Install Python libraries using the 'demo_env' object of class UserEnv.
      demo_env.install_lib(["pypmml", "pandas"])
    2. Verify the Python libraries have been installed correctly.
      demo_env.libs
    3. Install the predictive model file exported in PMML format.
      demo_env.install_file(file_path = 'single_iris_dectree.xml', replace = True)
    4. Install the Python script file for scoring into the environment.
      demo_env.install_file(file_path = 'pmml_test.py', replace = True)
    5. Verify the files have been installed correctly.
      demo_env.files
  8. Use the APPLY table operator to create an object for the PMML prediction.
    apply_obj = Apply(data = pmml_test_data,
                      apply_command = 'python3 pmml_test.py',
                      returns = {"nodeid": INTEGER(), "Predicted_Class": VARCHAR(30),
                                 "Probability": FLOAT(), "Prob_Setosa": VARCHAR(30),
                                 "Prob_Versicolor": VARCHAR(30),
                                 "Prob_Virginica": VARCHAR(30)},
                      env_name = demo_env
                     )
  9. Run the Python script inside the remote user environment.
    You can see the underlying SQL by setting display.print_sqlmr_query = True (the display object is imported in step 1).
    apply_obj.execute_script()
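    execute_script() returns a teradataml DataFrame with the columns declared in the returns clause. For example, a brief, hedged way to capture and inspect the scored output on the client:
    scored = apply_obj.execute_script()
    scored_pdf = scored.to_pandas()   # materialize the scored rows as a pandas DataFrame
    print(scored_pdf.head())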
  10. Remove the environment after scoring the data.
    remove_env('oaf_usecase_2b_env')   
  11. Verify the specified environment has been removed.
    list_user_envs()
  12. Disconnect from VantageCloud Lake.
    remove_context()