Open Analytics Workflows | teradataml | OpenAF on VantageCloud Lake

Teradata® VantageCloud Lake

Download teradataml_workflows_opaf.zip from the attachment in the left sidebar. The zip file includes Jupyter notebooks and supporting files for the sample Open Analytics workflows in its notebooks folder.

The following workflow demonstrates prediction of a flower type from a set of features, using a PMML model generated outside of VantageCloud Lake.

This workflow uses the following files, included in the Scoring_PMML folder of the attached teradataml_workflows_opaf.zip:
  • Iris.csv: Iris data set
  • single_iris_dectree.xml: PMML model file
  • pmml_test.py: Python script that scores the data in VantageCloud Lake (a sketch of what such a script can look like follows this list)
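The pmml_test.py script provided in the zip is the authoritative scoring logic. As a rough, hedged sketch only: a script of this kind typically reads the rows streamed by the APPLY table operator from standard input, scores each row against the installed PMML model with pypmml, and writes one delimited output row per input row. The delimiter, the input field names, and the keys returned by model.predict() below are assumptions and depend on the actual data and PMML file; refer to the script in the zip for the exact logic.

    import sys
    from pypmml import Model

    DELIMITER = ','  # assumption: the APPLY operator streams comma-delimited rows

    # The PMML model file is installed into the user environment with install_file()
    model = Model.load('single_iris_dectree.xml')

    for row in sys.stdin:
        row = row.strip()
        if not row:
            continue
        # assumption: five input columns, in the same order as the pmml_test_data table
        sepal_length, sepal_width, petal_length, petal_width, iris_class = row.split(DELIMITER)
        scores = model.predict({'sepal_length': float(sepal_length),
                                'sepal_width': float(sepal_width),
                                'petal_length': float(petal_length),
                                'petal_width': float(petal_width)})
        # One output row per input row; the columns and their order must match
        # the 'returns' clause of the Apply call in step 8
        print(DELIMITER.join(str(scores[key]) for key in scores))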
  1. Load required libraries.
    from teradataml import (create_context, remove_context, list_base_envs, list_user_envs,
                            create_env, remove_env, get_env, DataFrame, copy_to_sql, Apply,
                            configure, read_csv, set_config_params, load_example_data)
    from teradataml.options.display import display
    import pandas as pd, getpass, os
    from collections import OrderedDict
    from teradatasqlalchemy.types import BIGINT, VARCHAR, INTEGER, FLOAT
  2. Set the authentication token and base URL.
    set_config_params(base_url=getpass.getpass("Base URL: "),
                      auth_token=getpass.getpass("JWT Token: "))
  3. Create the connection.
    You can use the same JWT token instead of a password to create a context, as shown in the sketch after this example. See the create_context section for more details.
    con = create_context(host=getpass.getpass("Hostname: "),
                         username=getpass.getpass("Username: "),
                         password=getpass.getpass("Password: "))
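    The same JWT token can be used in place of a password. A minimal sketch, assuming the JWT logon mechanism of the underlying teradatasql driver (logmech and logdata are passed through by create_context; the 'token=' prefix in logdata follows the driver's JWT convention and is an assumption here):
     # Hedged alternative: connect with the same JWT used in set_config_params()
     jwt = getpass.getpass("JWT Token: ")
     con = create_context(host=getpass.getpass("Hostname: "),
                          logmech="JWT",
                          logdata="token=" + jwt)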
  4. Load the Iris data into VantageCloud Lake using the read_csv() function and create a DataFrame from it. An alternative loading path with copy_to_sql() is sketched after this example.
    types = OrderedDict(sepal_length = FLOAT(),
                        sepal_width = FLOAT(),
                        petal_length = FLOAT(),
                        petal_width = FLOAT())
    types['class'] = VARCHAR()
    pmml_test_data = read_csv('Iris.csv', table_name='pmml_test_data', types=types)
    pmml_test_data 
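    copy_to_sql(), imported in step 1, offers an alternative loading path through pandas. A hedged sketch, assuming Iris.csv has a header row matching the column names defined in types; read_csv() above is the path this workflow actually uses:
     # Hedged alternative: load Iris.csv through pandas and copy_to_sql()
     iris_pdf = pd.read_csv('Iris.csv')
     copy_to_sql(df=iris_pdf,
                 table_name='pmml_test_data',
                 types=types,
                 if_exists='replace')
     pmml_test_data = DataFrame('pmml_test_data')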
  5. Check the available base environments.
    list_base_envs()
    The following steps create a new Python user environment based on python_3.8.13.
  6. Create a new user environment and view the existing libraries in it.
    1. Create a new user environment using the create_env() function. It returns a UserEnv object.
      demo_env = create_env(env_name = 'oaf_usecase_2b_env',
                            base_env = 'python_3.8.13',
                            desc = 'OAF Demo Use Case 2b Environment')
    2. Verify the new environment has been created.
      list_user_envs()
    3. View existing libraries in the user environment.
      demo_env.libs
  7. Synchronously install into the user environment any Python add-ons needed by the script.
    1. Install Python libraries using the 'demo_env' object of class UserEnv.
      demo_env.install_lib(["pypmml", "pandas"])
    2. Verify the Python libraries have been installed correctly.
      demo_env.libs
    3. Install the predictive model file exported in PMML format.
      demo_env.install_file(file_path = 'single_iris_dectree.xml', replace = True)
    4. Install the Python script file for scoring into the environment.
      demo_env.install_file(file_path = 'pmml_test.py', replace = True)
    5. Verify the files have been installed correctly.
      demo_env.files
  8. Use the APPLY table operator to create an object for the PMML prediction.
    apply_obj = Apply(data = pmml_test_data,
                      apply_command = 'python3 pmml_test.py',
                      returns = {"nodeid": INTEGER(), "Predicted_Class": VARCHAR(30),
                                 "Probability": FLOAT(), "Prob_Setosa": VARCHAR(30),
                                 "Prob_Versicolor": VARCHAR(30),
                                 "Prob_Virginica": VARCHAR(30)},
                      env_name = demo_env
                     )
  9. Run the Python script inside the remote user environment.
    You can see the underlying SQL by setting display.print_sqlmr_query = True (the display object is imported in step 1).
    apply_obj.execute_script()
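    execute_script() returns a teradataml DataFrame with the columns declared in the returns clause. For example, a brief, hedged way to capture and inspect the scored output on the client:
    scored = apply_obj.execute_script()
    scored_pdf = scored.to_pandas()   # materialize the scored rows as a pandas DataFrame
    print(scored_pdf.head())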
  10. Remove the environment after scoring the data.
    remove_env('oaf_usecase_2b_env')   
  11. Verify the specified environment has been removed.
    list_user_envs()
  12. Disconnect from VantageCloud Lake.
    remove_context()