Example 1: Working with Python script - Teradata Vantage

Teradata® VantageCloud Lake

Deployment: VantageCloud
Edition: Lake
Product: Teradata Vantage
Published: January 2023
Language: English (United States)
Last Update: 2024-04-03

In this example, the Python script mapper.py reads a line of text input ("Old Macdonald Had A Farm") from a CSV file, splits the line into individual words, and emits a new row for each word.
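The mapper.py file shipped with teradataml is not reproduced on this page. As a rough illustration of the behavior described above, the following is a minimal sketch, assuming the script receives tab-delimited rows and writes `word<TAB>1` rows. The function name `map_words` and the in-memory demonstration are assumptions made for this sketch, not the contents of the actual script.

```python
import io


def map_words(lines, out, delimiter="\t"):
    """Write one `word<delimiter>1` row to `out` for every whitespace-separated token."""
    for line in lines:
        # str.split() with no argument splits on any run of whitespace,
        # so a leading tab-separated Id value is emitted as a word too.
        for word in line.split():
            out.write(f"{word}{delimiter}1\n")


# Demonstration with an in-memory buffer. A standalone script would
# instead call map_words(sys.stdin, sys.stdout).
buffer = io.StringIO()
map_words(["1\tOld Macdonald Had A Farm\n"], buffer)
emitted = buffer.getvalue().splitlines()
print(emitted)
```

Under this assumption the row's Id value is split off as a token as well, which would be consistent with the sample output below containing a row for 1.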

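  • Import the required names and load example data.
    >>> from teradataml import Apply, DataFrame, create_env, load_example_data
    >>> from teradatasqlalchemy import VARCHAR
    >>> load_example_data("Script", ["barrier"])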
  • Create a teradataml DataFrame object.
    >>> barrierdf = DataFrame.from_table("barrier")
  • Create a remote user environment.
    >>> testenv = create_env('testenv', 'python_3.7.13', 'Demo environment')
    User environment testenv created.
    >>> import os, teradataml
    >>> teradataml_dir = os.path.dirname(teradataml.__file__)
  • Create an Apply object that lets the user run the script.
    >>> apply_obj = Apply(data=barrierdf,
                          script_name='mapper.py',
                          files_local_path=os.path.join(teradataml_dir, 'data', 'scripts'),
                          apply_command='python3 mapper.py',
                          data_order_column="Id",
                          is_local_order=False,
                          nulls_first=False,
                          sort_ascending=False,
                          returns={"word": VARCHAR(15), "count_input": VARCHAR(10)},
                          env_name=testenv,
                          delimiter='\t')
  • Run the user script locally within a Docker container, using data from the CSV file.

    This helps the user fix script-level issues outside the Open Analytics Framework.

    • Set up the environment by providing the local path to the Docker image file.
      >>> apply_obj.setup_sto_env(docker_image_location='/tmp/sto_sandbox_docker_image.tar')
      Loading image from /tmp/sto_sandbox_docker_image.tar. It may take few minutes.
      Image loaded successfully.
    • Run the user script locally in the Docker container.
      >>> apply_obj.test_script(input_data_file=os.path.join(teradataml_dir, 'data', 'barrier.csv'))
      ############ STDOUT Output ############
       
              word count_input
      0  Macdonald           1
      1          A           1
      2       Farm           1
      3        Had           1
      4        Old           1
      5          1           1
  • Install the script file in the remote user environment.
    >>> apply_obj.install_file(file_name=os.path.join(teradataml_dir, 'data', 'scripts', 'mapper.py'))
    File 'mapper.py' installed successfully in the remote user environment 'testenv'.
  • Run the user script in the Open Analytics Framework.
    >>> apply_obj.execute_script()
            word count_input
    0  Macdonald           1
    1          A           1
    2       Farm           1
    3        Had           1
    4        Old           1
    5          1           1
  • Remove the installed file from the remote user environment.
    >>> apply_obj.remove_file(file_name='mapper.py')
    File 'mapper.py' removed successfully from the remote user environment 'testenv'.
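Before installing a script, it can be useful to confirm that its delimited output lines up with the `returns` argument used in the steps above (`word` as VARCHAR(15), `count_input` as VARCHAR(10), tab delimiter). The following is a minimal sketch of such a check in plain Python. `check_rows` and `RETURNS` are hypothetical names introduced here for illustration and are not part of teradataml.

```python
# Hypothetical helper, not part of teradataml: checks delimited script output
# against the column names and lengths declared in `returns`.
# Column name -> maximum length, mirroring VARCHAR(15) and VARCHAR(10).
RETURNS = {"word": 15, "count_input": 10}


def check_rows(output_text, returns=RETURNS, delimiter="\t"):
    """Parse delimited output into dict rows; raise ValueError on a malformed row."""
    columns = list(returns)
    rows = []
    for line_no, line in enumerate(output_text.splitlines(), start=1):
        fields = line.split(delimiter)
        if len(fields) != len(columns):
            raise ValueError(
                f"line {line_no}: expected {len(columns)} fields, got {len(fields)}"
            )
        for name, value in zip(columns, fields):
            if len(value) > returns[name]:
                raise ValueError(
                    f"line {line_no}: {name!r} exceeds {returns[name]} characters"
                )
        rows.append(dict(zip(columns, fields)))
    return rows


sample = "Old\t1\nMacdonald\t1\n"
rows = check_rows(sample)
print(rows)
```

A row with the wrong number of fields, or a value longer than the declared length, raises `ValueError` with the offending line number, so mismatches surface before the script is run remotely.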