set_data | Script Method | Teradata Package for Python - 17.00

Teradata® Package for Python User Guide

Product
Teradata Package for Python
Release Number
17.00
Release Date
November 2021
Content Type
User Guide
Publication ID
B700-4006-070K
Language
English (United States)

Use the set_data method to set the data and data-related arguments without having to re-create the Script object.

Some of the set_data method arguments used in the examples:

  • Required argument data specifies a teradataml DataFrame containing the input data for the script.
  • Optional argument is_local_order specifies a boolean value to determine whether the input data is to be ordered locally or not.
  • Optional argument data_order_column specifies the Order By column for data, and can be used regardless of whether is_local_order is set to True or False.
  • Optional argument sort_ascending specifies a boolean value to determine whether the result set is sorted on the column specified in data_order_column in ascending or descending order. This argument is ignored if data_order_column is None.
  • Optional argument nulls_first specifies a boolean value to determine whether NULLS are listed first or last during ordering.
For details of all arguments, see Teradata Package for Python Function Reference.
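
The combined effect of sort_ascending and nulls_first can be pictured with a plain-Python analogy. This is only a sketch: the helper order_rows, its default values, and the sample rows are hypothetical and are not part of teradataml; on Vantage the ordering is done by the database.

```python
def order_rows(rows, column, sort_ascending=True, nulls_first=True):
    """Order dict rows by `column`, placing None (NULL) values first or last.

    Hypothetical helper for illustration only; not a teradataml function.
    """
    nulls = [r for r in rows if r[column] is None]
    non_nulls = sorted(
        (r for r in rows if r[column] is not None),
        key=lambda r: r[column],
        reverse=not sort_ascending,  # descending when sort_ascending is False
    )
    return nulls + non_nulls if nulls_first else non_nulls + nulls


# Made-up sample rows with one NULL in the ordering column.
rows = [{"Id": 2}, {"Id": None}, {"Id": 1}, {"Id": 3}]

desc_nulls_last = order_rows(rows, "Id", sort_ascending=False, nulls_first=False)
asc_nulls_first = order_rows(rows, "Id")

print([r["Id"] for r in desc_nulls_last])
print([r["Id"] for r in asc_nulls_first])
```

In this sketch the NULL row is placed independently of the sort direction, which mirrors how the two arguments are described as separate settings above.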

Example Prerequisites

  • To run the examples, "mapper.py" and "barrier.csv" are required and must both be present in the location specified by the argument files_local_path.
    • "barrier.csv" is available in the <teradataml_install_location>/teradataml/data directory.
    • "mapper.py" can be created as follows:
      #!/usr/bin/python
      import sys
      for line in sys.stdin:
          line = line.strip()
          words = line.split()
          for word in words:
              print ('%s\t%s' % (word, 1))
  • Before running the examples, import required packages:
    >>> from collections import OrderedDict
    >>> from teradataml import DataFrame, Script, get_context
    >>> from teradatasqlalchemy import VARCHAR
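
To see what "mapper.py" emits before involving Vantage, the same loop can be applied to an in-memory stream. This is a minimal sketch: the function name run_mapper and the sample line are assumptions made for illustration and are not taken from "barrier.csv" or from teradataml.

```python
import io


def run_mapper(stream):
    """Apply the mapper.py loop to a text stream and collect the emitted lines."""
    emitted = []
    for line in stream:
        line = line.strip()
        # Split on whitespace and emit "<word><TAB>1" for every word.
        for word in line.split():
            emitted.append('%s\t%s' % (word, 1))
    return emitted


# Illustrative input line (an assumption, not the real file contents).
sample = io.StringIO("Old Macdonald Had A Farm\n")
mapper_output = run_mapper(sample)

for row in mapper_output:
    print(row)
```

Each emitted line carries two tab-separated fields, which correspond to the two columns ("word" and "count_input") declared in the returns argument of the examples.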

Example 1

This example uses test_script to test a Python script, then sets the data and related arguments with the set_data method and runs the script on Vantage with the new parameters.

All data-related arguments that are not specified in set_data() are reset to their default values.
  • Create teradataml DataFrame.
    >>> barrierdf = DataFrame.from_table("barrier")
  • Create a Script object without data and data-related arguments.
    >>> sto = Script(
                script_name='mapper.py',
                files_local_path='data/scripts',
                script_command='python3 ./<database name>/mapper.py',
                charset='latin',
                returns=OrderedDict([("word", VARCHAR(15)), ("count_input", VARCHAR(2))])
                )
  • Test script using data from file.
    >>> sto.test_script(input_data_file='../barrier.csv')
    ############ STDOUT Output ############
      
            word count_input
    0  Macdonald           1
    1          A           1
    2       Farm           1
    3        Had           1
    4        Old           1
    5          1           1
  • Set data and related arguments to run and test the script on actual data on Vantage.
    >>> sto.set_data(data=barrierdf,
                data_order_column="Id",
                is_local_order=False,
                nulls_first=False,
                sort_ascending=False
                )
  • Set the search path to the database where the file is installed.
    >>> get_context().execute("SET SESSION SEARCHUIFDBPATH = <database name>;")
  • Run the user script on Vantage.
    >>> sto.execute_script()
    ############ STDOUT Output ############
      
            word count_input
    0  Macdonald           1
    1          A           1
    2       Farm           1
    3        Had           1
    4        Old           1
    5          1           1
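
The rule that unspecified data-related arguments fall back to their defaults can be illustrated with a toy class. This is a sketch under stated assumptions: ToyScript is a hypothetical stand-in, not the teradataml Script class, and the default values listed in it are assumed for the illustration; check the Function Reference for the real defaults.

```python
class ToyScript:
    """Hypothetical stand-in for illustration; not the teradataml Script class."""

    # Assumed defaults for this sketch only.
    _DATA_DEFAULTS = {
        "data_order_column": None,
        "is_local_order": False,
        "sort_ascending": True,
        "nulls_first": True,
    }

    def __init__(self, data=None, **data_args):
        self.set_data(data, **data_args)

    def set_data(self, data, **data_args):
        unknown = set(data_args) - set(self._DATA_DEFAULTS)
        if unknown:
            raise TypeError("unexpected arguments: %s" % sorted(unknown))
        self.data = data
        # Start from the defaults, so anything not passed in this call is reset.
        self.data_args = {**self._DATA_DEFAULTS, **data_args}


toy = ToyScript(data="barrier rows", data_order_column="Id", sort_ascending=False)
toy.set_data("barrier_new rows", data_order_column="Id", nulls_first=False)

# sort_ascending was not passed to set_data, so it is back at the assumed default.
print(toy.data_args["sort_ascending"])
print(toy.data_args["nulls_first"])
```

The practical consequence is that every call to set_data() should restate all the data-related arguments that still matter, not only the ones that changed.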

Example 2

In this example, the script is tested using test_script and run on Vantage. The user can then reset the data and related arguments and run the script on Vantage again.

All data-related arguments that are not specified in set_data() are reset to their default values.
  • Create a Script object that lets the user run the script on Vantage.
    >>> sto = Script(data=barrierdf,
                script_name='mapper.py',
                files_local_path='data/scripts',
                script_command='python3 ./<database name>/mapper.py',
                data_order_column="Id",
                is_local_order=False,
                delimiter=',',
                nulls_first=False,
                sort_ascending=False,
                charset='latin',
                returns=OrderedDict([("word", VARCHAR(15)), ("count_input", VARCHAR(2))])
                )
  • Set the search path to the database where the file is installed.
    >>> get_context().execute("SET SESSION SEARCHUIFDBPATH = <database name>;")
  • Run the script on Vantage.
    >>> sto.execute_script()
    ############ STDOUT Output ############
     
            word count_input
    0  Macdonald           1
    1          A           1
    2       Farm           1
    3        Had           1
    4        Old           1
    5          1           1
  • Use set_data to reset the data and some related arguments, so that the script runs with a different dataset.
    • Create a new DataFrame.
      >>> barrierdf_new = DataFrame.from_table("barrier_new")
    • Set data and related arguments with different values.
      >>> sto.set_data(data=barrierdf_new, data_order_column='Id', is_local_order=True, nulls_first=True)
    • Run the script again.
      >>> sto.execute_script()
              word  count_input
      0  Macdonald            1
      1          A            1
      2       Farm            1
      3          2            1
      4        his            1
      5       farm            1
      6         On            1
      7        Had            1
      8        Old            1
      9          1            1

Example 3

In this example, the script is tested using test_script and run on Vantage. The user then runs the script again with the same dataset but different data-related arguments, using set_data() to reset the arguments.

All data-related arguments that are not specified in set_data() are reset to their default values.
  • Set the related arguments to different values, but use the same data.
    >>> sto.set_data(data=barrierdf, data_order_column='Id', is_local_order=True, nulls_first=True)
  • Set the search path to the database where the file is installed.
    >>> get_context().execute("SET SESSION SEARCHUIFDBPATH = <database name>;")
  • Run the script again.
    >>> sto.execute_script()
    ############ STDOUT Output ############
     
            word count_input
    0  Macdonald           1
    1          A           1
    2       Farm           1
    3        Had           1
    4        Old           1
    5          1           1