Use the set_data method to set data and data-related arguments without having to re-create the Script object.
Some of the set_data method arguments used in the examples:
- Required argument data specifies a teradataml DataFrame containing the input data for the script.
- Optional argument is_local_order specifies a boolean value to determine whether the input data is to be ordered locally or not.
- Optional argument data_order_column specifies the Order By column for data, and can be used whether is_local_order is set to True or False.
- Optional argument sort_ascending specifies a boolean value to determine whether the result set is sorted on the column specified in data_order_column in ascending or descending order. This argument is ignored if data_order_column is None.
- Optional argument nulls_first specifies a boolean value to determine whether NULLS are listed first or last during ordering.
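How the ordering arguments above interact can be illustrated with a small plain-Python sketch. This is only a local emulation of the described semantics, not how Vantage performs the ordering; the helper name order_rows and the sample rows are hypothetical, and treating NULL placement as independent of sort direction is an assumption.

```python
def order_rows(rows, data_order_column, sort_ascending=True, nulls_first=True):
    """Locally emulate ordering rows (dicts) by one column with NULL placement."""
    if data_order_column is None:
        # Mirrors the documented rule: without an order column,
        # sort_ascending has nothing to act on and is ignored.
        return list(rows)
    nulls = [r for r in rows if r[data_order_column] is None]
    values = [r for r in rows if r[data_order_column] is not None]
    values.sort(key=lambda r: r[data_order_column], reverse=not sort_ascending)
    # Assumption: NULL rows go first or last regardless of sort direction.
    return nulls + values if nulls_first else values + nulls


# Made-up sample rows, for illustration only.
rows = [{"Id": 2, "Name": "Farm"}, {"Id": None, "Name": "Had"}, {"Id": 1, "Name": "Old"}]
ordered = order_rows(rows, "Id", sort_ascending=False, nulls_first=False)
print([r["Id"] for r in ordered])  # [2, 1, None]
```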
Example Prerequisites
- To run the examples, "mapper.py" and "barrier.csv" are required and must be present in the location specified by the argument files_local_path.
- "barrier.csv" is present under <teradataml_install_location>/teradataml/data directory.
- "mapper.py" can be created as follows:
#!/usr/bin/python
import sys
for line in sys.stdin:
    line = line.strip()
    words = line.split()
    for word in words:
        print('%s\t%s' % (word, 1))
- Before running the examples, import required packages:
>>> from collections import OrderedDict
>>> from teradatasqlalchemy import VARCHAR
>>> from teradataml import DataFrame, Script, get_context
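Before installing "mapper.py", its logic can be checked locally without Vantage. The sketch below wraps the same split-and-emit loop in a function so it can be fed an in-memory stream; the function name map_words and the sample text are hypothetical and not part of teradataml.

```python
import io


def map_words(stream):
    """Yield one 'word<TAB>1' record per whitespace-separated token."""
    for line in stream:
        for word in line.strip().split():
            yield '%s\t%s' % (word, 1)


# Made-up input standing in for sys.stdin.
sample = io.StringIO("Old Macdonald Had\nA Farm\n")
records = list(map_words(sample))
print(records)  # ['Old\t1', 'Macdonald\t1', 'Had\t1', 'A\t1', 'Farm\t1']
```

Each record has two tab-separated fields, which line up with the two columns ("word" and "count_input") declared in the returns argument of the examples.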
Example 1
This example uses test_script to test a Python script, then resets data and related arguments with the set_data method and runs the script on Vantage with the new parameters.
All data-related arguments that are not specified in set_data() are reset to their default values.
- Create teradataml DataFrame.
>>> barrierdf = DataFrame.from_table("barrier")
- Create a Script object without data and its arguments.
>>> sto = Script(script_name='mapper.py',
...              files_local_path='data/scripts',
...              script_command='python3 ./<database name>/mapper.py',
...              charset='latin',
...              returns=OrderedDict([("word", VARCHAR(15)), ("count_input", VARCHAR(2))])
...              )
- Test script using data from file.
>>> sto.test_script(input_data_file='../barrier.csv')
############ STDOUT Output ############
        word count_input
0  Macdonald           1
1          A           1
2       Farm           1
3        Had           1
4        Old           1
5          1           1
- Set data and related arguments to run and test the script on actual data on Vantage.
>>> sto.set_data(data=barrierdf,
...              data_order_column="Id",
...              is_local_order=False,
...              nulls_first=False,
...              sort_ascending=False)
- Set the search path to the database where the file is installed.
>>> get_context().execute("SET SESSION SEARCHUIFDBPATH = <database name>;")
- Run the user script on Vantage.
>>> sto.execute_script()
############ STDOUT Output ############
        word count_input
0  Macdonald           1
1          A           1
2       Farm           1
3        Had           1
4        Old           1
5          1           1
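In Example 1, the script_command path and the SEARCHUIFDBPATH statement both contain the "<database name>" placeholder, which stands for the database where the file is installed. One way to keep the two in step is to derive both strings from a single variable. The helper name build_sto_strings and the database name "mydb" below are hypothetical.

```python
def build_sto_strings(database_name, script_name="mapper.py"):
    """Return the script command and search-path SQL for one database name."""
    script_command = "python3 ./{}/{}".format(database_name, script_name)
    search_path_sql = "SET SESSION SEARCHUIFDBPATH = {};".format(database_name)
    return script_command, search_path_sql


# "mydb" is a placeholder; substitute the database where the file is installed.
command, sql = build_sto_strings("mydb")
print(command)  # python3 ./mydb/mapper.py
print(sql)      # SET SESSION SEARCHUIFDBPATH = mydb;
```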
Example 2
In this example, the user resets data and related arguments, and executes the script on Vantage again.
All data-related arguments that are not specified in set_data() are reset to their default values.
- Create a Script object that allows user to run script on Vantage.
>>> sto = Script(data=barrierdf,
...              script_name='mapper.py',
...              files_local_path='data/scripts',
...              script_command='python3 ./<database name>/mapper.py',
...              data_order_column="Id",
...              is_local_order=False,
...              delimiter=',',
...              nulls_first=False,
...              sort_ascending=False,
...              charset='latin',
...              returns=OrderedDict([("word", VARCHAR(15)), ("count_input", VARCHAR(2))])
...              )
- Set the search path to the database where the file is installed.
>>> get_context().execute("SET SESSION SEARCHUIFDBPATH = <database name>;")
- Execute the script on Vantage.
>>> sto.execute_script()
############ STDOUT Output ############
        word count_input
0  Macdonald           1
1          A           1
2       Farm           1
3        Had           1
4        Old           1
5          1           1
- Run set_data() to reset the data and some related parameters, so that the script can run with a different dataset.
- Create a new DataFrame.
>>> barrierdf_new = DataFrame.from_table("barrier_new")
- Set data and related arguments with different values.
>>> sto.set_data(data=barrierdf_new, data_order_column='Id', is_local_order=True, nulls_first=True)
- Run the script again.
>>> sto.execute_script()
        word count_input
0  Macdonald           1
1          A           1
2       Farm           1
3          2           1
4        his           1
5       farm           1
6         On           1
7        Had           1
8        Old           1
9          1           1
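The set_data() call in Example 2 omits sort_ascending, so under the reset rule stated above that argument falls back to its default instead of keeping the value given to the constructor. The sketch below models only that rule. ScriptSketch is a hypothetical stand-in and not the teradataml Script class, the data values are plain strings rather than DataFrames, and the default values listed are assumptions that should be checked against the Script documentation.

```python
# Assumed defaults, for illustration only.
DATA_ARG_DEFAULTS = {
    "data_order_column": None,
    "is_local_order": False,
    "sort_ascending": True,
    "nulls_first": True,
}


class ScriptSketch:
    """Hypothetical stand-in that models only the set_data reset rule."""

    def __init__(self, data=None, **data_args):
        self.set_data(data, **data_args)

    def set_data(self, data, **data_args):
        unknown = set(data_args) - set(DATA_ARG_DEFAULTS)
        if unknown:
            raise TypeError("unexpected arguments: %s" % sorted(unknown))
        self.data = data
        # Start from the defaults so anything not passed in this call is reset.
        self.data_args = dict(DATA_ARG_DEFAULTS, **data_args)


sto_sketch = ScriptSketch(data="barrierdf", data_order_column="Id",
                          sort_ascending=False, nulls_first=False)
sto_sketch.set_data(data="barrierdf_new", data_order_column="Id",
                    is_local_order=True, nulls_first=True)
print(sto_sketch.data_args["sort_ascending"])  # True (back to the assumed default)
```

To keep a non-default value across calls, pass it again in every set_data() call.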
Example 3
In this example, the user runs the script again with the same dataset but different data-related arguments, using set_data() to reset the arguments.
All data-related arguments that are not specified in set_data() are reset to their default values.
- Set related arguments with different values, but use the same data.
>>> sto.set_data(data=barrierdf, data_order_column='Id', is_local_order=True, nulls_first=True)
- Set the search path to the database where the file is installed.
>>> get_context().execute("SET SESSION SEARCHUIFDBPATH = <database name>;")
- Run the script again.
>>> sto.execute_script()
############ STDOUT Output ############
        word count_input
0  Macdonald           1
1          A           1
2       Farm           1
3        Had           1
4        Old           1
5          1           1