Teradata Package for Python Function Reference | 17.10 - set_data - Teradata Package for Python - Look here for syntax, methods and examples for the functions included in the Teradata Package for Python.

Teradata® Package for Python Function Reference

Product: Teradata Package for Python

Release Number: 17.10

Published: April 2022

Language: English (United States)

Last Update: 2022-08-19

Product Category: Teradata Vantage

teradataml.table_operators.Script.set_data = set_data(self, data, data_partition_column=None, data_hash_column=None, data_order_column=None, is_local_order=False, sort_ascending=True, nulls_first=True):

DESCRIPTION:
    Enables the user to set data and data-related arguments without
    re-creating the Script object.

PARAMETERS:
    data:
        Required Argument.
        Specifies a teradataml DataFrame containing the input data for the
        script.

    data_hash_column:
        Optional Argument.
        Specifies the column to be used for hashing.
        The rows in the data are redistributed to AMPs based on the hash
        value of the column specified.
        The user-installed script then runs once on each AMP.
        If there is no data_partition_column, then the entire result set
        delivered by the function constitutes a single group or partition.
        Types: str
        Note:
            "data_hash_column" cannot be specified along with
            "data_partition_column", "is_local_order" and
            "data_order_column".

    data_partition_column:
        Optional Argument.
        Specifies Partition By columns for data.
        Values to this argument can be provided as a list, if multiple
        columns are used for partition.
        Default Value: ANY
        Types: str OR list of Strings (str)
        Note:
            1) "data_partition_column" cannot be specified along with
               "data_hash_column".
            2) "data_partition_column" cannot be specified along with
               "is_local_order = True".

    is_local_order:
        Optional Argument.
        Specifies a boolean value to determine whether the input data is to
        be ordered locally or not. Order By specifies the order in which the
        values in a group or partition are sorted. Local Order By orders
        qualified rows on each AMP in preparation to be input to a table
        function. This argument is ignored, if "data_order_column" is None.
        When set to True, data is ordered locally.
        Default Value: False
        Types: bool
        Note:
            1) "is_local_order" cannot be specified along with
               "data_hash_column".
            2) When "is_local_order" is set to True, "data_order_column"
               should be specified, and the columns specified in
               "data_order_column" are used for local ordering.

    data_order_column:
        Optional Argument.
        Specifies Order By columns for data.
        Values to this argument can be provided as a list, if multiple
        columns are used for ordering.
        This argument is used in both cases:
        "is_local_order = True" and "is_local_order = False".
        Types: str OR list of Strings (str)
        Note:
            "data_order_column" cannot be specified along with
            "data_hash_column".

    sort_ascending:
        Optional Argument.
        Specifies a boolean value to determine if the result set is to be
        sorted on the column specified in "data_order_column", in ascending
        or descending order.
        The sorting is ascending when this argument is set to True, and
        descending when set to False.
        This argument is ignored, if "data_order_column" is None.
        Default Value: True
        Types: bool

    nulls_first:
        Optional Argument.
        Specifies a boolean value to determine whether NULLS are listed
        first or last during ordering.
        This argument is ignored, if "data_order_column" is None.
        NULLS are listed first when this argument is set to True, and last
        when set to False.
        Default Value: True
        Types: bool

RETURNS:
    None.

RAISES:
    TeradataMlException

EXAMPLES:
    # Note - Refer to User Guide for setting search path and required
    #        permissions.

    # Import the modules used in the examples.
    >>> import os
    >>> from collections import OrderedDict
    >>> import teradataml
    >>> from teradataml import DataFrame, Script, get_context, load_example_data
    >>> from teradatasqlalchemy import VARCHAR

    # Load example data.
    >>> load_example_data("Script", ["barrier"])

    # Example 1
    # Create teradataml DataFrame objects.
    >>> barrierdf = DataFrame.from_table("barrier")
    >>> barrierdf
                            Name
    Id
    1   Old Macdonald Had A Farm
    >>>

    # Set SEARCHUIFDBPATH
    >>> get_context().execute("SET SESSION SEARCHUIFDBPATH = alice;")
    >>> td_path = os.path.dirname(teradataml.__file__)

    # The script mapper.py reads in a line of text input
    # ("Old Macdonald Had A Farm") from csv and
    # splits the line into individual words, emitting a new row for each word.

    # Create a Script object with data but without the data-related
    # arguments (partitioning, hashing and ordering).
    >>> sto = Script(data=barrierdf,
    ...              script_name='mapper.py',
    ...              files_local_path=os.path.join(td_path, 'data', 'scripts'),
    ...              script_command='python ./alice/mapper.py',
    ...              charset='latin',
    ...              returns=OrderedDict([("word", VARCHAR(15)), ("count_input", VARCHAR(2))]))

    # Test script using data from file.
    >>> sto.test_script(input_data_file='../barrier.csv', data_file_delimiter=',')
    ############ STDOUT Output ############
            word  count_input
    0          1            1
    1        Old            1
    2  Macdonald            1
    3        Had            1
    4          A            1
    5       Farm            1
    >>>

    # Test script using data from DB.
    >>> sto.test_script(password='alice')
    ############ STDOUT Output ############
            word  count_input
    0          1            1
    1        Old            1
    2  Macdonald            1
    3        Had            1
    4          A            1
    5       Farm            1

    # Test script using data from DB and with data_row_limit.
    >>> sto.test_script(password='alice', data_row_limit=5)
    ############ STDOUT Output ############
            word  count_input
    0          1            1
    1        Old            1
    2  Macdonald            1
    3        Had            1
    4          A            1
    5       Farm            1

    # To test or run the script on actual data on Vantage, the user must
    # set the data and the related arguments.
    # Note:
    #    All data related arguments that are not specified in set_data() are
    #    reset to default values.
    >>> sto.set_data(data=barrierdf,
    ...              data_order_column="Id",
    ...              is_local_order=False,
    ...              nulls_first=False,
    ...              sort_ascending=False)

    # Execute the user script on Vantage.
    >>> sto.execute_script()
    ############ STDOUT Output ############
            word  count_input
    0  Macdonald            1
    1          A            1
    2       Farm            1
    3        Had            1
    4        Old            1
    5          1            1

    # Example 2
    # Script is tested using test_script and executed on Vantage.
    # Use set_data() to reset arguments.
    # Create teradataml DataFrame objects.
    >>> load_example_data("Script", ["barrier_new"])
    >>> barrierdf_new = DataFrame.from_table("barrier_new")
    >>> barrierdf_new
                            Name
    Id
    2   On his farm he had a cow
    1   Old Macdonald Had A Farm
    >>>

    # Create a Script object that allows us to execute script on Vantage.
    >>> sto = Script(data=barrierdf_new,
    ...              script_name='mapper.py',
    ...              files_local_path=os.path.join(td_path, 'data', 'scripts'),
    ...              script_command='python ./alice/mapper.py',
    ...              data_order_column="Id",
    ...              is_local_order=False,
    ...              nulls_first=False,
    ...              sort_ascending=False,
    ...              charset='latin',
    ...              returns=OrderedDict([("word", VARCHAR(15)), ("count_input", VARCHAR(2))]))

    # Script is tested using test_script and executed on Vantage.
    >>> sto.execute_script()
    ############ STDOUT Output ############
       word  count_input
    0   his            1
    1    he            1
    2   had            1
    3     a            1
    4     1            1
    5   Old            1
    6   cow            1
    7  farm            1
    8    On            1
    9     2            1

    # To run the script with a different dataset, the user can use
    # set_data().
    # Re-set data and some data related parameters.
    # Note:
    #    All data related arguments that are not specified in set_data() are
    #    reset to default values.
    >>> sto.set_data(data=barrierdf,
    ...              data_order_column='Id',
    ...              is_local_order=True,
    ...              nulls_first=True)
    >>> sto.execute_script()
            word  count_input
    0  Macdonald            1
    1          A            1
    2       Farm            1
    3        Had            1
    4        Old            1
    5          1            1

    # Example 3
    # Script is tested using test_script and executed on Vantage.
    # In order to run the script with same dataset but different data related
    # arguments, use set_data() to reset arguments.
    # Note:
    #    All data related arguments that are not specified in set_data() are
    #    reset to default values.
    >>> sto.set_data(data=barrierdf_new,
    ...              data_order_column='Id',
    ...              is_local_order=True,
    ...              nulls_first=True)
    >>> sto.execute_script()
    ############ STDOUT Output ############
            word  count_input
    0  Macdonald            1
    1          A            1
    2       Farm            1
    3          2            1
    4        his            1
    5       farm            1
    6         On            1
    7        Had            1
    8        Old            1
    9          1            1
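
The notes on "data_hash_column", "data_partition_column", "is_local_order" and "data_order_column" describe which combinations of arguments are allowed. The sketch below restates those rules as a small stand-alone check that can be run before calling set_data(). The helper check_set_data_args() is hypothetical and is not part of teradataml; it only mirrors the notes in this reference and makes no claim about how the library itself validates its arguments. It also treats "is_local_order = True" without "data_order_column" as a problem, following the note that "data_order_column" should be specified in that case.

```python
from typing import List, Optional, Union

# A column argument may be a single name or a list of names.
Columns = Optional[Union[str, List[str]]]


def check_set_data_args(
    data_partition_column: Columns = None,
    data_hash_column: Optional[str] = None,
    data_order_column: Columns = None,
    is_local_order: bool = False,
) -> List[str]:
    """Hypothetical helper (not a teradataml API).

    Returns a list of rule violations taken from the notes in this
    reference; an empty list means the combination is allowed.
    """
    problems = []

    # data_hash_column excludes partitioning, ordering and local ordering.
    if data_hash_column is not None:
        if data_partition_column is not None:
            problems.append(
                "data_hash_column cannot be combined with data_partition_column")
        if data_order_column is not None:
            problems.append(
                "data_hash_column cannot be combined with data_order_column")
        if is_local_order:
            problems.append(
                "data_hash_column cannot be combined with is_local_order")

    # Partitioning and local ordering are mutually exclusive.
    if data_partition_column is not None and is_local_order:
        problems.append(
            "data_partition_column cannot be combined with is_local_order=True")

    # Local ordering needs columns to order by.
    if is_local_order and data_order_column is None:
        problems.append("is_local_order=True requires data_order_column")

    return problems


# The combination used in Examples 2 and 3 passes the check.
print(check_set_data_args(data_order_column="Id", is_local_order=True))

# Hashing together with partitioning is reported as a violation.
print(check_set_data_args(data_hash_column="Id", data_partition_column="Name"))
```

Returning a list of messages rather than raising on the first violation lets a caller see every conflicting argument at once before adjusting the set_data() call.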