Use the set_data() function to set data and data-related arguments without having to re-create the Apply object.
Required Argument
- data
- Specifies a teradataml DataFrame containing the input data.
Optional Arguments
- data_partition_column
- Specifies the PARTITION BY columns for data. Values can be provided as a list if multiple columns are used for partitioning.
- data_hash_column
- Specifies the column to be used for hashing. The rows in the input data are redistributed to AMPs based on the hash value of the column specified.
- data_order_column
- Specifies the ORDER BY columns for data. Values can be provided as a list if multiple columns are used for ordering.
- is_local_order
- Specifies a boolean value to determine whether the input data is to be ordered locally:
- ORDER BY: Sorts the values in a group or partition across all AMPs.
- LOCAL ORDER BY: Orders qualified rows on each AMP in preparation to be input to a table function.
- sort_ascending
- Specifies a boolean value to determine whether the result set is sorted on the column specified in data_order_column in ascending or descending order.
- nulls_first
- Specifies a boolean value to determine whether NULLs are listed first or last during ordering.
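The interaction of sort_ascending and nulls_first can be illustrated in plain Python (the actual ordering is performed on the Vantage server; order_values below is a hypothetical helper for illustration only, not a teradataml API):

```python
# Hypothetical illustration of the ordering semantics controlled by
# sort_ascending and nulls_first. NULLs are modeled as Python None.
def order_values(values, ascending=True, nulls_first=True):
    """Sort values the way data_order_column ordering would, grouping NULLs."""
    non_null = sorted(v for v in values if v is not None)
    if not ascending:
        non_null.reverse()
    nulls = [None] * (len(values) - len(non_null))
    return nulls + non_null if nulls_first else non_null + nulls

print(order_values([3, None, 1, 2], ascending=False, nulls_first=False))
# [3, 2, 1, None]
```

With the defaults (ascending=True, nulls_first=True), the same input yields [None, 1, 2, 3].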
Example
In this example, the script mapper.py reads a line of text input ("Old Macdonald Had A Farm") from a CSV file and splits it into individual words, emitting a new row for each word.
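A script of this kind might look like the following minimal sketch (the mapper.py that ships with teradataml may differ; Apply scripts read delimited rows from stdin and write delimited rows to stdout):

```python
# Hypothetical word-count mapper in the style of mapper.py: split each input
# line into words and emit one tab-delimited (word, count) row per word.
import sys

def map_words(line):
    """Split a line of text into words, pairing each word with a count of 1."""
    return [(word, "1") for word in line.strip().split()]

if __name__ == "__main__":
    for line in sys.stdin:
        for word, count in map_words(line):
            print(f"{word}\t{count}")
```

The tab used in the output matches the delimiter='\t' argument passed to Apply below.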
- Load example data.
>>> load_example_data("Script", ["barrier", "barrier_new"])
- Create teradataml DataFrame objects.
>>> barrierdf = DataFrame.from_table("barrier")
>>> barrierdf
                        Name
Id
1   Old Macdonald Had A Farm
- List base environments.
>>> from teradataml import list_base_envs, create_env
>>> list_base_envs()
       base_name language version
0  python_3.7.13   Python  3.7.13
1  python_3.8.13   Python  3.8.13
2  python_3.9.13   Python  3.9.13
- Create an environment.
>>> demo_env = create_env(env_name = 'demo_env', base_env = 'python_3.8.13', desc = 'Demo Environment')
User environment 'demo_env' created.
>>> import os
>>> import teradataml
>>> from teradatasqlalchemy import VARCHAR
>>> td_path = os.path.dirname(teradataml.__file__)
- Create an APPLY object with data and its arguments.
>>> apply_obj = Apply(data=barrierdf,
...                   script_name='mapper.py',
...                   files_local_path=os.path.join(td_path, 'data', 'scripts'),
...                   apply_command='python3 mapper.py',
...                   data_order_column="Id",
...                   is_local_order=False,
...                   nulls_first=False,
...                   sort_ascending=False,
...                   returns={"word": VARCHAR(15), "count_input": VARCHAR(10)},
...                   env_name=demo_env,
...                   delimiter='\t')
- Install file in environment.
>>> apply_obj.install_file('mapper.py')
File 'mapper.py' installed successfully in the remote user environment 'demo_env'.
- Run the user script.
>>> apply_obj.execute_script()
        word count_input
0  Macdonald           1
1          A           1
2       Farm           1
3        Had           1
4        Old           1
5          1           1
- Now run the script on a new DataFrame.
- Create a new DataFrame.
>>> barrierdf_new = DataFrame.from_table("barrier_new")
>>> barrierdf_new
                        Name
Id
1   Old Macdonald Had A Farm
2   On his farm he had a cow
- Set the Apply object's data arguments to new values. All data-related arguments that are not specified in set_data() are reset to their default values.
>>> apply_obj.set_data(data=barrierdf_new, data_order_column='Id', nulls_first = True)
- Run the user script on VantageCloud Lake.
>>> apply_obj.execute_script()
        word count_input
0        his           1
1         he           1
2        had           1
3          a           1
4          1           1
5        Old           1
6  Macdonald           1
7        Had           1
8          A           1
9       Farm           1