Teradata® Package for Python Function Reference on VantageCloud Lake
- Deployment: VantageCloud
- Edition: Lake
- Product: Teradata Package for Python
- Release Number: 20.00.00.04
- Published: March 2025
- ft:locale: en-US
- ft:lastEdition: 2025-04-11
- dita:id: TeradataPython_FxRef_Lake_2000
- Product Category: Teradata Vantage
- teradataml.table_operators.Apply.__init__ = __init__(self, data=None, script_name=None, files_local_path=None, apply_command=None, delimiter=',', returns=None, quotechar=None, env_name=None, style='csv', data_partition_column=None, data_hash_column=None, data_order_column=None, is_local_order=False, sort_ascending=True, nulls_first=True)
- DESCRIPTION:
The fastpath Apply table operator executes a user-installed script or
any Linux command inside a remote user environment using the Open Analytics Framework.
The installed script is executed in parallel with data from the Advanced SQL Engine.
PARAMETERS:
apply_command:
Required Argument.
Specifies the command/script to run.
Note:
    * 'Rscript --vanilla ...' runs an R script without saving or restoring
      the workspace, keeping the process clean.
Types: str
script_name:
Required Argument.
Specifies the name of the user script.
Types: str
files_local_path:
Required Argument.
Specifies the absolute local path where the user script and all supporting files,
such as model files and input data files, reside.
Types: str
env_name:
Required Argument.
Specifies the name of the remote user environment or an object of class UserEnv.
Types: str or object of class UserEnv.
returns:
Optional Argument.
Specifies the output column definition.
The "data" argument is required when "returns" is not specified.
When "returns" is not specified, the output column definition must match
the column definition of the table specified in the "data" argument.
Types: Dictionary specifying column name to teradatasqlalchemy type mapping.
Default: None
data:
Optional Argument.
Specifies the teradataml DataFrame containing the input data for the script.
Types: teradataml DataFrame
data_hash_column:
Optional Argument.
Specifies the column to be used for hashing.
The rows in the input data are redistributed to AMPs based on the hash value
of the specified column.
If "data_hash_column" is not specified, the entire result set delivered by
the function constitutes a single group or partition.
Types: str
Notes:
1. "data_hash_column" cannot be specified along with "data_partition_column".
2. "data_hash_column" cannot be specified along with "data_order_column"
   when "is_local_order=False".
data_partition_column:
Optional Argument.
Specifies the Partition By columns for "data".
Values can be provided as a list when multiple columns are used for
partitioning. If "data_partition_column" is not specified, the entire
result set delivered by the function constitutes a single group or partition.
Default Value: ANY
Types: str OR list of Strings (str)
Notes:
1) "data_partition_column" cannot be specified along with "data_hash_column".
2) "data_partition_column" cannot be specified along with "is_local_order = True".
is_local_order:
Optional Argument.
Specifies a boolean value to determine whether the input data is to be
ordered locally or not. When set to 'True', qualified rows are ordered
locally in preparation to be input to the function; the order of values
within a group, or partition, is controlled by "sort_ascending".
This argument is ignored if "data_order_column" is None.
Default Value: False
Types: bool
Note:
When "is_local_order" is set to 'True', "data_order_column" should be
specified, and the columns specified in "data_order_column"
are used for local ordering.
data_order_column:
Optional Argument.
Specifies the Order By columns for "data".
Values can be provided as a list when multiple columns are used for ordering.
This argument is used in both cases: "is_local_order = True"
and "is_local_order = False".
Types: str OR list of Strings (str)
Note:
"data_order_column" cannot be specified along with "data_hash_column".
sort_ascending:
Optional Argument.
Specifies a boolean value to determine whether the input data is sorted on
the "data_order_column" column in ascending or descending order.
When set to 'True', data is sorted in ascending order;
otherwise, data is sorted in descending order.
This argument is ignored if "data_order_column" is None.
Default Value: True
Types: bool
nulls_first:
Optional Argument.
Specifies a boolean value to determine whether NULLs in the input data are
listed first or last during ordering.
When set to 'True', NULLs are listed first; otherwise, NULLs are listed last.
This argument is ignored if "data_order_column" is None.
Default Value: True
Types: bool
delimiter:
Optional Argument.
Specifies the delimiter to use when reading columns from a row and
writing result columns. The delimiter must be a valid Unicode code point.
Notes:
    1) "quotechar" cannot be the same as the delimiter.
    2) The delimiter cannot be an empty string, a newline, or a carriage return.
Default value: comma (,)
Types: str
quotechar:
Optional Argument.
Specifies the character used to quote all input and output values for the script.
Note: "quotechar" cannot be the same as the delimiter.
Default value: double quote (")
Types: str
style:
Optional Argument.
Specifies how input is passed to, and output is generated by,
the 'apply_command'.
Note:
    Only the 'csv' value is supported for Apply.
Default value: "csv"
Types: str
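Because Apply exchanges rows in 'csv' style, the "delimiter" and "quotechar" arguments behave like standard CSV serialization: a field containing the delimiter is wrapped in the quote character. The following sketch illustrates that interaction using Python's standard csv module as an analogy; it is not the framework's actual serializer.

```python
import csv
import io

# Serialize one row with the default Apply settings:
# delimiter ',' and quotechar '"'.
buf = io.StringIO()
writer = csv.writer(buf, delimiter=',', quotechar='"',
                    quoting=csv.QUOTE_MINIMAL)

# The second field contains the delimiter, so it gets quoted.
writer.writerow(["Old Macdonald", "a,b", 1])

print(buf.getvalue().strip())  # Old Macdonald,"a,b",1
```

This is why the notes above forbid "quotechar" and the delimiter from being the same character: the serializer could not distinguish a quoted field from a field boundary.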
RETURNS:
Apply Object
RAISES:
TeradataMlException
EXAMPLES:
# Note - Refer to User Guide for setting required permissions.
# Load example data.
>>> load_example_data("Script", ["barrier"])
# Example 1 - The Python script mapper.py reads in a line of text input ("Old Macdonald Had A Farm")
# from csv and splits the line into individual words, emitting a new row for each word.
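# The mapper.py shipped with teradataml is not reproduced here; a minimal
# sketch with the same assumed behavior (read delimited text rows from
# stdin, emit one "word count" row per word on stdout) might look like:

```python
import sys


def map_words(line, delimiter=' '):
    """Split a line of text into words; return one 'word<delim>1' row per word."""
    return [f"{word}{delimiter}1" for word in line.strip().split()]


if __name__ == "__main__":
    # Apply streams input rows to the script on stdin; each printed
    # line becomes an output row.
    for line in sys.stdin:
        for row in map_words(line):
            print(row)
```

# The helper name map_words and the exact output formatting are
# illustrative assumptions, not the contents of the actual script.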
# Create teradataml DataFrame objects.
>>> barrierdf = DataFrame.from_table("barrier")
# Create remote user environment.
>>> testenv = create_env('testenv', 'python_3.7.13', 'Demo environment')
User environment testenv created.
>>> import os, teradataml
>>> teradataml_dir = os.path.dirname(teradataml.__file__)
# Create an Apply object that allows us to execute the script.
>>> apply_obj = Apply(data=barrierdf,
script_name='mapper.py',
files_local_path= os.path.join(teradataml_dir, 'data', 'scripts'),
apply_command='python3 mapper.py',
data_order_column="Id",
is_local_order=False,
nulls_first=False,
sort_ascending=False,
returns={"word": VARCHAR(15), "count_input": VARCHAR(10)},
env_name=testenv,
delimiter=' ')
# Run user script locally using data from csv.
# This helps the user to fix script level issues outside Open Analytics
# Framework.
>>> apply_obj.test_script(input_data_file=os.path.join(teradataml_dir, 'data', 'barrier.csv'))
############ STDOUT Output ############
word count_input
0 Macdonald 1
1 A 1
2 Farm 1
3 Had 1
4 Old 1
5 1 1
# Install file in remote user environment.
>>> apply_obj.install_file(file_name=os.path.join(teradataml_dir, 'data', 'mapper.py'))
File 'mapper.py' installed successfully in the remote user environment 'testenv'.
# Execute the user script in the Open Analytics Framework.
>>> apply_obj.execute_script()
word count_input
0 Macdonald 1
1 A 1
2 Farm 1
3 Had 1
4 Old 1
5 1 1
# Remove the installed file from remote user environment.
>>> apply_obj.remove_file(file_name='mapper.py')
File 'mapper.py' removed successfully from the remote user environment 'testenv'.
# Example 2 - The R script mapper.R reads in a line of text input ("Old Macdonald Had A Farm")
# from csv and splits the line into individual words, emitting a new row for each word.
# Create teradataml DataFrame object.
>>> barrierdf = DataFrame.from_table("barrier")
# Create remote user environment.
>>> testenv = create_env('test_env_for_r', 'r_4.1', 'Demo environment')
User environment test_env_for_r created.
>>> import os, teradataml
# Install file in remote user environment.
>>> testenv.install_file(file_path=os.path.join(os.path.dirname(teradataml.__file__), "data", "scripts", "mapper.R"))
File 'mapper.R' installed successfully in the remote user environment 'test_env_for_r'.
# Create an Apply object that allows us to execute the script.
>>> apply_obj = Apply(data=barrierdf,
apply_command='Rscript --vanilla mapper.R',
data_order_column="Id",
is_local_order=False,
nulls_first=False,
sort_ascending=False,
returns={"word": VARCHAR(15), "count_input": VARCHAR(10)},
env_name=testenv,
delimiter=' ')
# Execute the user script in the Open Analytics Framework.
>>> apply_obj.execute_script()
word count_input
0 Macdonald 1
1 A 1
2 Farm 1
3 Had 1
4 Old 1
5 1 1
# Remove the installed file from remote user environment.
>>> apply_obj.remove_file(file_name='mapper.R')
File 'mapper.R' removed successfully from the remote user environment 'test_env_for_r'.