Teradata Package for Python Function Reference | 17.10 - map_row
Teradata® Package for Python Function Reference
- Product
- Teradata Package for Python
- Release Number
- 17.10
- Published
- April 2022
- Language
- English (United States)
- Last Update
- 2022-08-19
- Lifecycle
- Previous
- Product Category
- Teradata Vantage
- teradataml.dataframe.dataframe.DataFrame.map_row = map_row(self, user_function, exec_mode='IN-DB', chunk_size=1000, num_rows=1000, **kwargs)
- DESCRIPTION:
Function to apply a user defined function to each row in the
teradataml DataFrame, leveraging Vantage's Script Table Operator.
PARAMETERS:
user_function:
Required Argument.
Specifies the user defined function to apply to each row in
the teradataml DataFrame.
Types: function or functools.partial
Notes:
* This can be either a lambda function, a regular Python
function, or an object of functools.partial.
* The first argument (positional) to the user defined
function must be a row in a pandas DataFrame corresponding
to the teradataml DataFrame to which it is to be applied.
* A non-lambda function can be passed only when the user
defined function does not accept any arguments other than
the mandatory input - the input row.
Lambda functions and functools.partial can be used
instead, and are especially handy when:
* there is a need to pass positional and/or keyword
arguments (lambda).
* there is a need to pass keyword arguments only
(functools.partial).
* The return type of the user defined function must be one
of the following:
* numpy ndarray
* For a one-dimensional array, it is expected that
it has as many values as the number of expected
output columns.
* For a two-dimensional array, it is expected that
every array contained in the outer array has as
many values as the number of expected output
columns.
* pandas Series
This represents a row in the output, and the
number of values in it must be the same as the
number of expected output columns.
* pandas DataFrame
It is expected that a pandas DataFrame returned
by the "user_function" has the same number of
columns as the number of expected output columns.
* The returned objects are printed to the standard output,
as required by the Script Table Operator, using the
'quotechar' and 'delimiter' values.
* The user function can also print the required output to
the standard output in the delimited (and possibly
quoted) format instead of returning an object of a
supported type, as sketched below.
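A minimal sketch of each supported return style, assuming the
'admissions_train' columns used in the EXAMPLES section (the
function names here are illustrative, not part of the API):
>>> import numpy as np
>>> import pandas as pd
>>> # Return a one-dimensional numpy ndarray with one value per
>>> # expected output column.
>>> def to_ndarray(row):
...     return np.asarray([row['id'], row['masters'], row['gpa'],
...                        row['stats'], row['programming'],
...                        row['admitted']])
...
>>> # Return a pandas Series representing one output row.
>>> def to_series(row):
...     return pd.Series([row['id'], row['masters'], row['gpa'],
...                       row['stats'], row['programming'],
...                       row['admitted']])
...
>>> # Print the delimited output to the standard output instead of
>>> # returning an object ('\t' matches the default "delimiter").
>>> def to_stdout(row):
...     print('\t'.join(str(value) for value in row))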
exec_mode:
Optional Argument.
Specifies the mode of execution for the user defined function.
Permitted values:
* IN-DB: Execute the function on data in the teradataml
DataFrame in Vantage.
* LOCAL: Execute the function locally on sample data (at
most "num_rows" rows) from the teradataml
DataFrame.
* SANDBOX: Execute the function locally within a sandbox
environment on sample data (at most "num_rows"
rows) from the teradataml DataFrame.
Default value: 'IN-DB'
Types: str
chunk_size:
Optional Argument.
Specifies the number of rows to read in each chunk when
iterating over the input, applying the user defined function
to each row in the chunk.
Varying the value passed to this argument affects
performance and memory utilization.
Default value: 1000
Types: int
num_rows:
Optional Argument.
Specifies the maximum number of sample rows to use from the
teradataml DataFrame to apply the user defined function to when
"exec_mode" is 'LOCAL' or 'SANDBOX'.
Default value: 1000
Types: int
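As an illustrative sketch (assuming a connected Vantage
session, the 'admissions_train' DataFrame 'df' and the
'increase_gpa' function from the EXAMPLES section), a function
can be tested locally on a sample before running it in-database:
>>> # Test locally on at most 100 sample rows, processing 50
>>> # rows per chunk; this returns a pandas DataFrame.
>>> local_result = df.map_row(increase_gpa, exec_mode='LOCAL',
...                           num_rows=100, chunk_size=50)
>>> # Once verified, run in-database on the full data.
>>> indb_result = df.map_row(increase_gpa)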
returns:
Optional Argument.
Specifies the output column definition corresponding to the
output of "user_function".
When not specified, the function assumes that the names and
types of the output columns are the same as those of the
input.
Types: Dictionary specifying column name to
teradatasqlalchemy type mapping.
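For example, a minimal sketch of a "returns" specification that
keeps only the 'id' and 'gpa' columns (the names and types are
assumptions based on the 'admissions_train' data used in the
EXAMPLES section):
>>> from collections import OrderedDict
>>> import pandas as pd
>>> from teradatasqlalchemy.types import INTEGER, FLOAT
>>> def id_and_gpa(row):
...     return pd.Series([row['id'], row['gpa']])
...
>>> result = df.map_row(id_and_gpa,
...                     returns=OrderedDict([('id', INTEGER()),
...                                          ('gpa', FLOAT())]))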
delimiter:
Optional Argument.
Specifies a delimiter to use when reading columns from a row and
writing result columns.
Default value: '\t' (tab)
Types: str with one character
Notes:
* This argument cannot be same as "quotechar" argument.
* This argument cannot be a newline character, i.e., '\n'.
quotechar:
Optional Argument.
Specifies a character that forces all input and output of the
user function to be quoted using this specified character.
Using this argument enables the Advanced SQL Engine to
distinguish between NULL fields and empty strings.
A string with length zero is quoted, while NULL fields are not.
If this character is found in the data, it will be escaped by a
second quote character.
Types: str with one character
Notes:
* This argument cannot be same as "delimiter" argument.
* This argument cannot be a newline character, i.e., '\n'.
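A short sketch combining the two arguments (the values shown
are illustrative assumptions, not defaults):
>>> # Use a comma delimiter and force quoting with double quotes
>>> # so empty strings and NULL fields remain distinguishable.
>>> result = df.map_row(increase_gpa, delimiter=',', quotechar='"')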
auth:
Optional Argument.
Specifies an authorization to use when running the
"user_function".
Types: str
charset:
Optional Argument.
Specifies the character encoding for data.
Permitted values: 'utf-16', 'latin'
Types: str
data_order_column:
Optional Argument.
Specifies the Order By columns for the teradataml DataFrame.
Values to this argument can be provided as a list, if multiple
columns are used for ordering.
This argument is used in both cases:
"is_local_order = True" and "is_local_order = False".
Types: str OR list of Strings (str)
Note:
"is_local_order" must be set to 'True' when
"data_order_column" is used with "data_hash_column".
is_local_order:
Optional Argument.
Specifies a boolean value to determine whether the input data is
to be ordered locally or not.
"data_order_column" with "is_local_order" set to 'False'
specifies the order in which the values in a group, or
partition, are sorted.
When this argument is set to 'True', qualified rows on each AMP
are ordered in preparation to be input to a table function.
This argument is ignored, if "data_order_column" is None.
Default value: False
Types: bool
Notes:
* "is_local_order" cannot be specified along with
"data_partition_column".
* When "is_local_order" is set to True, "data_order_column"
should be specified, and the columns specified in
"data_order_column" are used for local ordering.
nulls_first:
Optional Argument.
Specifies a boolean value to determine whether NULLS are listed
first or last during ordering.
This argument is ignored, if "data_order_column" is None.
NULLS are listed first when this argument is set to 'True', and
last when set to 'False'.
Default value: True
Types: bool
sort_ascending:
Optional Argument.
Specifies a boolean value to determine if the result set is to
be sorted on the "data_order_column" column in ascending or
descending order.
The sorting is ascending when this argument is set to 'True',
and descending when set to 'False'.
This argument is ignored, if "data_order_column" is None.
Default Value: True
Types: bool
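As an illustrative sketch (the argument values are
assumptions), the ordering arguments can be combined as follows:
>>> # Order qualified rows on each AMP by 'gpa' in descending
>>> # order, with NULLs listed last, before the user defined
>>> # function is applied.
>>> result = df.map_row(increase_gpa,
...                     data_order_column='gpa',
...                     is_local_order=True,
...                     sort_ascending=False,
...                     nulls_first=False)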
RETURNS:
1. teradataml DataFrame if exec_mode is "IN-DB".
2. pandas DataFrame if exec_mode is "LOCAL".
RAISES:
TypeError, TeradataMlException.
EXAMPLES:
>>> # This example uses the 'admissions_train' dataset to increase
>>> # the 'gpa' by a given percentage.
>>> # Load the example data.
>>> load_example_data("dataframe", "admissions_train")
>>> df = DataFrame('admissions_train')
>>> print(df)
masters gpa stats programming admitted
id
22 yes 3.46 Novice Beginner 0
36 no 3.00 Advanced Novice 0
15 yes 4.00 Advanced Advanced 1
38 yes 2.65 Advanced Beginner 1
5 no 3.44 Novice Novice 0
17 no 3.83 Advanced Advanced 1
34 yes 3.85 Advanced Beginner 0
13 no 4.00 Advanced Novice 1
26 yes 3.57 Advanced Advanced 1
19 yes 1.98 Advanced Advanced 0
>>> # Example 1:
>>> # Create the user defined function to increase the 'gpa' by the
>>> # percentage provided. Note that the input to and the output
>>> # from the function is a pandas Series object.
>>> def increase_gpa(row, p=20):
... row['gpa'] = row['gpa'] + row['gpa'] * p/100
... return row
...
>>>
>>> # Apply the user defined function to the DataFrame.
>>> # Note that since the output of the user defined function
>>> # expects the same columns with the same types, we can skip
>>> # passing the 'returns' argument.
>>> increase_gpa_20 = df.map_row(increase_gpa)
>>>
>>> # Print the result.
>>> print(increase_gpa_20)
masters gpa stats programming admitted
id
22 yes 4.152 Novice Beginner 0
36 no 3.600 Advanced Novice 0
15 yes 4.800 Advanced Advanced 1
38 yes 3.180 Advanced Beginner 1
5 no 4.128 Novice Novice 0
17 no 4.596 Advanced Advanced 1
34 yes 4.620 Advanced Beginner 0
13 no 4.800 Advanced Novice 1
26 yes 4.284 Advanced Advanced 1
19 yes 2.376 Advanced Advanced 0
>>> # Example 2:
>>> # Use the same user defined function with a lambda notation to
>>> # pass the percentage, 'p = 40'.
>>> increase_gpa_40 = df.map_row(lambda row: increase_gpa(row,
... p = 40))
>>>
>>> print(increase_gpa_40)
masters gpa stats programming admitted
id
22 yes 4.844 Novice Beginner 0
36 no 4.200 Advanced Novice 0
15 yes 5.600 Advanced Advanced 1
38 yes 3.710 Advanced Beginner 1
5 no 4.816 Novice Novice 0
17 no 5.362 Advanced Advanced 1
34 yes 5.390 Advanced Beginner 0
13 no 5.600 Advanced Novice 1
26 yes 4.998 Advanced Advanced 1
19 yes 2.772 Advanced Advanced 0
>>> # Example 3:
>>> # Use the same user defined function with functools.partial to
>>> # pass the percentage, 'p = 50'.
>>> from functools import partial
>>> increase_gpa_50 = df.map_row(partial(increase_gpa, p = 50))
>>>
>>> print(increase_gpa_50)
masters gpa stats programming admitted
id
13 no 6.000 Advanced Novice 1
26 yes 5.355 Advanced Advanced 1
5 no 5.160 Novice Novice 0
19 yes 2.970 Advanced Advanced 0
15 yes 6.000 Advanced Advanced 1
40 yes 5.925 Novice Beginner 0
7 yes 3.495 Novice Novice 1
22 yes 5.190 Novice Beginner 0
36 no 4.500 Advanced Novice 0
38 yes 3.975 Advanced Beginner 1
>>> # Example 4:
>>> # Use a lambda function to increase the 'gpa' by 100 percent,
>>> # and return a numpy ndarray.
>>> from numpy import asarray
>>> inc_gpa_lambda = lambda row, p=20: asarray([row['id'],
... row['masters'],
... row['gpa'] + row['gpa'] * p/100,
... row['stats'],
... row['programming'],
... row['admitted']])
>>> increase_gpa_100 = df.map_row(lambda row: inc_gpa_lambda(row,
... p=100))
>>>
>>> print(increase_gpa_100)
masters gpa stats programming admitted
id
13 no 8.00 Advanced Novice 1
26 yes 7.14 Advanced Advanced 1
5 no 6.88 Novice Novice 0
19 yes 3.96 Advanced Advanced 0
15 yes 8.00 Advanced Advanced 1
40 yes 7.90 Novice Beginner 0
7 yes 4.66 Novice Novice 1
22 yes 6.92 Novice Beginner 0
36 no 6.00 Advanced Novice 0
38 yes 5.30 Advanced Beginner 1
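>>> # Example 5:
>>> # A sketch of the alternative output style noted under
>>> # "user_function": print the delimited row to the standard
>>> # output instead of returning an object. The '\t' separator
>>> # is an assumption chosen to match the default "delimiter"
>>> # value.
>>> def increase_gpa_print(row, p=20):
...     row['gpa'] = row['gpa'] + row['gpa'] * p/100
...     print('\t'.join(str(value) for value in row))
...
>>> increase_gpa_printed = df.map_row(increase_gpa_print)
>>> # The result is expected to match Example 1.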