Use the map_row() method to apply a function to every row in a teradataml DataFrame and return a teradataml DataFrame.
- user_function: Specifies the user defined function to apply to each row in the teradataml DataFrame.
This can be either a lambda function, a regular Python function, or an object of functools.partial.
A non-lambda function can be passed only when the user defined function does not accept any arguments other than the mandatory input - the input row.
Use a lambda function or functools.partial when the user defined function needs arguments in addition to the input row:
- For a lambda function, when there is a need to pass positional arguments, keyword arguments, or both.
- For functools.partial, when there is a need to pass keyword arguments only.
See the "Functions, Inputs and Outputs" section in map_row() and map_partition() Methods for details about the input and output of this argument.
- exec_mode: Specifies the mode of execution for the user defined function. Permitted values:
- IN-DB: Execute the function on data in the teradataml DataFrame in Vantage.
This is the default value.
- LOCAL: Execute the function locally on sample data (at most num_rows rows) from the teradataml DataFrame.
- chunk_size: Specifies the number of rows to be read in a chunk in each iteration using an iterator to apply the user defined function to each row in the chunk.
Varying the value passed to this argument affects the performance and the memory utilization. Default value is 1000.
- num_rows: Specifies the maximum number of sample rows to use from the teradataml DataFrame to apply the user defined function to when exec_mode is 'LOCAL'.
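The three ways of supplying user_function described above can be tried locally before calling map_row(). The following is a minimal sketch that uses only pandas and functools; sample_row is a hypothetical stand-in for one input row, and the sketch does not call map_row() or connect to Vantage.

```python
from functools import partial

import pandas as pd


def increase_gpa(row, p=20):
    # Work on a copy so the caller's row is left unchanged in this sketch.
    row = row.copy()
    row['gpa'] = row['gpa'] + row['gpa'] * p / 100
    return row


# Hypothetical row, shaped like one row of 'admissions_train'.
sample_row = pd.Series({'id': 5, 'masters': 'no', 'gpa': 3.44,
                        'stats': 'Novice', 'programming': 'Novice',
                        'admitted': 0})

# 1. Regular function: only the input row is passed, so p keeps its default.
as_function = increase_gpa
# 2. Lambda: can forward positional arguments, keyword arguments, or both.
as_lambda = lambda row: increase_gpa(row, 40)
# 3. functools.partial: binds keyword arguments only.
as_partial = partial(increase_gpa, p=50)

results = {
    'function': as_function(sample_row)['gpa'],
    'lambda': as_lambda(sample_row)['gpa'],
    'partial': as_partial(sample_row)['gpa'],
}
print(results)
```

Each of the three callables accepts a single argument, the row, which is the shape map_row() expects for user_function.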
The method also accepts the same arguments that Script accepts, except that returns is optional and the method does not accept data, data_hash_column, and data_partition_column. When returns is not provided, the method assumes that the function's output has columns with the same names and types as the input teradataml DataFrame.
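Whether returns can be omitted depends on the function keeping the input's column names and types. A rough way to check this locally on one row is sketched below; keeps_schema and sample_row are hypothetical helpers for illustration, not part of teradataml, and the check compares Python value types rather than database column types.

```python
import pandas as pd


def increase_gpa(row, p=20):
    row['gpa'] = row['gpa'] + row['gpa'] * p / 100
    return row


def keeps_schema(func, row):
    """Return True if func(row) keeps the labels and value types of row."""
    out = func(row.copy())
    if list(out.index) != list(row.index):
        return False
    return all(type(out[name]) is type(row[name]) for name in row.index)


# Hypothetical row, shaped like one row of 'admissions_train'.
sample_row = pd.Series({'id': 5, 'masters': 'no', 'gpa': 3.44,
                        'stats': 'Novice', 'programming': 'Novice',
                        'admitted': 0})

# Same columns and types: returns could be skipped for this function.
unchanged = keeps_schema(increase_gpa, sample_row)
# A column is dropped: the output no longer matches the input.
changed = keeps_schema(lambda row: row.drop('stats'), sample_row)
print(unchanged, changed)
```

A function for which this check fails is one where the output layout differs from the input, which is the case the returns argument exists to describe.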
Example Prerequisite
The examples use the 'admissions_train' dataset and increase the value in the 'gpa' column of each row by a given percentage.
- Load the example data.
>>> load_example_data("dataframe", "admissions_train")
- Create a DataFrame.
>>> df = DataFrame('admissions_train')
>>> print(df)
   masters   gpa     stats programming  admitted
id
5       no  3.44    Novice      Novice         0
34     yes  3.85  Advanced    Beginner         0
13      no  4.00  Advanced      Novice         1
40     yes  3.95    Novice    Beginner         0
22     yes  3.46    Novice    Beginner         0
19     yes  1.98  Advanced    Advanced         0
36      no  3.00  Advanced      Novice         0
15     yes  4.00  Advanced    Advanced         1
7      yes  2.33    Novice      Novice         1
17      no  3.83  Advanced    Advanced         1
Example 1: Create a user defined function to increase the 'gpa' by the percentage provided
In this example, both the input to and the output from the function are pandas Series objects.
- Create a user defined function.
>>> def increase_gpa(row, p=20):
...     row['gpa'] = row['gpa'] + row['gpa'] * p/100
...     return row
- Apply the user defined function to the DataFrame.
Since the output of the user defined function has the same columns with the same types as the input, you can skip passing the returns argument.
>>> increase_gpa_20 = df.map_row(increase_gpa)
- Print the result.
>>> print(increase_gpa_20)
   masters    gpa     stats programming  admitted
id
13      no  4.800  Advanced      Novice         1
36      no  3.600  Advanced      Novice         0
15     yes  4.800  Advanced    Advanced         1
40     yes  4.740    Novice    Beginner         0
22     yes  4.152    Novice    Beginner         0
38     yes  3.180  Advanced    Beginner         1
26     yes  4.284  Advanced    Advanced         1
5       no  4.128    Novice      Novice         0
7      yes  2.796    Novice      Novice         1
19     yes  2.376  Advanced    Advanced         0
Example 2: Use the same user defined function with a lambda notation to pass the percentage 'p = 40'
- Apply the user defined function to the DataFrame with a lambda notation.
>>> increase_gpa_40 = df.map_row(lambda row: increase_gpa(row, p = 40))
- Print the result.
>>> print(increase_gpa_40)
   masters    gpa     stats programming  admitted
id
5       no  4.816    Novice      Novice         0
34     yes  5.390  Advanced    Beginner         0
13      no  5.600  Advanced      Novice         1
40     yes  5.530    Novice    Beginner         0
22     yes  4.844    Novice    Beginner         0
19     yes  2.772  Advanced    Advanced         0
36      no  4.200  Advanced      Novice         0
15     yes  5.600  Advanced    Advanced         1
7      yes  3.262    Novice      Novice         1
17      no  5.362  Advanced    Advanced         1
Example 3: Use the same user defined function with functools.partial to pass the percentage 'p = 50'
- Load the necessary module.
>>> from functools import partial
- Apply the user defined function to the DataFrame with functools.partial.
>>> increase_gpa_50 = df.map_row(partial(increase_gpa, p = 50))
- Print the result.
>>> print(increase_gpa_50)
   masters    gpa     stats programming  admitted
id
5       no  5.160    Novice      Novice         0
34     yes  5.775  Advanced    Beginner         0
13      no  6.000  Advanced      Novice         1
40     yes  5.925    Novice    Beginner         0
22     yes  5.190    Novice    Beginner         0
19     yes  2.970  Advanced    Advanced         0
36      no  4.500  Advanced      Novice         0
15     yes  6.000  Advanced    Advanced         1
7      yes  3.495    Novice      Novice         1
17      no  5.745  Advanced    Advanced         1
Example 4: Use a lambda function to increase the 'gpa' by 100 percent, and return a NumPy ndarray
- Load the necessary module.
>>> from numpy import asarray
- Create a lambda function.
>>> increase_gpa_lambda = lambda row, p=20: asarray([row['id'], row['masters'], row['gpa'] + row['gpa'] * p/100, row['stats'], row['programming'], row['admitted']])
- Apply the lambda function to the DataFrame.
>>> increase_gpa_100 = df.map_row(lambda row: increase_gpa_lambda(row, p=100))
- Print the result.
>>> print(increase_gpa_100)
   masters   gpa     stats programming  admitted
id
5       no  6.88    Novice      Novice         0
34     yes  7.70  Advanced    Beginner         0
13      no  8.00  Advanced      Novice         1
40     yes  7.90    Novice    Beginner         0
22     yes  6.92    Novice    Beginner         0
19     yes  3.96  Advanced    Advanced         0
36      no  6.00  Advanced      Novice         0
15     yes  8.00  Advanced    Advanced         1
7      yes  4.66    Novice      Novice         1
17      no  7.66  Advanced    Advanced         1
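When a function returns an ndarray built from mixed values, as in Example 4, it may help to know how NumPy itself stores them. The sketch below shows NumPy behavior only; the values list is a hypothetical row, and how map_row() converts the array back into typed columns is not covered here.

```python
import numpy as np

# Hypothetical row values in column order:
# id, masters, gpa, stats, programming, admitted.
values = [5, 'no', 6.88, 'Novice', 'Novice', 0]

# Without an explicit dtype, NumPy picks one common dtype for all elements.
# Mixing numbers and strings yields a string array.
coerced = np.asarray(values)

# dtype=object keeps each element as the original Python object.
preserved = np.asarray(values, dtype=object)

print(coerced.dtype.kind, type(coerced[2]).__name__)
print(preserved.dtype.kind, type(preserved[2]).__name__)
```

If the original Python types matter inside the function, passing dtype=object to asarray is one way to keep them.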