map_row() Method - Teradata Package for Python

Teradata® Package for Python User Guide

Product
Teradata Package for Python
Release Number
17.00
Published
November 2021
Language
English (United States)
Last Update
2022-01-14
dita:id
B700-4006
lifecycle
previous
Product Category
Teradata Vantage

Use the map_row() method to apply a function to every row in a teradataml DataFrame and return a teradataml DataFrame.

Required arguments:
  • user_function: Specifies the user defined function to apply to each row in the teradataml DataFrame.

    This can be a lambda function, a regular Python function, or a functools.partial object.

    A regular (non-lambda) function can be passed directly only when it does not need to be given any arguments other than the mandatory input, the input row.

    To pass additional arguments to the user defined function, wrap it in a lambda function or in functools.partial:
    • Use a lambda function to pass positional arguments, keyword arguments, or both.
    • Use functools.partial to pass keyword arguments only.

    See the "Functions, Inputs and Outputs" section in map_row() and map_partition() Methods for details about the input and output of this argument.
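The three ways of supplying user_function can be tried out locally before calling map_row(). The following is a minimal sketch, not teradataml code: a plain dict stands in for the input row (in practice a pandas Series, as in Example 1), and scale_gpa is a hypothetical function used only for illustration.

```python
from functools import partial

def scale_gpa(row, p=20):
    """Hypothetical user function: return a copy of the row with 'gpa' raised by p percent."""
    out = dict(row)
    out['gpa'] = row['gpa'] + row['gpa'] * p / 100
    return out

row = {'id': 13, 'gpa': 4.0}  # stand-in for one input row (assumption: dict instead of Series)

# 1. Regular function: only the row is passed, so p keeps its default of 20.
as_function = scale_gpa
# 2. Lambda: can forward positional and/or keyword arguments (40 is passed positionally here).
as_lambda = lambda r: scale_gpa(r, 40)
# 3. functools.partial: binds keyword arguments, leaving the row as the only positional argument.
as_partial = partial(scale_gpa, p=50)

for fn in (as_function, as_lambda, as_partial):
    print(fn(row)['gpa'])
```

Each of the three callables takes exactly one argument, the row, which is the shape map_row() expects.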

Optional arguments:
  • exec_mode: Specifies the mode of execution for the user defined function.
    Permitted values:
    • IN-DB: Execute the function on data in the teradataml DataFrame in Vantage.

      This is the default value.

    • LOCAL: Execute the function locally on sample data (at most num_rows rows) from the teradataml DataFrame.
  • chunk_size: Specifies the number of rows read per iteration. An iterator reads the data in chunks of this size, and the user defined function is applied to each row in the chunk.

    Varying the value passed to this argument affects the performance and the memory utilization. Default value is 1000.

  • num_rows: Specifies the maximum number of sample rows from the teradataml DataFrame to which the user defined function is applied when exec_mode is 'LOCAL'.
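How chunk_size and num_rows interact can be pictured with a small standard-library sketch. This is an illustration under stated assumptions, not the teradataml implementation: apply_in_chunks is a hypothetical helper, rows are plain dicts, and the sample is simply the first num_rows rows (the real method may sample differently).

```python
from itertools import islice

def apply_in_chunks(rows, user_function, chunk_size=1000, num_rows=None):
    """Hypothetical helper: apply user_function to each row, reading chunk_size rows per iteration."""
    iterator = iter(rows)
    if num_rows is not None:
        # LOCAL-style sampling: never consume more than num_rows rows.
        iterator = islice(iterator, num_rows)
    results = []
    while True:
        # At most chunk_size rows are held in memory at a time.
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            break
        results.extend(user_function(row) for row in chunk)
    return results

rows = [{'id': i, 'gpa': 3.0} for i in range(10)]
doubled = apply_in_chunks(rows, lambda r: {**r, 'gpa': r['gpa'] * 2},
                          chunk_size=4, num_rows=6)
print(len(doubled))
```

A smaller chunk_size lowers peak memory use at the cost of more iterations, which is the trade-off the chunk_size description refers to.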

The method also accepts the same arguments that Script accepts, except that returns is optional and the method does not accept data, data_hash_column, or data_partition_column. When returns is not provided, the method assumes that the function's output has columns with the same names and types as the input teradataml DataFrame.
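Whether returns can be omitted comes down to whether the function changes the column names or types. The check below is a rough local sketch of that condition, assuming plain dicts for rows and Python types as stand-ins for column types; keeps_schema is a hypothetical helper and not part of teradataml.

```python
def keeps_schema(input_row, output_row):
    """Hypothetical check: same column names, in the same order, with the same Python types."""
    if list(input_row) != list(output_row):
        return False
    return all(type(input_row[c]) is type(output_row[c]) for c in input_row)

row = {'id': 5, 'masters': 'no', 'gpa': 3.44, 'admitted': 0}

# Values change but names and types do not: returns could be omitted.
same = {**row, 'gpa': row['gpa'] * 1.2}
# A renamed column with a different type: returns would be needed.
changed = {'id': row['id'], 'gpa_pct': '120%'}

print(keeps_schema(row, same), keeps_schema(row, changed))
```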

Example Prerequisite

The examples use the 'admissions_train' dataset and increase the value in the 'gpa' column of each row by a given percentage.

  • Load the example data.
    >>> from teradataml import DataFrame, load_example_data
    >>> load_example_data("dataframe", "admissions_train")
  • Create a DataFrame.
    >>> df = DataFrame('admissions_train')
    >>> print(df)
       masters   gpa     stats programming  admitted
    id                                             
    5       no  3.44    Novice      Novice         0
    34     yes  3.85  Advanced    Beginner         0
    13      no  4.00  Advanced      Novice         1
    40     yes  3.95    Novice    Beginner         0
    22     yes  3.46    Novice    Beginner         0
    19     yes  1.98  Advanced    Advanced         0
    36      no  3.00  Advanced      Novice         0
    15     yes  4.00  Advanced    Advanced         1
    7      yes  2.33    Novice      Novice         1
    17      no  3.83  Advanced    Advanced         1

Example 1: Create a user defined function to increase the 'gpa' by the percentage provided

In this example, both the input to and the output from the function are pandas Series objects.

  1. Create a user defined function.
    >>> def increase_gpa(row, p=20):
    ...     row['gpa'] = row['gpa'] + row['gpa'] * p/100
    ...     return row
  2. Apply the user defined function to the DataFrame.

    Because the user defined function returns the same columns with the same types as the input, you can skip passing the returns argument.

    >>> increase_gpa_20 = df.map_row(increase_gpa)
  3. Print the result.
    >>> print(increase_gpa_20)
       masters    gpa     stats programming  admitted
    id                                             
    13      no  4.800  Advanced      Novice         1
    36      no  3.600  Advanced      Novice         0
    15     yes  4.800  Advanced    Advanced         1
    40     yes  4.740    Novice    Beginner         0
    22     yes  4.152    Novice    Beginner         0
    38     yes  3.180  Advanced    Beginner         1
    26     yes  4.284  Advanced    Advanced         1
    5       no  4.128    Novice      Novice         0
    7      yes  2.796    Novice      Novice         1
    19     yes  2.376  Advanced    Advanced         0
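The printed values can be cross-checked by running the same function locally on single rows. This is only a sanity-check sketch: plain dicts stand in for the pandas Series rows, and the input 'gpa' values are taken from the DataFrame printed in the prerequisite section.

```python
def increase_gpa(row, p=20):
    row['gpa'] = row['gpa'] + row['gpa'] * p/100
    return row

# (id, input gpa, gpa printed in the result above)
expected = [(5, 3.44, 4.128), (13, 4.00, 4.800), (7, 2.33, 2.796)]

checked = []
for row_id, gpa, printed in expected:
    out = increase_gpa({'id': row_id, 'gpa': gpa})
    # Round to three decimals to match the precision of the printed output.
    checked.append(round(out['gpa'], 3) == printed)
print(checked)
```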

Example 2: Use the same user defined function with a lambda notation to pass the percentage 'p = 40'

  1. Apply the user defined function to the DataFrame with a lambda notation.
    >>> increase_gpa_40 = df.map_row(lambda row: increase_gpa(row, p = 40))
  2. Print the result.
    >>> print(increase_gpa_40)
       masters    gpa     stats programming  admitted
    id                                              
    5       no  4.816    Novice      Novice         0
    34     yes  5.390  Advanced    Beginner         0
    13      no  5.600  Advanced      Novice         1
    40     yes  5.530    Novice    Beginner         0
    22     yes  4.844    Novice    Beginner         0
    19     yes  2.772  Advanced    Advanced         0
    36      no  4.200  Advanced      Novice         0
    15     yes  5.600  Advanced    Advanced         1
    7      yes  3.262    Novice      Novice         1
    17      no  5.362  Advanced    Advanced         1

Example 3: Use the same user defined function with functools.partial to pass the percentage 'p = 50'

  1. Load the necessary module.
    >>> from functools import partial
  2. Apply the user defined function to the DataFrame with functools.partial.
    >>> increase_gpa_50 = df.map_row(partial(increase_gpa, p = 50))
  3. Print the result.
    >>> print(increase_gpa_50)
       masters    gpa     stats programming  admitted
    id                                              
    5       no  5.160    Novice      Novice         0
    34     yes  5.775  Advanced    Beginner         0
    13      no  6.000  Advanced      Novice         1
    40     yes  5.925    Novice    Beginner         0
    22     yes  5.190    Novice    Beginner         0
    19     yes  2.970  Advanced    Advanced         0
    36      no  4.500  Advanced      Novice         0
    15     yes  6.000  Advanced    Advanced         1
    7      yes  3.495    Novice      Novice         1
    17      no  5.745  Advanced    Advanced         1

Example 4: Use a lambda function to increase the 'gpa' by 100 percent, and return numpy ndarray

  1. Load the necessary module.
    >>> from numpy import asarray
  2. Create a lambda function.
    >>> increase_gpa_lambda = lambda row, p=20: asarray([row['id'], row['masters'], row['gpa'] + row['gpa'] * p/100, row['stats'], row['programming'], row['admitted']])
  3. Apply the lambda function to the DataFrame.
    >>> increase_gpa_100 = df.map_row(lambda row: increase_gpa_lambda(row, p=100))
  4. Print the result.
    >>> print(increase_gpa_100)
       masters   gpa     stats programming  admitted
    id                                             
    5       no  6.88    Novice      Novice         0
    34     yes  7.70  Advanced    Beginner         0
    13      no  8.00  Advanced      Novice         1
    40     yes  7.90    Novice    Beginner         0
    22     yes  6.92    Novice    Beginner         0
    19     yes  3.96  Advanced    Advanced         0
    36      no  6.00  Advanced      Novice         0
    15     yes  8.00  Advanced    Advanced         1
    7      yes  4.66    Novice      Novice         1
    17      no  7.66  Advanced    Advanced         1