Examples: How to use DataFrame.apply() | Teradata Package for Python

Teradata® Package for Python User Guide

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Teradata Package for Python
Release Number: 20.00
Published: December 2024
Last edition: 2025-01-23
Product Category: Teradata Vantage

Example setup

Create a Python 3.10 environment with the given name and description in Analytics Database. This example uses the 'admissions_train' dataset to increase the 'gpa' by a given percentage.

>>> env = create_env('testenv', 'python_3.10', 'Test environment')
User environment testenv created.

# Packages dill and pandas must be installed in the remote user environment.
>>> env.install_lib(['pandas','dill'])
Request to install libraries initiated successfully in the remote user environment testenv. Check the status using status() with the claim id 'ef255030-1be2-4d4a-9d47-12cd4365a003'.

# Check the status of installation.
>>> env.status('ef255030-1be2-4d4a-9d47-12cd4365a003')
                               Claim Id     File/Libs  Method Name     Stage             Timestamp Additional Details
0  ef255030-1be2-4d4a-9d47-12cd4365a003  pandas, dill  install_lib   Started  2022-08-04T04:27:56Z
1  ef255030-1be2-4d4a-9d47-12cd4365a003  pandas, dill  install_lib  Finished  2022-08-04T04:29:12Z
>>>
# Load the example data.
>>> load_example_data("dataframe", "admissions_train")
>>> df = DataFrame('admissions_train')
>>> print(df)
   masters   gpa     stats programming  admitted
id
22     yes  3.46    Novice    Beginner         0
36      no  3.00  Advanced      Novice         0
15     yes  4.00  Advanced    Advanced         1
38     yes  2.65  Advanced    Beginner         1
5       no  3.44    Novice      Novice         0
17      no  3.83  Advanced    Advanced         1
34     yes  3.85  Advanced    Beginner         0
13      no  4.00  Advanced      Novice         1
26     yes  3.57  Advanced    Advanced         1
19     yes  1.98  Advanced    Advanced         0

Example 1: Create the user defined function to increase the 'gpa' by the percentage provided

Note that both the input to and the output from the function are pandas Series objects.

>>> def increase_gpa(row, p=20):
...     row['gpa'] = row['gpa'] + row['gpa'] * p/100
...     return row
...
>>>
# Apply the user defined function to the DataFrame.
# Note that since the user defined function returns the same
# columns with the same types as the input, the 'returns'
# argument can be omitted.
>>> increase_gpa_20 = df.apply(increase_gpa, env_name='testenv')
>>>
>>> # Print the result.
>>> print(increase_gpa_20)
   masters    gpa     stats programming  admitted
id
22     yes  4.152    Novice    Beginner         0
36      no  3.600  Advanced      Novice         0
15     yes  4.800  Advanced    Advanced         1
38     yes  3.180  Advanced    Beginner         1
5       no  4.128    Novice      Novice         0
17      no  4.596  Advanced    Advanced         1
34     yes  4.620  Advanced    Beginner         0
13      no  4.800  Advanced      Novice         1
26     yes  4.284  Advanced    Advanced         1
19     yes  2.376  Advanced    Advanced         0
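
Because the function receives and returns a pandas Series, its logic can be checked locally on a single hand-built row before it is sent to the remote environment. The following is a minimal sketch, assuming only that pandas is installed locally. The names sample_row and local_result are illustrative, the row values are copied from the 'admissions_train' output above, and nothing here connects to Vantage.

```python
import pandas as pd


def increase_gpa(row, p=20):
    # Same logic as Example 1: raise 'gpa' by p percent and return the row.
    row['gpa'] = row['gpa'] + row['gpa'] * p / 100
    return row


# Hand-built row mirroring id 22 from the sample data
# (a local stand-in, not a row fetched from Vantage).
sample_row = pd.Series({'id': 22, 'masters': 'yes', 'gpa': 3.46,
                        'stats': 'Novice', 'programming': 'Beginner',
                        'admitted': 0})

# Pass a copy so the hand-built row itself is left unchanged.
local_result = increase_gpa(sample_row.copy())

# Compare this value with the 'gpa' printed for id 22 in Example 1.
print(round(local_result['gpa'], 3))
```

A local check like this only exercises the function body; it does not validate the remote environment or the installed packages.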

Example 2: Use the same user defined function with a lambda notation to pass the percentage, 'p = 40'

>>> increase_gpa_40 = df.apply(lambda row: increase_gpa(row,
...                                                     p = 40),
...                            env_name='testenv')
>>>
>>> print(increase_gpa_40)
   masters    gpa     stats programming  admitted
id
22     yes  4.844    Novice    Beginner         0
36      no  4.200  Advanced      Novice         0
15     yes  5.600  Advanced    Advanced         1
38     yes  3.710  Advanced    Beginner         1
5       no  4.816    Novice      Novice         0
17      no  5.362  Advanced    Advanced         1
34     yes  5.390  Advanced    Beginner         0
13      no  5.600  Advanced      Novice         1
26     yes  4.998  Advanced    Advanced         1
19     yes  2.772  Advanced    Advanced         0

Example 3: Use the same user defined function with functools.partial to pass the percentage, 'p = 50'

>>> from functools import partial
>>> increase_gpa_50 = df.apply(partial(increase_gpa, p = 50),
...                            env_name='testenv')
>>>
>>> print(increase_gpa_50)
   masters    gpa     stats programming  admitted
id
13      no  6.000  Advanced      Novice         1
26     yes  5.355  Advanced    Advanced         1
5       no  5.160    Novice      Novice         0
19     yes  2.970  Advanced    Advanced         0
15     yes  6.000  Advanced    Advanced         1
40     yes  5.925    Novice    Beginner         0
7      yes  3.495    Novice      Novice         1
22     yes  5.190    Novice    Beginner         0
36      no  4.500  Advanced      Novice         0
38     yes  3.975  Advanced    Beginner         1
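
Examples 2 and 3 bind the percentage in two different ways, with a lambda and with functools.partial. The sketch below compares the two locally on one hand-built pandas Series, assuming pandas is installed. The names sample_row, via_lambda, and via_partial are illustrative, and the sketch does not call df.apply() or Vantage.

```python
from functools import partial

import pandas as pd


def increase_gpa(row, p=20):
    # Same logic as Example 1: raise 'gpa' by p percent and return the row.
    row['gpa'] = row['gpa'] + row['gpa'] * p / 100
    return row


# Hand-built row mirroring id 13 from the sample data (local stand-in).
sample_row = pd.Series({'id': 13, 'masters': 'no', 'gpa': 4.00,
                        'stats': 'Advanced', 'programming': 'Novice',
                        'admitted': 1})

# Bind p=50 with a lambda wrapper, as in Example 2.
bind_with_lambda = lambda row: increase_gpa(row, p=50)
# Bind p=50 with functools.partial, as in Example 3.
bind_with_partial = partial(increase_gpa, p=50)

# Each call gets its own copy so the two results are independent.
via_lambda = bind_with_lambda(sample_row.copy())
via_partial = bind_with_partial(sample_row.copy())

# Both bindings should produce the same row.
print(via_lambda.equals(via_partial))
print(via_partial['gpa'])
```

Either form yields a callable that takes only the row, which is the shape of function the examples pass to apply().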

Example 4: Use a lambda function to double 'gpa', and return a NumPy ndarray

>>> from numpy import asarray
>>> inc_gpa_lambda = lambda row, p=20: asarray([row['id'],
...                                row['masters'],
...                                row['gpa'] + row['gpa'] * p/100,
...                                row['stats'],
...                                row['programming'],
...                                row['admitted']])
>>> increase_gpa_100 = df.apply(lambda row: inc_gpa_lambda(row,
...                                                        p=100),
...                             env_name='testenv')
>>>
>>> print(increase_gpa_100)
   masters   gpa     stats programming  admitted
id
13      no  8.00  Advanced      Novice         1
26     yes  7.14  Advanced    Advanced         1
5       no  6.88    Novice      Novice         0
19     yes  3.96  Advanced    Advanced         0
15     yes  8.00  Advanced    Advanced         1
40     yes  7.90    Novice    Beginner         0
7      yes  4.66    Novice      Novice         1
22     yes  6.92    Novice    Beginner         0
36      no  6.00  Advanced      Novice         0
38     yes  5.30  Advanced    Beginner         1
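
One detail of Example 4 is worth checking locally: a NumPy array holds a single dtype, so building one from a row that mixes integers, floats, and strings coerces every element to a string. The sketch below shows only that NumPy behavior on a hand-built row, assuming numpy and pandas are installed locally. The names sample_row and doubled are illustrative, and how the returned values are mapped back to the output column types on the Vantage side is outside what this sketch demonstrates.

```python
import numpy as np
import pandas as pd

# Hand-built row mirroring id 22 from the sample data (local stand-in).
sample_row = pd.Series({'id': 22, 'masters': 'yes', 'gpa': 3.46,
                        'stats': 'Novice', 'programming': 'Beginner',
                        'admitted': 0})

# Same shape as the lambda in Example 4: return the row as an ndarray
# with 'gpa' increased by p percent.
inc_gpa_lambda = lambda row, p=20: np.asarray([row['id'],
                                               row['masters'],
                                               row['gpa'] + row['gpa'] * p / 100,
                                               row['stats'],
                                               row['programming'],
                                               row['admitted']])

doubled = inc_gpa_lambda(sample_row, p=100)

# dtype.kind is 'U' for a Unicode string array: the numeric values
# were converted to their string forms when the array was built.
print(doubled.dtype.kind)
print(doubled)
```

If the numeric types matter to local checks, converting an element back with float() or int() before comparing avoids string comparisons.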