assign() Method

Teradata® Package for Python User Guide

Product
Teradata Package for Python
Release Number
17.00
Release Date
November 2021
Content Type
User Guide
Publication ID
B700-4006-070K
Language
English (United States)

Use the assign() method to assign new column expressions in a teradataml DataFrame. A new DataFrame is returned without modifying the existing DataFrame.

assign(self, drop_columns = False, **kwargs)

The expressions are given as key-value pairs, where the keys are column names and the values are column expressions. The values can include arithmetic expressions that involve supported Python literals and columns (ColumnExpression instances) from the DataFrame.

When the drop_columns parameter is True, columns that are not specified in the assign() call are removed from the resulting DataFrame. It is False by default, so the columns of the original DataFrame are retained.

Refer to teradataml DataFrame Column for more details about ColumnExpressions in teradataml.
  • The values in kwargs cannot be callable for now.
  • Since kwargs is a dictionary, the order of your arguments may not be preserved. To make the result predictable, the new columns are inserted in alphabetical order at the end of the DataFrame. You can assign multiple columns within the same assign() call, but an expression cannot reference other columns created within that same call.
  • If no kwargs are given, the function returns self.
  • The maximum number of columns in a DataFrame is 2048.
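The column-ordering notes above can be modeled with a minimal pure-Python sketch. The helper resulting_columns is hypothetical and is not part of teradataml. It only mirrors the ordering rules stated in this section, and it assumes that the new names are lowercase and do not clash with existing column names.

```python
def resulting_columns(existing, drop_columns=False, **kwargs):
    """Model the column order described above.

    Hypothetical helper for illustration only; not a teradataml API.
    Assumes new column names do not clash with existing ones.
    """
    if not kwargs:
        # With no expressions given, assign() returns the DataFrame unchanged.
        return list(existing)
    # New columns are added in alphabetical order, not in call order.
    new_columns = sorted(kwargs)
    if drop_columns:
        # Only the newly assigned columns remain.
        return new_columns
    # Existing columns come first, new columns are appended at the end.
    return list(existing) + new_columns


columns = resulting_columns(
    ["sepal_length", "petal_length"],
    sum="s_len + p_len",
    diff="s_len - p_len",
    prod="s_len * p_len",
)
print(columns)
# ['sepal_length', 'petal_length', 'diff', 'prod', 'sum']
```

The expression strings passed here are placeholders. Only the keyword names affect the modeled column order.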

Supported Types and Operators

Python int, float, Decimal, str, and None literals can be used in assign expressions. All arithmetic operators except floor division (//) and power (**) are supported.
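As a rough illustration of the operator rule above, the following sketch scans a Python expression string for the two unsupported operators. The function unsupported_operators is a hypothetical helper built on the standard ast module. It is not part of teradataml and does not reflect how teradataml validates expressions.

```python
import ast

# Operators that this section lists as unsupported in assign() expressions.
UNSUPPORTED = {ast.FloorDiv: "//", ast.Pow: "**"}


def unsupported_operators(expression):
    """Return the unsupported operator symbols found in an expression string.

    Hypothetical pre-check for illustration only; not a teradataml API.
    """
    found = set()
    for node in ast.walk(ast.parse(expression, mode="eval")):
        if isinstance(node, ast.BinOp) and type(node.op) in UNSUPPORTED:
            found.add(UNSUPPORTED[type(node.op)])
    return sorted(found)


print(unsupported_operators("s_len % p_len + 1"))        # []
print(unsupported_operators("s_len // 2 + p_len ** 2"))  # ['**', '//']
```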

Example 1: Add new columns, retaining original DataFrame columns in resulting DataFrame

This example adds new columns created using arithmetic operations and constants. The columns of the DataFrame that assign() is called on (here, the two columns selected from df) are retained in the resulting DataFrame.

>>> df = DataFrame("iris_test")
>>> df
     sepal_length  sepal_width  petal_length  petal_width species
id                                                              
120           6.0          2.2           5.0          1.5       3
20            5.1          3.8           1.5          0.3       1
60            5.2          2.7           3.9          1.4       2
10            4.9          3.1           1.5          0.1       1
15            5.8          4.0           1.2          0.2       1
30            4.7          3.2           1.6          0.2       1
70            5.6          2.5           3.9          1.1       2
65            5.6          2.9           3.6          1.3       2
5             5.0          3.6           1.4          0.2       1
80            5.7          2.6           3.5          1.0       2
Alias the columns to use in assign:
>>> s_len = df.sepal_length
>>> p_len = df.petal_length
Add new column expressions to DataFrame:
>>> df.select(['sepal_length', 'petal_length']).\
...    assign(sum  = s_len + p_len,
...           diff = s_len - p_len,
...           prod = s_len * p_len,
...           div = s_len / p_len,
...           mod = s_len % p_len,
...           num_constant = 1,
...           str_constant = 'string')
   sepal_length  petal_length  diff       div  mod num_constant   prod str_constant   sum
0           5.6           3.9   1.7  1.435897  1.7            1  21.84       string   9.5
1           5.7           3.5   2.2  1.628571  2.2            1  19.95       string   9.2
2           6.0           5.0   1.0  1.200000  1.0            1  30.00       string  11.0
3           4.9           1.5   3.4  3.266667  0.4            1   7.35       string   6.4
4           5.0           1.4   3.6  3.571429  0.8            1   7.00       string   6.4
5           5.1           1.5   3.6  3.400000  0.6            1   7.65       string   6.6
6           5.2           3.9   1.3  1.333333  1.3            1  20.28       string   9.1
7           5.6           3.6   2.0  1.555556  2.0            1  20.16       string   9.2
8           5.1           1.5   3.6  3.400000  0.6            1   7.65       string   6.6
9           4.7           1.6   3.1  2.937500  1.5            1   7.52       string   6.3
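The values in output row 0 can be checked by hand. The sketch below repeats the five arithmetic expressions in plain Python floats for sepal_length 5.6 and petal_length 3.9. It assumes that, for positive values, the database arithmetic agrees with Python after rounding to the six decimals shown. It does not call teradataml.

```python
# sepal_length and petal_length taken from output row 0 of Example 1.
s_len, p_len = 5.6, 3.9

row = {
    "sum": s_len + p_len,
    "diff": s_len - p_len,
    "prod": s_len * p_len,
    "div": s_len / p_len,
    "mod": s_len % p_len,
}

# Round to six decimals, as displayed, to hide binary floating-point noise.
rounded = {name: round(value, 6) for name, value in row.items()}
print(rounded)
# {'sum': 9.5, 'diff': 1.7, 'prod': 21.84, 'div': 1.435897, 'mod': 1.7}
```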

Example 2: Add new columns, dropping original DataFrame columns in resulting DataFrame

This example adds new columns created using arithmetic operations and constants, and drops the original DataFrame columns from the resulting DataFrame. It reuses the s_len and p_len aliases defined in Example 1.

>>> df.assign(drop_columns = True,
...           sum  = s_len + p_len,
...           diff = s_len - p_len,
...           prod = s_len * p_len,
...           div = s_len / p_len,
...           mod = s_len % 2,
...           num_constant = 1,
...           str_constant = 'string'
... )
   diff       div  mod num_constant   prod str_constant   sum
0   1.0  1.200000  0.0            1  30.00       string  11.0
1   3.1  2.937500  0.7            1   7.52       string   6.3
2   1.7  1.435897  1.6            1  21.84       string   9.5
3   3.6  3.571429  1.0            1   7.00       string   6.4
4   1.3  1.333333  1.2            1  20.28       string   9.1
5   3.4  3.266667  0.9            1   7.35       string   6.4
6   2.0  1.555556  1.6            1  20.16       string   9.2
7   3.6  3.400000  1.1            1   7.65       string   6.6
8   4.6  4.833333  1.8            1   6.96       string   7.0
9   2.2  1.628571  1.7            1  19.95       string   9.2