assign() Method

Teradata® Package for Python User Guide

Deployment: VantageCloud, VantageCore
Edition: VMware, Enterprise, IntelliFlex
Product: Teradata Package for Python
Release Number: 20.00
Published: March 2025
Last Edition: 2026-02-20
Product Category: Teradata Vantage

Use the assign() method to assign new column expressions in a teradataml DataFrame. A new DataFrame is returned without modifying the existing DataFrame.

assign(self, drop_columns = False, **kwargs)

The expressions are given as key-value pairs, where each key is a column name and each value is a column expression. Values can be arithmetic expressions that involve supported Python literals and columns (ColumnExpression instances) from the DataFrame, or Array objects.

When the 'drop_columns' parameter is True, the resulting DataFrame contains only the columns specified in assign(); all other columns are dropped. The default is False, so the columns of the original DataFrame are retained.

assign() also accepts ColumnExpressions returned by the udf function decorator and by call_udf().

Refer to teradataml DataFrame Column for more details about ColumnExpressions in teradataml.

  • Values in kwargs cannot be callables at this time.
  • Because kwargs is a dictionary, the order of the arguments may not be preserved. To keep the result predictable, new columns are appended to the end of the DataFrame in alphabetical order. You can assign multiple columns in the same assign() call, but an expression cannot reference another column created in that same call.
  • If no kwargs are given, the function returns self.
  • The maximum number of columns in a DataFrame is 2048.
  • When an Array object is passed as a value, all columns and literal values within the array should be of similar types.
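
The ordering and referencing rules in the notes above can be modeled with a small, self-contained sketch. It is not teradataml code and does not show how the library is implemented: assign_columns() is a hypothetical helper that tracks column names only, and each keyword value is a tuple of the source columns an expression would read, standing in for a ColumnExpression. Overwriting an existing column is deliberately left out of the model.

```python
def assign_columns(existing, drop_columns=False, **new_columns):
    """Toy model of the column layout rules listed in the notes.

    existing:    ordered list of column names in the source DataFrame.
    new_columns: new column name mapped to a tuple of the source column
                 names its expression reads (a hypothetical stand-in
                 for a ColumnExpression).
    """
    if not new_columns:
        # No kwargs: the input is returned unchanged.
        return existing

    for name, reads in new_columns.items():
        if name in existing:
            # Limitation of this sketch only, not a statement about teradataml.
            raise NotImplementedError("overwriting a column is not modeled")
        unknown = [col for col in reads if col not in existing]
        if unknown:
            # Columns created in the same call are not visible to each other.
            raise ValueError(f"{name!r} reads unknown columns: {unknown}")

    added = sorted(new_columns)  # new columns are placed in alphabetical order
    return added if drop_columns else list(existing) + added


cols = ["sepal_length", "petal_length"]
both = ("sepal_length", "petal_length")

# New columns land at the end, alphabetically, regardless of keyword order.
print(assign_columns(cols, sum=both, diff=both, prod=both))

# drop_columns=True keeps only the assigned columns.
print(assign_columns(cols, drop_columns=True, sum=both, diff=both))

# A column created in one call can be read only in a later, chained call.
step1 = assign_columns(cols, total=both)
step2 = assign_columns(step1, ratio=("total", "petal_length"))
print(step2)
```

In teradataml itself, the equivalent pattern would be two chained assign() calls, with the second expression built from the DataFrame returned by the first.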

Supported Types and Operators

Python int, float, Decimal, str and None literals can be used in assign expressions. All arithmetic expressions except floor division (//) and power (**) are supported.
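
Because floor division and power are excluded, an expression that needs them has to be rewritten with the supported operators. The sketch below checks two such rewrites on plain Python integers only: a square as a multiplication, and floor division as (x - x % y) / y. Whether the same identities hold for ColumnExpressions depends on how Vantage evaluates division and modulo for the column types involved (for example, integer operands or negative values); that is an assumption not verified here, and the helper names are hypothetical.

```python
def square(x):
    # x ** 2 rewritten with multiplication only.
    return x * x


def floor_div(x, y):
    # x // y rewritten with subtraction, modulo, and true division.
    # For Python integers, x - x % y is an exact multiple of y.
    return (x - x % y) / y


# Compare each rewrite against the operator it replaces.
for x, y in [(7, 2), (9, 3), (-7, 2)]:
    assert square(x) == x ** 2
    assert floor_div(x, y) == x // y

print(floor_div(7, 2))
```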

Example 1: Add new columns, retaining original DataFrame columns in resulting DataFrame

This example adds new columns created from arithmetic operations and constants, while retaining the original DataFrame columns in the resulting DataFrame.

>>> df = DataFrame("iris_test")
>>> df
     sepal_length  sepal_width  petal_length  petal_width species
id                                                              
120           6.0          2.2           5.0          1.5       3
20            5.1          3.8           1.5          0.3       1
60            5.2          2.7           3.9          1.4       2
10            4.9          3.1           1.5          0.1       1
15            5.8          4.0           1.2          0.2       1
30            4.7          3.2           1.6          0.2       1
70            5.6          2.5           3.9          1.1       2
65            5.6          2.9           3.6          1.3       2
5             5.0          3.6           1.4          0.2       1
80            5.7          2.6           3.5          1.0       2
Alias the columns to use in assign:
>>> s_len = df.sepal_length
>>> p_len = df.petal_length
Add new column expressions to DataFrame:
>>> df.select(['sepal_length', 'petal_length']).\
...    assign(sum  = s_len + p_len,
...           diff = s_len - p_len,
...           prod = s_len * p_len,
...           div = s_len / p_len,
...           mod = s_len % p_len,
...           num_constant = 1,
...           str_constant = 'string')
   sepal_length  petal_length  diff       div  mod num_constant   prod str_constant   sum
0           5.6           3.9   1.7  1.435897  1.7            1  21.84       string   9.5
1           5.7           3.5   2.2  1.628571  2.2            1  19.95       string   9.2
2           6.0           5.0   1.0  1.200000  1.0            1  30.00       string  11.0
3           4.9           1.5   3.4  3.266667  0.4            1   7.35       string   6.4
4           5.0           1.4   3.6  3.571429  0.8            1   7.00       string   6.4
5           5.1           1.5   3.6  3.400000  0.6            1   7.65       string   6.6
6           5.2           3.9   1.3  1.333333  1.3            1  20.28       string   9.1
7           5.6           3.6   2.0  1.555556  2.0            1  20.16       string   9.2
8           5.1           1.5   3.6  3.400000  0.6            1   7.65       string   6.6
9           4.7           1.6   3.1  2.937500  1.5            1   7.52       string   6.3
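
As a quick sanity check, individual rows of the output above can be reproduced locally with ordinary Python floats. This only mirrors the arithmetic on the displayed values; it does not run against Vantage or reflect the SQL that teradataml generates, and row_arithmetic() is a hypothetical helper. Results are rounded to six decimal places to match the display.

```python
def row_arithmetic(s_len, p_len):
    # Same expressions as in the assign() call, applied to two plain floats.
    values = {
        "sum": s_len + p_len,
        "diff": s_len - p_len,
        "prod": s_len * p_len,
        "div": s_len / p_len,
        "mod": s_len % p_len,
    }
    # Round away binary floating-point noise such as 1.6999999999999997.
    return {name: round(value, 6) for name, value in values.items()}


# Row 0 of the output: sepal_length = 5.6, petal_length = 3.9
print(row_arithmetic(5.6, 3.9))

# Row 3 of the output: sepal_length = 4.9, petal_length = 1.5
print(row_arithmetic(4.9, 1.5))
```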

Example 2: Add new columns, dropping original DataFrame columns in resulting DataFrame

This example adds new columns created from arithmetic operations and constants, and drops the original DataFrame columns from the resulting DataFrame.

>>> df.assign(drop_columns = True,
...           sum  = s_len + p_len,
...           diff = s_len - p_len,
...           prod = s_len * p_len,
...           div = s_len / p_len,
...           mod = s_len % 2,
...           num_constant = 1,
...           str_constant = 'string'
... )
   diff       div  mod num_constant   prod str_constant   sum
0   1.0  1.200000  0.0            1  30.00       string  11.0
1   3.1  2.937500  0.7            1   7.52       string   6.3
2   1.7  1.435897  1.6            1  21.84       string   9.5
3   3.6  3.571429  1.0            1   7.00       string   6.4
4   1.3  1.333333  1.2            1  20.28       string   9.1
5   3.4  3.266667  0.9            1   7.35       string   6.4
6   2.0  1.555556  1.6            1  20.16       string   9.2
7   3.6  3.400000  1.1            1   7.65       string   6.6
8   4.6  4.833333  1.8            1   6.96       string   7.0
9   2.2  1.628571  1.7            1  19.95       string   9.2

Example 3: Use assign() to add a new array column that stores columns Jan and Feb and literal values

>>> from teradataml import load_example_data, Array, DataFrame
>>> from teradatasqlalchemy.types import ARRAY_INTEGER

Load the 'sales' data and create a DataFrame.

>>> load_example_data("dataframe", "sales")
>>> df = DataFrame("sales")

Create an array column.

>>> res = df.assign(new_col=Array((df.Jan, "10", df.Feb, 200), atype=ARRAY_INTEGER('[1:100]')))
>>> res

Output

                Feb    Jan    Mar    Apr    datetime            new_col
accounts                                                             
Blue Inc     90.0   50.0   95.0  101.0  04/01/2017     (50,10,90,200)
Red Inc     200.0  150.0  140.0    NaN  04/01/2017   (150,10,200,200)
Yellow Inc   90.0    NaN    NaN    NaN  04/01/2017   (NULL,10,90,200)
Jones LLC   200.0  150.0  140.0  180.0  04/01/2017   (150,10,200,200)
Orange Inc  210.0    NaN    NaN  250.0  04/01/2017  (NULL,10,210,200)
Alpha Co    210.0  200.0  215.0  250.0  04/01/2017   (200,10,210,200)
>>> res.tdtypes

Output

accounts    VARCHAR(length=20, charset='LATIN')
Feb                                     FLOAT()
Jan                                    BIGINT()
Mar                                    BIGINT()
Apr                                    BIGINT()
datetime                                 DATE()
new_col                ARRAY_INTEGER('[1:100]')