Use the assign() method to assign new column expressions in a teradataml DataFrame. A new DataFrame is returned without modifying the existing DataFrame.
assign(self, drop_columns = False, **kwargs)
The expressions are given as key-value pairs, where the keys are column names and the values are column expressions. The values can include arithmetic expressions that involve supported Python literals and columns (ColumnExpression instances) from the DataFrame, or Array objects.
When the 'drop_columns' parameter is True, it removes columns from the resulting DataFrame if they are not specified in assign. It is False by default, so columns from the previous DataFrame are retained.
Refer to teradataml DataFrame Column for more details about ColumnExpressions in teradataml.
- The values in kwargs cannot be callables for now.
- Since kwargs is a dictionary, the order of your arguments may not be preserved. To make things predictable, the columns are inserted in alphabetical order, at the end of your DataFrame. Assigning multiple columns within the same assign() is possible, but you cannot reference other columns created within the same assign call.
- If no kwargs are given, the function returns self.
- The maximum number of columns in a DataFrame is 2048.
- When an Array object is passed as a value, all columns or literal values within the array should be of a similar type.
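The ordering rules in the notes above can be sketched in plain Python. This is only an illustration of the documented behavior, not teradataml code: the helper name `resulting_columns` is hypothetical, and the sketch assumes lower-case column names and that none of the new names already exists in the DataFrame.

```python
def resulting_columns(existing, new_names, drop_columns=False):
    """Return the column order described in the notes above.

    Hypothetical helper, not part of teradataml. Assumes that none of
    the new names collides with an existing column.
    """
    # New columns are appended in alphabetical order, regardless of the
    # order in which the keyword arguments were written.
    added = sorted(new_names)
    # With drop_columns=True only the newly assigned columns remain.
    kept = [] if drop_columns else list(existing)
    return kept + added


existing = ["sepal_length", "petal_length"]
new = ["sum", "diff", "prod", "div", "mod", "num_constant", "str_constant"]

print(resulting_columns(existing, new))
print(resulting_columns(existing, new, drop_columns=True))
```

The first call lists the existing columns followed by the new ones in alphabetical order; the second lists only the new columns.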
Supported Types and Operators
Python int, float, Decimal, str and None literals can be used in assign expressions. All arithmetic expressions except floor division (//) and power (**) are supported.
Example 1: Add new columns, retaining original DataFrame columns in resulting DataFrame
This example adds new columns created using arithmetic operations and constants, retaining original DataFrame columns in resulting DataFrame.
>>> df = DataFrame("iris_test")
>>> df
     sepal_length  sepal_width  petal_length  petal_width  species
id
120           6.0          2.2           5.0          1.5        3
20            5.1          3.8           1.5          0.3        1
60            5.2          2.7           3.9          1.4        2
10            4.9          3.1           1.5          0.1        1
15            5.8          4.0           1.2          0.2        1
30            4.7          3.2           1.6          0.2        1
70            5.6          2.5           3.9          1.1        2
65            5.6          2.9           3.6          1.3        2
5             5.0          3.6           1.4          0.2        1
80            5.7          2.6           3.5          1.0        2
Alias the columns to use in assign:
>>> s_len = df.sepal_length
>>> p_len = df.petal_length
Add new column expressions to DataFrame:
>>> df.select(['sepal_length', 'petal_length']).\
...     assign(sum = s_len + p_len,
...            diff = s_len - p_len,
...            prod = s_len * p_len,
...            div = s_len / p_len,
...            mod = s_len % p_len,
...            num_constant = 1,
...            str_constant = 'string')
   sepal_length  petal_length  diff       div  mod  num_constant   prod str_constant   sum
0           5.6           3.9   1.7  1.435897  1.7             1  21.84       string   9.5
1           5.7           3.5   2.2  1.628571  2.2             1  19.95       string   9.2
2           6.0           5.0   1.0  1.200000  1.0             1  30.00       string  11.0
3           4.9           1.5   3.4  3.266667  0.4             1   7.35       string   6.4
4           5.0           1.4   3.6  3.571429  0.8             1   7.00       string   6.4
5           5.1           1.5   3.6  3.400000  0.6             1   7.65       string   6.6
6           5.2           3.9   1.3  1.333333  1.3             1  20.28       string   9.1
7           5.6           3.6   2.0  1.555556  2.0             1  20.16       string   9.2
8           5.1           1.5   3.6  3.400000  0.6             1   7.65       string   6.6
9           4.7           1.6   3.1  2.937500  1.5             1   7.52       string   6.3
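As a quick sanity check on the Example 1 output, the first row can be reproduced with ordinary Python arithmetic. This is a plain-Python sketch, not teradataml code. It assumes that the database operators agree with Python's for these positive values, and the rounding to six decimals is only there to match the displayed precision.

```python
# Values of sepal_length and petal_length in the first output row.
s_len, p_len = 5.6, 3.9

# Same expressions as in the assign() call, rounded for display.
row = {
    "sum": round(s_len + p_len, 6),
    "diff": round(s_len - p_len, 6),
    "prod": round(s_len * p_len, 6),
    "div": round(s_len / p_len, 6),
    "mod": round(s_len % p_len, 6),
}

for name, value in row.items():
    print(name, value)
```

Because 3.9 fits into 5.6 exactly once, `mod` equals `diff` for this row.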
Example 2: Add new columns, dropping original DataFrame columns in resulting DataFrame
This example adds new columns created using arithmetic operations and constants, dropping original DataFrame columns in resulting DataFrame.
>>> df.assign(drop_columns = True,
...           sum = s_len + p_len,
...           diff = s_len - p_len,
...           prod = s_len * p_len,
...           div = s_len / p_len,
...           mod = s_len % 2,
...           num_constant = 1,
...           str_constant = 'string'
...           )
   diff       div  mod  num_constant   prod str_constant   sum
0   1.0  1.200000  0.0             1  30.00       string  11.0
1   3.1  2.937500  0.7             1   7.52       string   6.3
2   1.7  1.435897  1.6             1  21.84       string   9.5
3   3.6  3.571429  1.0             1   7.00       string   6.4
4   1.3  1.333333  1.2             1  20.28       string   9.1
5   3.4  3.266667  0.9             1   7.35       string   6.4
6   2.0  1.555556  1.6             1  20.16       string   9.2
7   3.6  3.400000  1.1             1   7.65       string   6.6
8   4.6  4.833333  1.8             1   6.96       string   7.0
9   2.2  1.628571  1.7             1  19.95       string   9.2
Example 3: Use assign to add a new array column that stores columns Jan and Feb and literal values
>>> from teradataml import load_example_data, Array, DataFrame
>>> from teradatasqlalchemy.types import ARRAY_INTEGER
Load the 'sales' data and create a DataFrame.
>>> load_example_data("dataframe", "sales")
>>> df = DataFrame("sales")
Create an array column.
>>> res = df.assign(new_col=Array((df.Jan, "10", df.Feb, 200), atype=ARRAY_INTEGER('[1:100]')))
>>> res
Output
              Feb    Jan    Mar    Apr    datetime            new_col
accounts
Blue Inc     90.0   50.0   95.0  101.0  04/01/2017     (50,10,90,200)
Red Inc     200.0  150.0  140.0    NaN  04/01/2017   (150,10,200,200)
Yellow Inc   90.0    NaN    NaN    NaN  04/01/2017   (NULL,10,90,200)
Jones LLC   200.0  150.0  140.0  180.0  04/01/2017   (150,10,200,200)
Orange Inc  210.0    NaN    NaN  250.0  04/01/2017  (NULL,10,210,200)
Alpha Co    210.0  200.0  215.0  250.0  04/01/2017   (200,10,210,200)
>>> res.tdtypes
Output
accounts VARCHAR(length=20, charset='LATIN')
Feb FLOAT()
Jan BIGINT()
Mar BIGINT()
Apr BIGINT()
datetime DATE()
new_col ARRAY_INTEGER('[1:100]')