Use the assign() method to add new column expressions to a teradataml DataFrame. assign() returns a new DataFrame and does not modify the existing one.
assign(self, drop_columns = False, **kwargs)
The expressions are given as keyword arguments (key-value pairs), where each key is a column name and each value is a column expression. Values can be arithmetic expressions that combine supported Python literals with columns (ColumnExpression instances) of the DataFrame.
When the 'drop_columns' parameter is True, the resulting DataFrame contains only the columns specified in the assign() call; all other columns are dropped. The default is False, which retains the columns of the original DataFrame.
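The column-level effect of drop_columns can be modeled in plain Python. The sketch below uses a hypothetical helper, resulting_columns(), which is not part of teradataml: it works on column names only, assumes the new names do not collide with existing columns, and applies the alphabetical ordering described in the notes of this section. Its two results match the column headers printed in Examples 1 and 2.

```python
def resulting_columns(existing, new_names, drop_columns=False):
    """Hypothetical model of the column list produced by assign().

    Not part of teradataml. Assumes none of new_names already exists in
    `existing`; replacing an existing column is not modeled.
    """
    # New columns are appended in alphabetical order, as documented.
    added = sorted(new_names)
    if drop_columns:
        # drop_columns=True: only the columns named in assign() survive.
        return added
    # drop_columns=False (default): original columns first, new ones after.
    return list(existing) + added


new = ["sum", "diff", "prod", "div", "mod", "num_constant", "str_constant"]

# Same columns as Example 1 (after select()), original columns kept.
print(resulting_columns(["sepal_length", "petal_length"], new))

# Same columns as Example 2, original columns dropped.
print(resulting_columns(["sepal_length", "sepal_width", "petal_length",
                         "petal_width", "species"], new, drop_columns=True))
```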
assign() also accepts ColumnExpressions returned by the udf function decorator and by call_udf().
Refer to teradataml DataFrame Column for more details about ColumnExpressions in teradataml.
Notes:
- Callables are not currently supported as values in kwargs.
- Because kwargs is a dictionary, the order of the arguments may not be preserved. To keep results predictable, the new columns are appended to the end of the DataFrame in alphabetical order. You can assign multiple columns in a single assign() call, but an expression cannot reference another column created in the same call.
- If no kwargs are given, the function returns self.
- The maximum number of columns in a DataFrame is 2048.
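The remaining notes can be expressed as a pre-flight check. validate_assign() below is a hypothetical plain-Python helper, not a teradataml function: it takes the existing column names and, for each new column, the set of column names its expression reads. It raises ValueError (an illustrative choice; teradataml's own errors may differ) when an expression references a column that is not already in the DataFrame, which includes columns created in the same call, or when the result would exceed 2048 columns. It assumes drop_columns is False.

```python
MAX_COLUMNS = 2048  # documented maximum number of columns in a DataFrame


def validate_assign(existing_columns, new_columns):
    """Hypothetical check mirroring the documented assign() notes.

    existing_columns: iterable of column names already in the DataFrame.
    new_columns: dict mapping each new column name to the set of column
        names its expression reads. Assumes drop_columns=False.
    Returns False when there is nothing to assign (assign() returns self),
    True when the call passes the checks, and raises ValueError otherwise.
    """
    existing = set(existing_columns)
    if not new_columns:
        # No kwargs: assign() hands back the same DataFrame.
        return False
    for name, refs in new_columns.items():
        # Columns created in the same call do not exist yet, so any
        # reference outside the existing columns is rejected.
        missing = set(refs) - existing
        if missing:
            raise ValueError(
                f"{name!r} references columns not in the DataFrame: {sorted(missing)}"
            )
    if len(existing | set(new_columns)) > MAX_COLUMNS:
        raise ValueError(f"result would exceed {MAX_COLUMNS} columns")
    return True


cols = ["sepal_length", "petal_length"]
print(validate_assign(cols, {}))
print(validate_assign(cols, {"sum": {"sepal_length", "petal_length"}}))
try:
    # "double_sum" reads "sum", which is created in the same call.
    validate_assign(cols, {"sum": {"sepal_length", "petal_length"},
                           "double_sum": {"sum"}})
except ValueError as err:
    print(err)
```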
Supported Types and Operators
Python int, float, Decimal, str, and None literals can be used in assign expressions. All arithmetic operators are supported except floor division (//) and power (**).
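Because assign() expressions are written in ordinary Python syntax, a pre-check for the two unsupported operators can be run on an expression's source text with Python's standard ast module. find_unsupported_operators() is a hypothetical helper, not part of teradataml; it only inspects syntax and does not know which names are DataFrame columns.

```python
import ast

# Operators the documentation lists as unsupported in assign() expressions.
UNSUPPORTED_OPS = {ast.FloorDiv: "//", ast.Pow: "**"}


def find_unsupported_operators(expr_source):
    """Return the unsupported operator symbols used in an expression string.

    Hypothetical helper: parses the text as a single Python expression and
    walks every node, collecting binary operations whose operator is
    floor division or power.
    """
    tree = ast.parse(expr_source, mode="eval")
    found = []
    for node in ast.walk(tree):
        if isinstance(node, ast.BinOp) and type(node.op) in UNSUPPORTED_OPS:
            found.append(UNSUPPORTED_OPS[type(node.op)])
    return found


for expr in ["s_len + p_len", "s_len % p_len", "s_len // p_len", "s_len ** 2"]:
    print(expr, "->", find_unsupported_operators(expr) or "ok")
```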
Example 1: Add new columns, retaining original DataFrame columns in resulting DataFrame
This example selects two columns, then adds new columns built from arithmetic operations and constants; the selected columns are kept in the resulting DataFrame.
>>> from teradataml import DataFrame
>>> df = DataFrame("iris_test")
>>> df
     sepal_length  sepal_width  petal_length  petal_width  species
id
120           6.0          2.2           5.0          1.5        3
20            5.1          3.8           1.5          0.3        1
60            5.2          2.7           3.9          1.4        2
10            4.9          3.1           1.5          0.1        1
15            5.8          4.0           1.2          0.2        1
30            4.7          3.2           1.6          0.2        1
70            5.6          2.5           3.9          1.1        2
65            5.6          2.9           3.6          1.3        2
5             5.0          3.6           1.4          0.2        1
80            5.7          2.6           3.5          1.0        2
Create aliases for the columns used in assign():
>>> s_len = df.sepal_length
>>> p_len = df.petal_length
Add the new column expressions to the DataFrame:
>>> df.select(['sepal_length', 'petal_length']).\
...     assign(sum = s_len + p_len,
...            diff = s_len - p_len,
...            prod = s_len * p_len,
...            div = s_len / p_len,
...            mod = s_len % p_len,
...            num_constant = 1,
...            str_constant = 'string')
   sepal_length  petal_length  diff       div  mod  num_constant   prod  str_constant   sum
0           5.6           3.9   1.7  1.435897  1.7             1  21.84        string   9.5
1           5.7           3.5   2.2  1.628571  2.2             1  19.95        string   9.2
2           6.0           5.0   1.0  1.200000  1.0             1  30.00        string  11.0
3           4.9           1.5   3.4  3.266667  0.4             1   7.35        string   6.4
4           5.0           1.4   3.6  3.571429  0.8             1   7.00        string   6.4
5           5.1           1.5   3.6  3.400000  0.6             1   7.65        string   6.6
6           5.2           3.9   1.3  1.333333  1.3             1  20.28        string   9.1
7           5.6           3.6   2.0  1.555556  2.0             1  20.16        string   9.2
8           5.1           1.5   3.6  3.400000  0.6             1   7.65        string   6.6
9           4.7           1.6   3.1  2.937500  1.5             1   7.52        string   6.3
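As a cross-check of the arithmetic, row 0 of the Example 1 output (sepal_length 5.6, petal_length 3.9) can be recomputed with plain Python floats. This only reproduces the printed numbers; the actual values are computed in the database, not in local Python. For positive operands like these, Python's % yields the same remainder shown in the mod column.

```python
# Values taken from row 0 of the Example 1 output.
s_len, p_len = 5.6, 3.9

# The values printed in that row.
expected = {"sum": 9.5, "diff": 1.7, "prod": 21.84, "div": 1.435897, "mod": 1.7}

computed = {
    "sum": s_len + p_len,
    "diff": s_len - p_len,
    "prod": s_len * p_len,
    "div": s_len / p_len,
    "mod": s_len % p_len,  # remainder of 5.6 / 3.9
}

for name, value in computed.items():
    # Tolerance of 1e-6 matches the six decimals printed in the div column.
    match = abs(value - expected[name]) < 1e-6
    print(f"{name:5} computed={value:.6f} shown={expected[name]} match={match}")
```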
Example 2: Add new columns, dropping original DataFrame columns in resulting DataFrame
This example adds similar columns to the full DataFrame (here mod is computed as s_len % 2) and passes drop_columns = True, so only the newly assigned columns appear in the resulting DataFrame.
>>> df.assign(drop_columns = True,
...           sum = s_len + p_len,
...           diff = s_len - p_len,
...           prod = s_len * p_len,
...           div = s_len / p_len,
...           mod = s_len % 2,
...           num_constant = 1,
...           str_constant = 'string'
...           )
   diff       div  mod  num_constant   prod  str_constant   sum
0   1.0  1.200000  0.0             1  30.00        string  11.0
1   3.1  2.937500  0.7             1   7.52        string   6.3
2   1.7  1.435897  1.6             1  21.84        string   9.5
3   3.6  3.571429  1.0             1   7.00        string   6.4
4   1.3  1.333333  1.2             1  20.28        string   9.1
5   3.4  3.266667  0.9             1   7.35        string   6.4
6   2.0  1.555556  1.6             1  20.16        string   9.2
7   3.6  3.400000  1.1             1   7.65        string   6.6
8   4.6  4.833333  1.8             1   6.96        string   7.0
9   2.2  1.628571  1.7             1  19.95        string   9.2
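With drop_columns = True, sepal_length and petal_length no longer appear in the output, but in this example they can be recovered from it, since sum + diff = 2 * sepal_length and sum - diff = 2 * petal_length. The plain-Python sketch below does this for the first three rows of Example 2 and recomputes the mod column (s_len % 2); it is only an arithmetic cross-check of the printed values, not a teradataml operation.

```python
# (diff, sum, mod) values copied from rows 0-2 of the Example 2 output.
rows = [(1.0, 11.0, 0.0), (3.1, 6.3, 0.7), (1.7, 9.5, 1.6)]

for diff, total, mod in rows:
    s_len = (total + diff) / 2  # sum + diff = 2 * sepal_length
    p_len = (total - diff) / 2  # sum - diff = 2 * petal_length
    print(f"sepal_length={s_len:.1f} petal_length={p_len:.1f} "
          f"s_len % 2={s_len % 2:.1f} (shown: {mod})")
```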