Teradata Package for Python Function Reference | 17.10 - assign - Teradata Package for Python - Look here for syntax, methods and examples for the functions included in the Teradata Package for Python.

Teradata® Package for Python Function Reference

Product

Teradata Package for Python

Release Number

17.10

Published

April 2022

Language

English (United States)

Last Update

2022-08-19

lifecycle

Product Category

Teradata Vantage

teradataml.dataframe.dataframe.DataFrame.assign = assign(self, drop_columns=False, **kwargs): DESCRIPTION: Assign new columns to a teradataml DataFrame. PARAMETERS: drop_columns: Optional Argument. If True, drop columns that are not specified in assign. Note: When DataFrame.assign() is run on DataFrame.groupby(), this argument is ignored. In such cases, all columns are dropped and only new columns and grouping columns are returned. Default Value: False Types: bool kwargs: Specifies keyword-value pairs. - keywords are the column names. - values can be: * Column arithmetic expressions. * int/float/string literals. * DataFrameColumn a.k.a. ColumnExpression Functions. (Seee DataFrameColumn Functions in Function reference guide for more details) * SQLAlchemy ClauseElements. (See teradataml extension with SQLAlchemy in teradataml User Guide and Function reference guide for more details) RETURNS: teradataml DataFrame A new DataFrame is returned with: 1. New columns in addition to all the existing columns if "drop_columns" is False. 2. Only new columns if "drop_columns" is True. 3. New columns in addition to group by columns, i.e., columns used for grouping, if assign() is run on DataFrame.groupby(). NOTES: 1. The values in kwargs cannot be callable (functions). 2. The original DataFrame is not modified. 3. Since ``kwargs`` is a dictionary, the order of your arguments may not be preserved. To make things predictable, the columns are inserted in alphabetical order, after the existing columns in the DataFrame. Assigning multiple columns within the same ``assign`` is possible, but you cannot reference other columns created within the same ``assign`` call. 4. The maximum number of columns that a DataFrame can have is 2048. 5. With DataFrame.groupby(), only aggregate functions and literal values are advised to use. Other functions, such as string functions, can also be used, but the column used in such function must be a part of group by columns. See examples for teradataml extension with SQLAlchemy on using various functions with DataFrame.assign(). RAISES: 1. ValueError - When a callable is passed as a value, or columns from different DataFrames are passed as values in kwargs. 2. TeradataMlException - When the return DataFrame initialization fails, or invalid argument types are passed. EXAMPLES: >>> load_example_data("dataframe", "admissions_train") >>> df = DataFrame("admissions_train") >>> c1 = df.gpa >>> c2 = df.id >>> # # Executing assign() with Arithmetic operations on columns. # All below examples use columns "gpa" and "id" for # arithmetic operations to create a new DataFrame including the new columns. # # Let's take a look at various operations that can be performed # using assign() and arithmetic operations on columns. # Example 1: Addition of two columns "gpa" and "id". >>> df.assign(new_column = c1 + c2).sort("id") masters gpa stats programming admitted new_column id 1 yes 3.95 Beginner Beginner 0 4.95 2 yes 3.76 Beginner Beginner 0 5.76 3 no 3.70 Novice Beginner 1 6.70 4 yes 3.50 Beginner Novice 1 7.50 5 no 3.44 Novice Novice 0 8.44 6 yes 3.50 Beginner Advanced 1 9.50 7 yes 2.33 Novice Novice 1 9.33 8 no 3.60 Beginner Advanced 1 11.60 9 no 3.82 Advanced Advanced 1 12.82 10 no 3.71 Advanced Advanced 1 13.71 >>> # Example 2: Multiplication of columns "gpa" and "id". >>> df.assign(new_column = c1 * c2).sort("id") masters gpa stats programming admitted new_column id 1 yes 3.95 Beginner Beginner 0 3.95 2 yes 3.76 Beginner Beginner 0 7.52 3 no 3.70 Novice Beginner 1 11.10 4 yes 3.50 Beginner Novice 1 14.00 5 no 3.44 Novice Novice 0 17.20 6 yes 3.50 Beginner Advanced 1 21.00 7 yes 2.33 Novice Novice 1 16.31 8 no 3.60 Beginner Advanced 1 28.80 9 no 3.82 Advanced Advanced 1 34.38 10 no 3.71 Advanced Advanced 1 37.10 >>> # Example 3: Division of columns. Divide "id" by "gpa". >>> df.assign(new_column = c2 / c1).sort("id") masters gpa stats programming admitted new_column id 1 yes 3.95 Beginner Beginner 0 0.253165 2 yes 3.76 Beginner Beginner 0 0.531915 3 no 3.70 Novice Beginner 1 0.810811 4 yes 3.50 Beginner Novice 1 1.142857 5 no 3.44 Novice Novice 0 1.453488 6 yes 3.50 Beginner Advanced 1 1.714286 7 yes 2.33 Novice Novice 1 3.004292 8 no 3.60 Beginner Advanced 1 2.222222 9 no 3.82 Advanced Advanced 1 2.356021 10 no 3.71 Advanced Advanced 1 2.695418 >>> # Example 4: Subtract values in column "id" from "gpa". >>> df.assign(new_column = c1 - c2).sort("id") masters gpa stats programming admitted new_column id 1 yes 3.95 Beginner Beginner 0 2.95 2 yes 3.76 Beginner Beginner 0 1.76 3 no 3.70 Novice Beginner 1 0.70 4 yes 3.50 Beginner Novice 1 -0.50 5 no 3.44 Novice Novice 0 -1.56 6 yes 3.50 Beginner Advanced 1 -2.50 7 yes 2.33 Novice Novice 1 -4.67 8 no 3.60 Beginner Advanced 1 -4.40 9 no 3.82 Advanced Advanced 1 -5.18 10 no 3.71 Advanced Advanced 1 -6.29 # Example 5: Modulo division of values in column "id" and "gpa". >>> df.assign(new_column = c2 % c1).sort("id") masters gpa stats programming admitted new_column id 1 yes 3.95 Beginner Beginner 0 1.00 2 yes 3.76 Beginner Beginner 0 2.00 3 no 3.70 Novice Beginner 1 3.00 4 yes 3.50 Beginner Novice 1 0.50 5 no 3.44 Novice Novice 0 1.56 6 yes 3.50 Beginner Advanced 1 2.50 7 yes 2.33 Novice Novice 1 0.01 8 no 3.60 Beginner Advanced 1 0.80 9 no 3.82 Advanced Advanced 1 1.36 10 no 3.71 Advanced Advanced 1 2.58 >>> # # Executing assign() with literal values. # # Example 6: Adding an integer literal value to the values of columns "gpa" and "id". >>> df.assign(c3 = c1 + 1, c4 = c2 + 1).sort("id") masters gpa stats programming admitted c3 c4 id 1 yes 3.95 Beginner Beginner 0 4.95 2 2 yes 3.76 Beginner Beginner 0 4.76 3 3 no 3.70 Novice Beginner 1 4.70 4 4 yes 3.50 Beginner Novice 1 4.50 5 5 no 3.44 Novice Novice 0 4.44 6 6 yes 3.50 Beginner Advanced 1 4.50 7 7 yes 2.33 Novice Novice 1 3.33 8 8 no 3.60 Beginner Advanced 1 4.60 9 9 no 3.82 Advanced Advanced 1 4.82 10 10 no 3.71 Advanced Advanced 1 4.71 11 >>> # Example 7: Create a new column with an integer literal value. >>> df.assign(c1 = 1).sort("id") masters gpa stats programming admitted c1 id 1 yes 3.95 Beginner Beginner 0 1 2 yes 3.76 Beginner Beginner 0 1 3 no 3.70 Novice Beginner 1 1 4 yes 3.50 Beginner Novice 1 1 5 no 3.44 Novice Novice 0 1 6 yes 3.50 Beginner Advanced 1 1 7 yes 2.33 Novice Novice 1 1 8 no 3.60 Beginner Advanced 1 1 9 no 3.82 Advanced Advanced 1 1 10 no 3.71 Advanced Advanced 1 1 >>> # Example 8: Create a new column with a string literal value. >>> df.assign(c3 = 'string').sort("id") masters gpa stats programming admitted c3 id 1 yes 3.95 Beginner Beginner 0 string 2 yes 3.76 Beginner Beginner 0 string 3 no 3.70 Novice Beginner 1 string 4 yes 3.50 Beginner Novice 1 string 5 no 3.44 Novice Novice 0 string 6 yes 3.50 Beginner Advanced 1 string 7 yes 2.33 Novice Novice 1 string 8 no 3.60 Beginner Advanced 1 string 9 no 3.82 Advanced Advanced 1 string 10 no 3.71 Advanced Advanced 1 string >>> # Example 9: Concatenation of strings, a string literal and value from # "masters" column. # '+' operator is overridden for string columns. >>> df.assign(concatenated = "Completed? " + df.masters).sort("id") masters gpa stats programming admitted concatenated id 1 yes 3.95 Beginner Beginner 0 Completed? yes 2 yes 3.76 Beginner Beginner 0 Completed? yes 3 no 3.70 Novice Beginner 1 Completed? no 4 yes 3.50 Beginner Novice 1 Completed? yes 5 no 3.44 Novice Novice 0 Completed? no 6 yes 3.50 Beginner Advanced 1 Completed? yes 7 yes 2.33 Novice Novice 1 Completed? yes 8 no 3.60 Beginner Advanced 1 Completed? no 9 no 3.82 Advanced Advanced 1 Completed? no 10 no 3.71 Advanced Advanced 1 Completed? no >>> # # Significance of "drop_columns" in assign(). # Setting drop_columns to True will only return assigned expressions. # # Example 10: Drop all column and return new assigned expressions. >>> df.assign(drop_columns = True, ... addc = c1 + c2, ... subc = c1 - c2, ... mulc = c1 * c2, ... divc = c1/c2).sort("addc") addc divc mulc subc 0 4.95 3.950000 3.95 2.95 1 5.76 1.880000 7.52 1.76 2 6.70 1.233333 11.10 0.70 3 7.50 0.875000 14.00 -0.50 4 8.44 0.688000 17.20 -1.56 5 9.33 0.332857 16.31 -4.67 6 9.50 0.583333 21.00 -2.50 7 11.60 0.450000 28.80 -4.40 8 12.82 0.424444 34.38 -5.18 9 13.71 0.371000 37.10 -6.29 >>> # Example 11: Duplicate a column with a new name. # In the example here, we are duplicating: # 1. Column "id" to new column "c1". # 2. Column "gpa" to new column "c2". >>> df.assign(c1 = c2, c2 = c1).sort("id") masters gpa stats programming admitted c1 c2 id 1 yes 3.95 Beginner Beginner 0 1 3.95 2 yes 3.76 Beginner Beginner 0 2 3.76 3 no 3.70 Novice Beginner 1 3 3.70 4 yes 3.50 Beginner Novice 1 4 3.50 5 no 3.44 Novice Novice 0 5 3.44 6 yes 3.50 Beginner Advanced 1 6 3.50 7 yes 2.33 Novice Novice 1 7 2.33 8 no 3.60 Beginner Advanced 1 8 3.60 9 no 3.82 Advanced Advanced 1 9 3.82 10 no 3.71 Advanced Advanced 1 10 3.71 >>> # Example 12: Renaming columns. # Example 6 command can be modified a bit to rename columns, rather than # duplicating it. # Use "drop_column=True" in example 6 command to select all the desired columns. >>> df.assign(drop_columns=True, c1 = c2, c2 = c1, ... masters=df.masters, ... stats=df.stats, ... programming=df.programming, ... admitted=df.admitted).sort("c1") masters stats programming admitted c1 c2 0 yes Beginner Beginner 0 1 3.95 1 yes Beginner Beginner 0 2 3.76 2 no Novice Beginner 1 3 3.70 3 yes Beginner Novice 1 4 3.50 4 no Novice Novice 0 5 3.44 5 yes Beginner Advanced 1 6 3.50 6 yes Novice Novice 1 7 2.33 7 no Beginner Advanced 1 8 3.60 8 no Advanced Advanced 1 9 3.82 9 no Advanced Advanced 1 10 3.71 >>> # # Executing Aggregate Functions with assign() and DataFrame.groupby(). # # Here, we will be using 'func' from sqlalchemy to run some aggregate functions. >>> from sqlalchemy import func >>> # Example 13: Calculate average "gpa" for values in the "stats" column. >>> df.groupby("stats").assign(res=func.ave(df.gpa.expression)) stats res 0 Beginner 3.662000 1 Advanced 3.508750 2 Novice 3.559091 >>> # Example 14: Calculate standard deviation, kurtosis value and sum of values in # the "gpa" column with values grouped by values in the "stats" column. # Alternate approach for DataFrame.agg(). This allows user to name the # result columns. >>> df.groupby("stats").assign(gpa_std_=func.ave(df.gpa.expression), ... gpa_kurtosis_=func.kurtosis(df.gpa.expression), ... gpa_sum_=func.sum(df.gpa.expression)) stats gpa_kurtosis_ gpa_std_ gpa_sum_ 0 Beginner -0.452859 3.662000 18.31 1 Advanced 2.886226 3.508750 84.21 2 Novice 6.377775 3.559091 39.15 >>>