To use a teradataml DataFrame Column (also called a ColumnExpression) in a filter, assign, or join operation, you must first access the Column.
There are two ways to access a column:
- Access column as DataFrame attribute:
dataframe_object.column_name
- Access column like dictionary:
dataframe_object["column_name"]
If a column name contains whitespace or special characters, Teradata recommends accessing the ColumnExpression like a dictionary.
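The recommendation above follows from ordinary Python syntax rather than anything specific to teradataml: attribute access only works for names that are valid identifiers. The following is a minimal sketch using a hypothetical stand-in class (ToyFrame and the "test score" column are assumptions made for illustration and are not part of teradataml or the example table):

```python
class ToyFrame:
    """Hypothetical stand-in for a DataFrame; NOT the teradataml implementation."""

    def __init__(self, column_names):
        # Each column is represented by a plain string in this sketch.
        self._columns = {name: f"ColumnExpression({name!r})" for name in column_names}

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        try:
            return self.__dict__["_columns"][name]
        except KeyError:
            raise AttributeError(name) from None

    def __getitem__(self, name):
        # Dictionary-style access accepts any string as the column name.
        return self._columns[name]


toy = ToyFrame(["gpa", "test score"])

by_attribute = toy.gpa        # works: "gpa" is a valid identifier
by_key = toy["gpa"]           # same column, dictionary-style
spaced = toy["test score"]    # the only direct way to reach this column

# Writing toy.test score would be a SyntaxError, because the name
# is not a valid Python identifier.
print("test score".isidentifier())
```

Dictionary-style access works for every column name, so it is the safer choice whenever column names come from user input or from a table you do not control.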
Example Setup
>>> from teradataml import DataFrame, load_example_data
>>> from teradataml.dataframe.sql_functions import case
>>> load_example_data("GLM", ["admissions_train"])
>>> df = DataFrame("admissions_train")
>>> print(df)
   masters   gpa     stats programming admitted
id
5       no  3.44    Novice      Novice        0
3       no  3.70    Novice    Beginner        1
1      yes  3.95  Beginner    Beginner        0
20     yes  3.90  Advanced    Advanced        1
8       no  3.60  Beginner    Advanced        1
25      no  3.96  Advanced    Advanced        1
18     yes  3.81  Advanced    Advanced        1
24      no  1.87  Advanced      Novice        1
26     yes  3.57  Advanced    Advanced        1
38     yes  2.65  Advanced    Beginner        1
Example 1: Access ColumnExpression as an attribute and use it as a filter predicate
>>> gpa = df.gpa
>>> good_df = df[case([(gpa > 3.0, 'good'), (gpa > 2.0, 'average')], else_='bad') == 'good']
>>> print(good_df)
   masters   gpa     stats programming admitted
id
13      no  4.00  Advanced      Novice        1
11      no  3.13  Advanced    Advanced        1
9       no  3.82  Advanced    Advanced        1
26     yes  3.57  Advanced    Advanced        1
3       no  3.70    Novice    Beginner        1
1      yes  3.95  Beginner    Beginner        0
20     yes  3.90  Advanced    Advanced        1
18     yes  3.81  Advanced    Advanced        1
5       no  3.44    Novice      Novice        0
32     yes  3.46  Advanced    Beginner        0
>>> print(good_df.shape)
(35, 6)
Example 2: Access ColumnExpression like a dictionary and use it to create a new DataFrame
This example accesses the ColumnExpression like a dictionary and uses the assign operation to create a new DataFrame with an additional 'rating' column, built with the same case construct as in Example 1.
>>> gpa = df['gpa']
>>> whens_df = df.assign(rating = case([(gpa > 3.0, 'good'), (gpa > 2.0, 'average')], else_='bad'))
>>> print(whens_df)
   masters   gpa     stats programming admitted  rating
id
5       no  3.44    Novice      Novice        0    good
3       no  3.70    Novice    Beginner        1    good
1      yes  3.95  Beginner    Beginner        0    good
20     yes  3.90  Advanced    Advanced        1    good
8       no  3.60  Beginner    Advanced        1    good
25      no  3.96  Advanced    Advanced        1    good
18     yes  3.81  Advanced    Advanced        1    good
24      no  1.87  Advanced      Novice        1     bad
26     yes  3.57  Advanced    Advanced        1    good
38     yes  2.65  Advanced    Beginner        1 average
>>> print(whens_df.shape)
(40, 7)
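The case construct used in both examples is assumed here to behave like a SQL CASE expression, where the first matching condition wins and else_ supplies the fallback. The plain-Python sketch below mimics that evaluation order locally, without a database connection, for three gpa values taken from the output above. The helper first_match is hypothetical and is not a teradataml function.

```python
def first_match(value, whens, else_):
    """Return the result of the first condition that holds, else the fallback."""
    for predicate, result in whens:
        if predicate(value):
            return result
    return else_


# Same conditions and order as in the examples: gpa > 3.0, then gpa > 2.0.
whens = [(lambda g: g > 3.0, "good"), (lambda g: g > 2.0, "average")]

# id -> gpa for three rows shown in the output above.
sample_gpa = {5: 3.44, 24: 1.87, 38: 2.65}

# Counterpart of the assign operation: label every row.
ratings = {row_id: first_match(g, whens, "bad") for row_id, g in sample_gpa.items()}

# Counterpart of the filter in Example 1: keep only rows labeled 'good'.
good_ids = [row_id for row_id, label in ratings.items() if label == "good"]

# Order matters: with the conditions reversed, 3.44 matches gpa > 2.0 first.
reversed_label = first_match(3.44, list(reversed(whens)), "bad")

print(ratings)
print(good_ids)
print(reversed_label)
```

Because the first match wins, the broader condition (gpa > 2.0) must come after the narrower one (gpa > 3.0). Otherwise every gpa above 2.0 would be rated 'average'.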