The case() function can be used to add SQL CASE logic based on DataFrame column expressions. It can be used with DataFrame assign method and filter method.
Example Prerequisites
>>> from teradataml.dataframe.sql_functions import case
>>> load_example_data("GLM", ["admissions_train"])
>>> df = DataFrame("admissions_train") >>> print(df) masters gpa stats programming admitted id 5 no 3.44 Novice Novice 0 3 no 3.70 Novice Beginner 1 1 yes 3.95 Beginner Beginner 0 20 yes 3.90 Advanced Advanced 1 8 no 3.60 Beginner Advanced 1 25 no 3.96 Advanced Advanced 1 18 yes 3.81 Advanced Advanced 1 24 no 1.87 Advanced Novice 1 26 yes 3.57 Advanced Advanced 1 38 yes 2.65 Advanced Beginner 1 >>> print(df.shape) (40, 6)
Example: Run case() with 'whens' passing a 2-tuple
- - gpa > 3.0 : 'good'
- - 2.0 < gpa <=3.0 : 'average'
- - gpa <= 2.0 : 'bad'
>>> good_df = df[case([(df.gpa > 3.0, 'good'), (df.gpa > 2.0, 'average')], else_='bad') == 'good']
>>> print(good_df) masters gpa stats programming admitted id 13 no 4.00 Advanced Novice 1 11 no 3.13 Advanced Advanced 1 9 no 3.82 Advanced Advanced 1 26 yes 3.57 Advanced Advanced 1 3 no 3.70 Novice Beginner 1 1 yes 3.95 Beginner Beginner 0 20 yes 3.90 Advanced Advanced 1 18 yes 3.81 Advanced Advanced 1 5 no 3.44 Novice Novice 0 32 yes 3.46 Advanced Beginner 0 >>> print(good_df.shape) (35, 6)
Example: With same case construct, create a new DataFrame with an additional 'rating' column using assign operation
>>> whens_df = df.assign(rating = case([(df.gpa > 3.0, 'good'), (df.gpa > 2.0, 'average')], else_='bad'))
>>> print(whens_df) masters gpa stats programming admitted rating id 5 no 3.44 Novice Novice 0 good 3 no 3.70 Novice Beginner 1 good 1 yes 3.95 Beginner Beginner 0 good 20 yes 3.90 Advanced Advanced 1 good 8 no 3.60 Beginner Advanced 1 good 25 no 3.96 Advanced Advanced 1 good 18 yes 3.81 Advanced Advanced 1 good 24 no 1.87 Advanced Novice 1 bad 26 yes 3.57 Advanced Advanced 1 good 38 yes 2.65 Advanced Beginner 1 average >>> print(whens_df.shape) (40, 7)
Example: Run case() without 'else_' argument
This example shows running the case() function without specifying the 'else_' argument, this results NULLs when no condition in the 'whens' argument is met.
>>> no_else = df.assign(rating = case([(df.gpa > 3.0, 'good')]))
>>> print(no_else) masters gpa stats programming admitted rating id 5 no 3.44 Novice Novice 0 good 3 no 3.70 Novice Beginner 1 good 1 yes 3.95 Beginner Beginner 0 good 20 yes 3.90 Advanced Advanced 1 good 8 no 3.60 Beginner Advanced 1 good 25 no 3.96 Advanced Advanced 1 good 18 yes 3.81 Advanced Advanced 1 good 24 no 1.87 Advanced Novice 1 None 26 yes 3.57 Advanced Advanced 1 good 38 yes 2.65 Advanced Beginner 1 None >>> print(no_else.shape) (40, 7)
Example: Run case() with 'whens' passing dictionary along with argument 'value'
This example shows the 'whens' argument passing dictionary along with the argument 'value'. The result is 'admitted' when the argument df.admitted == 1, 'not admitted' when df.admitted == 0, and 'dont't know' otherwise.
>>> whens_value_df = df.assign(admitted_text = case({ 1 : "admitted", 0 : "not admitted"}, value=df.admitted, else_="don't know"))
>>> print(whens_value_df) masters gpa stats programming admitted admitted_text id 13 no 4.00 Advanced Novice 1 admitted 11 no 3.13 Advanced Advanced 1 admitted 9 no 3.82 Advanced Advanced 1 admitted 28 no 3.93 Advanced Advanced 1 admitted 33 no 3.55 Novice Novice 1 admitted 10 no 3.71 Advanced Advanced 1 admitted 16 no 3.70 Advanced Advanced 1 admitted 32 yes 3.46 Advanced Beginner 0 not admitted 34 yes 3.85 Advanced Beginner 0 not admitted 17 no 3.83 Advanced Advanced 1 admitted >>> print(whens_value_df.shape) (40, 7)
Example: Run case() using literal column name as SQL expressions
This example shows using literal column name as SQL expressions and choosing to emit values from two different columns as a result of the same case expression.
In the example, you project values from column 'average_rating' if 2.0 < gpa <= 3.0, and values from column 'good_rating' when gpa > 3.0, naming the column 'ga_rating'.
>>> from sqlalchemy.sql import literal_column
Create a new DataFrame with a new column 'good_rating' which is set to 'good' only when gpa > 3.0, else NULL.
>>> whens_new_df = df.assign(good_rating = case([(df.gpa > 3.0, 'good')]))
Add another column named 'avg_rating' to the DataFrame which is set to 'average' only when 2 < gpa <= 3.0, else NULL.
>>> whens_new_df = whens_new_df.assign(avg_rating = case([((whens_new_df.gpa > 2.0) & (whens_new_df.gpa <= 3.0), 'average')]))
Create a DataFrame with another column named 'ga_rating', which is set to the value from column 'good_rating' when gpa > 3, and value from column 'avg_rating' when gpa > 2, else NULL.
>>> literal_df = whens_new_df.assign(ga_rating = case([(whens_new_df.gpa > 3.0, literal_column('good_rating')), (whens_new_df.gpa > 2.0, literal_column('avg_rating'))]))
>>> print(literal_df) masters gpa stats programming admitted good_rating avg_rating ga_rating id 5 no 3.44 Novice Novice 0 good None good 3 no 3.70 Novice Beginner 1 good None good 1 yes 3.95 Beginner Beginner 0 good None good 20 yes 3.90 Advanced Advanced 1 good None good 8 no 3.60 Beginner Advanced 1 good None good 25 no 3.96 Advanced Advanced 1 good None good 18 yes 3.81 Advanced Advanced 1 good None good 24 no 1.87 Advanced Novice 1 None None None 26 yes 3.57 Advanced Advanced 1 good None good 38 yes 2.65 Advanced Beginner 1 None average average