Use the case() function to add SQL CASE logic based on DataFrame column expressions. It can be used with DataFrame methods assign and filter.
- whens: Specifies the criteria to be compared against. It accepts two different forms, based on whether or not the value argument is used.
- In the first form, it accepts a list of 2-tuples; each 2-tuple consists of (<sql expression>, <value>), where the <sql expression> is a boolean expression and 'value' is a resulting value.For example:
case([ (df.first_name == 'wendy', 'W'), (df.first_name == 'jack', 'J') ])
- In the second form, it accepts a Python dictionary of comparison values mapped to a resulting value. This form requires 'value' argument to be present, and values are compared using the '==' operator.For example:
case( {"wendy": "W", "jack": "J"}, value=df.first_name )
- In the first form, it accepts a list of 2-tuples; each 2-tuple consists of (<sql expression>, <value>), where the <sql expression> is a boolean expression and 'value' is a resulting value.
- value: Specifies a SQL expression (ColumnExpression or literal) which is used as a fixed comparison point for candidate values within a dictionary passed to the whens argument.
This argument is required when whens argument is of dictionary type.
- else_: Specifies a SQL expression (ColumnExpression or literal) which is the evaluated result of the CASE construct if all expressions within whens evaluate to False.
When this argument is omitted, function produces a result of NULL if none of the when expressions evaluate to True.
Example Setup
>>> from teradataml.dataframe.sql_functions import case
>>> load_example_data("GLM", ["admissions_train"])
>>> df = DataFrame("admissions_train")
>>> print(df)
masters gpa stats programming admitted id 5 no 3.44 Novice Novice 0 3 no 3.70 Novice Beginner 1 1 yes 3.95 Beginner Beginner 0 20 yes 3.90 Advanced Advanced 1 8 no 3.60 Beginner Advanced 1 25 no 3.96 Advanced Advanced 1 18 yes 3.81 Advanced Advanced 1 24 no 1.87 Advanced Novice 1 26 yes 3.57 Advanced Advanced 1 38 yes 2.65 Advanced Beginner 1
>>> print(df.shape)
(40, 6)
Example: Run case() with 'whens' Passing 2-tuple
- - gpa > 3.0 : 'good'
- - 2.0 < gpa <=3.0 : 'average'
- - gpa <= 2.0 : 'bad'
>>> good_df = df[case([(df.gpa > 3.0, 'good'), (df.gpa > 2.0, 'average')], else_='bad') == 'good']
>>> print(good_df)
masters gpa stats programming admitted id 13 no 4.00 Advanced Novice 1 11 no 3.13 Advanced Advanced 1 9 no 3.82 Advanced Advanced 1 26 yes 3.57 Advanced Advanced 1 3 no 3.70 Novice Beginner 1 1 yes 3.95 Beginner Beginner 0 20 yes 3.90 Advanced Advanced 1 18 yes 3.81 Advanced Advanced 1 5 no 3.44 Novice Novice 0 32 yes 3.46 Advanced Beginner 0
>>> print(good_df.shape)
(35, 6)
Example: With Same Case Construct, Create New DataFrame with Additional 'rating' Column Using 'assign' Operation
>>> whens_df = df.assign(rating = case([(df.gpa > 3.0, 'good'), (df.gpa > 2.0, 'average')], else_='bad'))
>>> print(whens_df)
masters gpa stats programming admitted rating id 5 no 3.44 Novice Novice 0 good 3 no 3.70 Novice Beginner 1 good 1 yes 3.95 Beginner Beginner 0 good 20 yes 3.90 Advanced Advanced 1 good 8 no 3.60 Beginner Advanced 1 good 25 no 3.96 Advanced Advanced 1 good 18 yes 3.81 Advanced Advanced 1 good 24 no 1.87 Advanced Novice 1 bad 26 yes 3.57 Advanced Advanced 1 good 38 yes 2.65 Advanced Beginner 1 average
>>> print(whens_df.shape)
(40, 7)
Example: Run case() without else_ argument
This example runs the case() function without specifying the else_ argument, resulting NULLs when no condition in the whens argument is met.
>>> no_else = df.assign(rating = case([(df.gpa > 3.0, 'good')]))
>>> print(no_else)
masters gpa stats programming admitted rating id 5 no 3.44 Novice Novice 0 good 3 no 3.70 Novice Beginner 1 good 1 yes 3.95 Beginner Beginner 0 good 20 yes 3.90 Advanced Advanced 1 good 8 no 3.60 Beginner Advanced 1 good 25 no 3.96 Advanced Advanced 1 good 18 yes 3.81 Advanced Advanced 1 good 24 no 1.87 Advanced Novice 1 None 26 yes 3.57 Advanced Advanced 1 good 38 yes 2.65 Advanced Beginner 1 None
>>> print(no_else.shape)
(40, 7)
Example: Run case() with whens Passing Dictionary with Argument value
This example shows the whens argument passing dictionary along with the argument value. The result is 'admitted' when the df.admitted == 1, 'not admitted' when df.admitted == 0, and 'dont't know' otherwise.
>>> whens_value_df = df.assign(admitted_text = case({ 1 : "admitted", 0 : "not admitted"}, value=df.admitted, else_="don't know"))
>>> print(whens_value_df)
masters gpa stats programming admitted admitted_text id 13 no 4.00 Advanced Novice 1 admitted 11 no 3.13 Advanced Advanced 1 admitted 9 no 3.82 Advanced Advanced 1 admitted 28 no 3.93 Advanced Advanced 1 admitted 33 no 3.55 Novice Novice 1 admitted 10 no 3.71 Advanced Advanced 1 admitted 16 no 3.70 Advanced Advanced 1 admitted 32 yes 3.46 Advanced Beginner 0 not admitted 34 yes 3.85 Advanced Beginner 0 not admitted 17 no 3.83 Advanced Advanced 1 admitted
>>> print(whens_value_df.shape)
(40, 7)
Example: Run case() Using Literal Column Name as SQL Expressions
This example uses literal column name as SQL expressions and chooses to emit values from two different columns as a result of the same case expression. It shows how you can decide on projecting a column based on the value of expression.
In the example, you project values from column 'average_rating' if 2.0 < gpa <= 3.0 and values from column 'good_rating' when gpa > 3.0, and name the column 'ga_rating'.
- Load required library.
>>> from sqlalchemy.sql import literal_column
- Create a new DataFrame with a new column 'good_rating' which is set to 'good' only when gpa > 3.0, else NULL.
>>> whens_new_df = df.assign(good_rating = case([(df.gpa > 3.0, 'good')]))
- Add another column named 'avg_rating' to the DataFrame which is set to 'average' only when 2.0 < gpa <= 3.0, else NULL.
>>> whens_new_df = whens_new_df.assign(avg_rating = case([((whens_new_df.gpa > 2.0) & (whens_new_df.gpa <= 3.0), 'average')]))
- Create a DataFrame with another column named 'ga_rating', which is set to the value from column 'good_rating' when gpa > 3.0, and value from column 'avg_rating' when gpa > 2.0, else NULL.
>>> literal_df = whens_new_df.assign(ga_rating = case([(whens_new_df.gpa > 3.0, literal_column('good_rating')), (whens_new_df.gpa > 2.0, literal_column('avg_rating'))]))
>>> print(literal_df)
masters gpa stats programming admitted good_rating avg_rating ga_rating id 5 no 3.44 Novice Novice 0 good None good 3 no 3.70 Novice Beginner 1 good None good 1 yes 3.95 Beginner Beginner 0 good None good 20 yes 3.90 Advanced Advanced 1 good None good 8 no 3.60 Beginner Advanced 1 good None good 25 no 3.96 Advanced Advanced 1 good None good 18 yes 3.81 Advanced Advanced 1 good None good 24 no 1.87 Advanced Novice 1 None None None 26 yes 3.57 Advanced Advanced 1 good None good 38 yes 2.65 Advanced Beginner 1 None average average