case | Teradata Package for Python - case

case | Teradata Package for Python - case - Teradata Vantage

Teradata® VantageCloud Lake

Deployment

VantageCloud

Edition

Lake

Product

Teradata Vantage

Published

January 2023

Language

English (United States)

Last Update

2024-04-03

dita:mapPath

phg1621910019905.ditamap

dita:ditavalPath

pny1626732985837.ditaval

dita:id

phg1621910019905

Use the case() function to add SQL CASE logic based on DataFrame column expressions. It can be used with DataFrame methods assign and filter.

Required Argument:

whens: Specifies the criteria to be compared against. It accepts two different forms, based on whether or not the value argument is used.
- In the first form, it accepts a list of 2-tuples; each 2-tuple consists of (<sql expression>, <value>), where the <sql expression> is a boolean expression and 'value' is a resulting value.
  For example:
```
case([
 (df.first_name == 'wendy', 'W'),
 (df.first_name == 'jack', 'J')
])
```
- In the second form, it accepts a Python dictionary of comparison values mapped to a resulting value. This form requires 'value' argument to be present, and values are compared using the '==' operator.
  For example:
```
case(
 {"wendy": "W", "jack": "J"},
 value=df.first_name
)
```

Optional Arguments:

value: Specifies a SQL expression (ColumnExpression or literal) which is used as a fixed comparison point for candidate values within a dictionary passed to the whens argument.
This argument is required when whens argument is of dictionary type.
else_: Specifies a SQL expression (ColumnExpression or literal) which is the evaluated result of the CASE construct if all expressions within whens evaluate to False.
When this argument is omitted, function produces a result of NULL if none of the when expressions evaluate to True.

Example Setup

>>> from teradataml.dataframe.sql_functions import case

>>> load_example_data("GLM", ["admissions_train"])

>>> df = DataFrame("admissions_train")

>>> print(df)

   masters   gpa     stats programming  admitted
id
5       no  3.44    Novice      Novice         0
3       no  3.70    Novice    Beginner         1
1      yes  3.95  Beginner    Beginner         0
20     yes  3.90  Advanced    Advanced         1
8       no  3.60  Beginner    Advanced         1
25      no  3.96  Advanced    Advanced         1
18     yes  3.81  Advanced    Advanced         1
24      no  1.87  Advanced      Novice         1
26     yes  3.57  Advanced    Advanced         1
38     yes  2.65  Advanced    Beginner         1

>>> print(df.shape)

(40, 6)

Example: Run case() with 'whens' Passing 2-tuple

This example shows the whens argument passes a 2-tuple and filters only the 'good' records based on the rating:

- gpa > 3.0 : 'good'
- 2.0 < gpa <=3.0 : 'average'
- gpa <= 2.0 : 'bad'

>>> good_df = df[case([(df.gpa > 3.0, 'good'),
                       (df.gpa > 2.0, 'average')],
                       else_='bad') == 'good']

>>> print(good_df)

   masters   gpa     stats programming  admitted
id
13      no  4.00  Advanced      Novice         1
11      no  3.13  Advanced    Advanced         1
9       no  3.82  Advanced    Advanced         1
26     yes  3.57  Advanced    Advanced         1
3       no  3.70    Novice    Beginner         1
1      yes  3.95  Beginner    Beginner         0
20     yes  3.90  Advanced    Advanced         1
18     yes  3.81  Advanced    Advanced         1
5       no  3.44    Novice      Novice         0
32     yes  3.46  Advanced    Beginner         0

>>> print(good_df.shape)

(35, 6)

Example: With Same Case Construct, Create New DataFrame with Additional 'rating' Column Using 'assign' Operation

>>> whens_df = df.assign(rating = case([(df.gpa > 3.0, 'good'),
                                        (df.gpa > 2.0, 'average')],
                                        else_='bad'))

>>> print(whens_df)

   masters   gpa     stats programming  admitted   rating
id
5       no  3.44    Novice      Novice         0     good
3       no  3.70    Novice    Beginner         1     good
1      yes  3.95  Beginner    Beginner         0     good
20     yes  3.90  Advanced    Advanced         1     good
8       no  3.60  Beginner    Advanced         1     good
25      no  3.96  Advanced    Advanced         1     good
18     yes  3.81  Advanced    Advanced         1     good
24      no  1.87  Advanced      Novice         1      bad
26     yes  3.57  Advanced    Advanced         1     good
38     yes  2.65  Advanced    Beginner         1  average

>>> print(whens_df.shape)

(40, 7)

Example: Run case() without else_ argument

This example runs the case() function without specifying the else_ argument, resulting NULLs when no condition in the whens argument is met.

>>> no_else =  df.assign(rating = case([(df.gpa > 3.0, 'good')]))

>>> print(no_else)

   masters   gpa     stats programming  admitted  rating
id
5       no  3.44    Novice      Novice         0    good
3       no  3.70    Novice    Beginner         1    good
1      yes  3.95  Beginner    Beginner         0    good
20     yes  3.90  Advanced    Advanced         1    good
8       no  3.60  Beginner    Advanced         1    good
25      no  3.96  Advanced    Advanced         1    good
18     yes  3.81  Advanced    Advanced         1    good
24      no  1.87  Advanced      Novice         1    None
26     yes  3.57  Advanced    Advanced         1    good
38     yes  2.65  Advanced    Beginner         1    None

>>> print(no_else.shape)

(40, 7)

Example: Run case() with whens Passing Dictionary with Argument value

This example shows the whens argument passing dictionary along with the argument value. The result is 'admitted' when the df.admitted == 1, 'not admitted' when df.admitted == 0, and 'dont't know' otherwise.

>>> whens_value_df = df.assign(admitted_text = case({ 1 : "admitted", 0 : "not admitted"},
                                                    value=df.admitted,
                                                    else_="don't know"))

>>> print(whens_value_df)

   masters   gpa     stats programming  admitted  admitted_text
id
13      no  4.00  Advanced      Novice         1       admitted
11      no  3.13  Advanced    Advanced         1       admitted
9       no  3.82  Advanced    Advanced         1       admitted
28      no  3.93  Advanced    Advanced         1       admitted
33      no  3.55    Novice      Novice         1       admitted
10      no  3.71  Advanced    Advanced         1       admitted
16      no  3.70  Advanced    Advanced         1       admitted
32     yes  3.46  Advanced    Beginner         0   not admitted
34     yes  3.85  Advanced    Beginner         0   not admitted
17      no  3.83  Advanced    Advanced         1       admitted

>>> print(whens_value_df.shape)

(40, 7)

Example: Run case() Using Literal Column Name as SQL Expressions

This example uses literal column name as SQL expressions and chooses to emit values from two different columns as a result of the same case expression. It shows how you can decide on projecting a column based on the value of expression.

In the example, you project values from column 'average_rating' if 2.0 < gpa <= 3.0 and values from column 'good_rating' when gpa > 3.0, and name the column 'ga_rating'.

Load required library.

>>> from sqlalchemy.sql import literal_column

Create a new DataFrame with a new column 'good_rating' which is set to 'good' only when gpa > 3.0, else NULL.
```
>>> whens_new_df = df.assign(good_rating = case([(df.gpa > 3.0, 'good')]))
```

Add another column named 'avg_rating' to the DataFrame which is set to 'average' only when 2.0 < gpa <= 3.0, else NULL.

>>> whens_new_df = whens_new_df.assign(avg_rating = case([((whens_new_df.gpa > 2.0) & (whens_new_df.gpa <= 3.0), 'average')]))

Create a DataFrame with another column named 'ga_rating', which is set to the value from column 'good_rating' when gpa > 3.0, and value from column 'avg_rating' when gpa > 2.0, else NULL.

>>> literal_df = whens_new_df.assign(ga_rating = case([(whens_new_df.gpa > 3.0, literal_column('good_rating')), (whens_new_df.gpa > 2.0, literal_column('avg_rating'))]))

>>> print(literal_df)

   masters   gpa     stats programming  admitted good_rating   avg_rating ga_rating
id
5       no  3.44    Novice      Novice         0        good         None      good
3       no  3.70    Novice    Beginner         1        good         None      good
1      yes  3.95  Beginner    Beginner         0        good         None      good
20     yes  3.90  Advanced    Advanced         1        good         None      good
8       no  3.60  Beginner    Advanced         1        good         None      good
25      no  3.96  Advanced    Advanced         1        good         None      good
18     yes  3.81  Advanced    Advanced         1        good         None      good
24      no  1.87  Advanced      Novice         1        None         None      None
26     yes  3.57  Advanced    Advanced         1        good         None      good
38     yes  2.65  Advanced    Beginner         1        None      average   average