case | Teradata Package for Python - case - Teradata Vantage

Teradata® VantageCloud Lake

Deployment
VantageCloud
Edition
Lake
Product
Teradata Vantage
Published
January 2023
Language
English (United States)
Last Update
2024-04-03
dita:mapPath
phg1621910019905.ditamap
dita:ditavalPath
pny1626732985837.ditaval
dita:id
phg1621910019905

Use the case() function to add SQL CASE logic based on DataFrame column expressions. It can be used with DataFrame methods assign and filter.

Required Argument:
  • whens: Specifies the criteria to be compared against. It accepts two different forms, based on whether or not the value argument is used.
    • In the first form, it accepts a list of 2-tuples; each 2-tuple consists of (<sql expression>, <value>), where the <sql expression> is a boolean expression and 'value' is a resulting value.
      For example:
      case([
       (df.first_name == 'wendy', 'W'),
       (df.first_name == 'jack', 'J')
      ])
    • In the second form, it accepts a Python dictionary of comparison values mapped to a resulting value. This form requires 'value' argument to be present, and values are compared using the '==' operator.
      For example:
      case(
       {"wendy": "W", "jack": "J"},
       value=df.first_name
      )
Optional Arguments:
  • value: Specifies a SQL expression (ColumnExpression or literal) which is used as a fixed comparison point for candidate values within a dictionary passed to the whens argument.

    This argument is required when whens argument is of dictionary type.

  • else_: Specifies a SQL expression (ColumnExpression or literal) which is the evaluated result of the CASE construct if all expressions within whens evaluate to False.

    When this argument is omitted, function produces a result of NULL if none of the when expressions evaluate to True.

Example Setup

>>> from teradataml.dataframe.sql_functions import case
>>> load_example_data("GLM", ["admissions_train"])
>>> df = DataFrame("admissions_train")
>>> print(df)
   masters   gpa     stats programming  admitted
id
5       no  3.44    Novice      Novice         0
3       no  3.70    Novice    Beginner         1
1      yes  3.95  Beginner    Beginner         0
20     yes  3.90  Advanced    Advanced         1
8       no  3.60  Beginner    Advanced         1
25      no  3.96  Advanced    Advanced         1
18     yes  3.81  Advanced    Advanced         1
24      no  1.87  Advanced      Novice         1
26     yes  3.57  Advanced    Advanced         1
38     yes  2.65  Advanced    Beginner         1
>>> print(df.shape)
(40, 6)

Example: Run case() with 'whens' Passing 2-tuple

This example shows the whens argument passes a 2-tuple and filters only the 'good' records based on the rating:
  • - gpa > 3.0 : 'good'
  • - 2.0 < gpa <=3.0 : 'average'
  • - gpa <= 2.0 : 'bad'
>>> good_df = df[case([(df.gpa > 3.0, 'good'),
                       (df.gpa > 2.0, 'average')],
                       else_='bad') == 'good']
>>> print(good_df)
   masters   gpa     stats programming  admitted
id
13      no  4.00  Advanced      Novice         1
11      no  3.13  Advanced    Advanced         1
9       no  3.82  Advanced    Advanced         1
26     yes  3.57  Advanced    Advanced         1
3       no  3.70    Novice    Beginner         1
1      yes  3.95  Beginner    Beginner         0
20     yes  3.90  Advanced    Advanced         1
18     yes  3.81  Advanced    Advanced         1
5       no  3.44    Novice      Novice         0
32     yes  3.46  Advanced    Beginner         0
>>> print(good_df.shape)
(35, 6)

Example: With Same Case Construct, Create New DataFrame with Additional 'rating' Column Using 'assign' Operation

>>> whens_df = df.assign(rating = case([(df.gpa > 3.0, 'good'),
                                        (df.gpa > 2.0, 'average')],
                                        else_='bad'))
>>> print(whens_df)
   masters   gpa     stats programming  admitted   rating
id
5       no  3.44    Novice      Novice         0     good
3       no  3.70    Novice    Beginner         1     good
1      yes  3.95  Beginner    Beginner         0     good
20     yes  3.90  Advanced    Advanced         1     good
8       no  3.60  Beginner    Advanced         1     good
25      no  3.96  Advanced    Advanced         1     good
18     yes  3.81  Advanced    Advanced         1     good
24      no  1.87  Advanced      Novice         1      bad
26     yes  3.57  Advanced    Advanced         1     good
38     yes  2.65  Advanced    Beginner         1  average
>>> print(whens_df.shape)
(40, 7)

Example: Run case() without else_ argument

This example runs the case() function without specifying the else_ argument, resulting NULLs when no condition in the whens argument is met.

>>> no_else =  df.assign(rating = case([(df.gpa > 3.0, 'good')]))
>>> print(no_else)
   masters   gpa     stats programming  admitted  rating
id
5       no  3.44    Novice      Novice         0    good
3       no  3.70    Novice    Beginner         1    good
1      yes  3.95  Beginner    Beginner         0    good
20     yes  3.90  Advanced    Advanced         1    good
8       no  3.60  Beginner    Advanced         1    good
25      no  3.96  Advanced    Advanced         1    good
18     yes  3.81  Advanced    Advanced         1    good
24      no  1.87  Advanced      Novice         1    None
26     yes  3.57  Advanced    Advanced         1    good
38     yes  2.65  Advanced    Beginner         1    None
>>> print(no_else.shape)
(40, 7)

Example: Run case() with whens Passing Dictionary with Argument value

This example shows the whens argument passing dictionary along with the argument value. The result is 'admitted' when the df.admitted == 1, 'not admitted' when df.admitted == 0, and 'dont't know' otherwise.

>>> whens_value_df = df.assign(admitted_text = case({ 1 : "admitted", 0 : "not admitted"},
                                                    value=df.admitted,
                                                    else_="don't know"))
>>> print(whens_value_df)
   masters   gpa     stats programming  admitted  admitted_text
id
13      no  4.00  Advanced      Novice         1       admitted
11      no  3.13  Advanced    Advanced         1       admitted
9       no  3.82  Advanced    Advanced         1       admitted
28      no  3.93  Advanced    Advanced         1       admitted
33      no  3.55    Novice      Novice         1       admitted
10      no  3.71  Advanced    Advanced         1       admitted
16      no  3.70  Advanced    Advanced         1       admitted
32     yes  3.46  Advanced    Beginner         0   not admitted
34     yes  3.85  Advanced    Beginner         0   not admitted
17      no  3.83  Advanced    Advanced         1       admitted
>>> print(whens_value_df.shape)
(40, 7)

Example: Run case() Using Literal Column Name as SQL Expressions

This example uses literal column name as SQL expressions and chooses to emit values from two different columns as a result of the same case expression. It shows how you can decide on projecting a column based on the value of expression.

In the example, you project values from column 'average_rating' if 2.0 < gpa <= 3.0 and values from column 'good_rating' when gpa > 3.0, and name the column 'ga_rating'.

  • Load required library.
    >>> from sqlalchemy.sql import literal_column
  • Create a new DataFrame with a new column 'good_rating' which is set to 'good' only when gpa > 3.0, else NULL.
    >>> whens_new_df = df.assign(good_rating = case([(df.gpa > 3.0, 'good')]))
  • Add another column named 'avg_rating' to the DataFrame which is set to 'average' only when 2.0 < gpa <= 3.0, else NULL.
    >>> whens_new_df = whens_new_df.assign(avg_rating = case([((whens_new_df.gpa > 2.0) & (whens_new_df.gpa <= 3.0), 'average')]))
  • Create a DataFrame with another column named 'ga_rating', which is set to the value from column 'good_rating' when gpa > 3.0, and value from column 'avg_rating' when gpa > 2.0, else NULL.
    >>> literal_df = whens_new_df.assign(ga_rating = case([(whens_new_df.gpa > 3.0, literal_column('good_rating')), (whens_new_df.gpa > 2.0, literal_column('avg_rating'))]))
    >>> print(literal_df)
       masters   gpa     stats programming  admitted good_rating   avg_rating ga_rating
    id
    5       no  3.44    Novice      Novice         0        good         None      good
    3       no  3.70    Novice    Beginner         1        good         None      good
    1      yes  3.95  Beginner    Beginner         0        good         None      good
    20     yes  3.90  Advanced    Advanced         1        good         None      good
    8       no  3.60  Beginner    Advanced         1        good         None      good
    25      no  3.96  Advanced    Advanced         1        good         None      good
    18     yes  3.81  Advanced    Advanced         1        good         None      good
    24      no  1.87  Advanced      Novice         1        None         None      None
    26     yes  3.57  Advanced    Advanced         1        good         None      good
    38     yes  2.65  Advanced    Beginner         1        None      average   average