case | Teradata Package for Python - case - Teradata Package for Python

Teradata® Package for Python User Guide

Deployment
VantageCloud
VantageCore
Edition
Enterprise
IntelliFlex
VMware
Product
Teradata Package for Python
Release Number
20.00
Published
March 2024
Language
English (United States)
Last Update
2024-04-09
dita:mapPath
nvi1706202040305.ditamap
dita:ditavalPath
plt1683835213376.ditaval
dita:id
rkb1531260709148
Product Category
Teradata Vantage

Use the case() function to add SQL CASE logic based on DataFrame column expressions. It can be used with DataFrame methods assign and filter.

Required Argument:
  • whens: Specifies the criteria to be compared against. It accepts two different forms, based on whether or not the value argument is used.
    • In the first form, it accepts a list of 2-tuples; each 2-tuple consists of (<sql expression>, <value>), where the <sql expression> is a boolean expression and 'value' is a resulting value.
      For example:
      case([
       (df.first_name == 'wendy', 'W'),
       (df.first_name == 'jack', 'J')
      ])
    • In the second form, it accepts a Python dictionary of comparison values mapped to a resulting value. This form requires 'value' argument to be present, and values are compared using the '==' operator.
      For example:
      case(
       {"wendy": "W", "jack": "J"},
       value=df.first_name
      )
Optional Arguments:
  • value: Specifies a SQL expression (ColumnExpression or literal) which is used as a fixed comparison point for candidate values within a dictionary passed to the whens argument.

    This argument is required when whens argument is of dictionary type.

  • else_: Specifies a SQL expression (ColumnExpression or literal) which is the evaluated result of the CASE construct if all expressions within whens evaluate to False.

    When this argument is omitted, function produces a result of NULL if none of the when expressions evaluate to True.

Example Setup

>>> from teradataml.dataframe.sql_functions import case
>>> load_example_data("GLM", ["admissions_train"])
>>> df = DataFrame("admissions_train")
>>> print(df)
   masters   gpa     stats programming  admitted
id
5       no  3.44    Novice      Novice         0
3       no  3.70    Novice    Beginner         1
1      yes  3.95  Beginner    Beginner         0
20     yes  3.90  Advanced    Advanced         1
8       no  3.60  Beginner    Advanced         1
25      no  3.96  Advanced    Advanced         1
18     yes  3.81  Advanced    Advanced         1
24      no  1.87  Advanced      Novice         1
26     yes  3.57  Advanced    Advanced         1
38     yes  2.65  Advanced    Beginner         1
>>> print(df.shape)
(40, 6)

Example: Run case() with 'whens' Passing 2-tuple

This example shows the whens argument passes a 2-tuple and filters only the 'good' records based on the rating:
  • - gpa > 3.0 : 'good'
  • - 2.0 < gpa <=3.0 : 'average'
  • - gpa <= 2.0 : 'bad'
>>> good_df = df[case([(df.gpa > 3.0, 'good'),
                       (df.gpa > 2.0, 'average')],
                       else_='bad') == 'good']
>>> print(good_df)
   masters   gpa     stats programming  admitted
id
13      no  4.00  Advanced      Novice         1
11      no  3.13  Advanced    Advanced         1
9       no  3.82  Advanced    Advanced         1
26     yes  3.57  Advanced    Advanced         1
3       no  3.70    Novice    Beginner         1
1      yes  3.95  Beginner    Beginner         0
20     yes  3.90  Advanced    Advanced         1
18     yes  3.81  Advanced    Advanced         1
5       no  3.44    Novice      Novice         0
32     yes  3.46  Advanced    Beginner         0
>>> print(good_df.shape)
(35, 6)

Example: With Same Case Construct, Create New DataFrame with Additional 'rating' Column Using 'assign' Operation

>>> whens_df = df.assign(rating = case([(df.gpa > 3.0, 'good'),
                                        (df.gpa > 2.0, 'average')],
                                        else_='bad'))
>>> print(whens_df)
   masters   gpa     stats programming  admitted   rating
id
5       no  3.44    Novice      Novice         0     good
3       no  3.70    Novice    Beginner         1     good
1      yes  3.95  Beginner    Beginner         0     good
20     yes  3.90  Advanced    Advanced         1     good
8       no  3.60  Beginner    Advanced         1     good
25      no  3.96  Advanced    Advanced         1     good
18     yes  3.81  Advanced    Advanced         1     good
24      no  1.87  Advanced      Novice         1      bad
26     yes  3.57  Advanced    Advanced         1     good
38     yes  2.65  Advanced    Beginner         1  average
>>> print(whens_df.shape)
(40, 7)

Example: Run case() without else_ argument

This example runs the case() function without specifying the else_ argument, resulting NULLs when no condition in the whens argument is met.

>>> no_else =  df.assign(rating = case([(df.gpa > 3.0, 'good')]))
>>> print(no_else)
   masters   gpa     stats programming  admitted  rating
id
5       no  3.44    Novice      Novice         0    good
3       no  3.70    Novice    Beginner         1    good
1      yes  3.95  Beginner    Beginner         0    good
20     yes  3.90  Advanced    Advanced         1    good
8       no  3.60  Beginner    Advanced         1    good
25      no  3.96  Advanced    Advanced         1    good
18     yes  3.81  Advanced    Advanced         1    good
24      no  1.87  Advanced      Novice         1    None
26     yes  3.57  Advanced    Advanced         1    good
38     yes  2.65  Advanced    Beginner         1    None
>>> print(no_else.shape)
(40, 7)

Example: Run case() with whens Passing Dictionary with Argument value

This example shows the whens argument passing dictionary along with the argument value. The result is 'admitted' when the df.admitted == 1, 'not admitted' when df.admitted == 0, and 'dont't know' otherwise.

>>> whens_value_df = df.assign(admitted_text = case({ 1 : "admitted", 0 : "not admitted"},
                                                    value=df.admitted,
                                                    else_="don't know"))
>>> print(whens_value_df)
   masters   gpa     stats programming  admitted  admitted_text
id
13      no  4.00  Advanced      Novice         1       admitted
11      no  3.13  Advanced    Advanced         1       admitted
9       no  3.82  Advanced    Advanced         1       admitted
28      no  3.93  Advanced    Advanced         1       admitted
33      no  3.55    Novice      Novice         1       admitted
10      no  3.71  Advanced    Advanced         1       admitted
16      no  3.70  Advanced    Advanced         1       admitted
32     yes  3.46  Advanced    Beginner         0   not admitted
34     yes  3.85  Advanced    Beginner         0   not admitted
17      no  3.83  Advanced    Advanced         1       admitted
>>> print(whens_value_df.shape)
(40, 7)

Example: Run case() Using Literal Column Name as SQL Expressions

This example uses literal column name as SQL expressions and chooses to emit values from two different columns as a result of the same case expression. It shows how you can decide on projecting a column based on the value of expression.

In the example, you project values from column 'average_rating' if 2.0 < gpa <= 3.0 and values from column 'good_rating' when gpa > 3.0, and name the column 'ga_rating'.

  • Load required library.
    >>> from sqlalchemy.sql import literal_column
  • Create a new DataFrame with a new column 'good_rating' which is set to 'good' only when gpa > 3.0, else NULL.
    >>> whens_new_df = df.assign(good_rating = case([(df.gpa > 3.0, 'good')]))
  • Add another column named 'avg_rating' to the DataFrame which is set to 'average' only when 2.0 < gpa <= 3.0, else NULL.
    >>> whens_new_df = whens_new_df.assign(avg_rating = case([((whens_new_df.gpa > 2.0) & (whens_new_df.gpa <= 3.0), 'average')]))
  • Create a DataFrame with another column named 'ga_rating', which is set to the value from column 'good_rating' when gpa > 3.0, and value from column 'avg_rating' when gpa > 2.0, else NULL.
    >>> literal_df = whens_new_df.assign(ga_rating = case([(whens_new_df.gpa > 3.0, literal_column('good_rating')), (whens_new_df.gpa > 2.0, literal_column('avg_rating'))]))
    >>> print(literal_df)
       masters   gpa     stats programming  admitted good_rating   avg_rating ga_rating
    id
    5       no  3.44    Novice      Novice         0        good         None      good
    3       no  3.70    Novice    Beginner         1        good         None      good
    1      yes  3.95  Beginner    Beginner         0        good         None      good
    20     yes  3.90  Advanced    Advanced         1        good         None      good
    8       no  3.60  Beginner    Advanced         1        good         None      good
    25      no  3.96  Advanced    Advanced         1        good         None      good
    18     yes  3.81  Advanced    Advanced         1        good         None      good
    24      no  1.87  Advanced      Novice         1        None         None      None
    26     yes  3.57  Advanced    Advanced         1        good         None      good
    38     yes  2.65  Advanced    Beginner         1        None      average   average