Access teradataml DataFrame Column - Teradata Python Package

Teradata® Python Package User Guide

Product: Teradata Python Package
Release Number: 16.20
Published: February 2020
Language: English (United States)
Last Update: 2020-02-29
Product Category: Teradata Vantage

To use a teradataml DataFrame column, also known as a ColumnExpression, in operations such as filter, assign, or join, you must first access the column.

There are two ways to access a column:
  • Access the column as a DataFrame attribute:

    <dataframe_object>.column_name

  • Access the column like a dictionary key:

    <dataframe_object>["column_name"]

If the column name contains whitespace or special characters, Teradata recommends accessing the ColumnExpression like a dictionary key, because such a name cannot be written as a Python attribute.
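The difference between the two access styles can be illustrated without a database. The following is a minimal sketch in plain Python; ToyFrame and is_valid_expression are hypothetical names invented for this illustration and are not part of teradataml, and the sketch makes no claim about how teradataml implements column access. It only shows why a name such as "first name" is reachable with dictionary-style access but cannot be written as an attribute.

```python
class ToyFrame:
    """Hypothetical stand-in for a DataFrame; NOT the teradataml implementation."""

    def __init__(self, column_names):
        # Map each column name to a placeholder "column expression" string.
        self._columns = {name: f"Column({name!r})" for name in column_names}

    def __getitem__(self, name):
        # Dictionary-style access: any string can be used as the key.
        return self._columns[name]

    def __getattr__(self, name):
        # Attribute-style access: called only when normal lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self._columns[name]
        except KeyError:
            raise AttributeError(name) from None


def is_valid_expression(source):
    """Return True if the text parses as a Python expression."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False


toy = ToyFrame(["gpa", "first name"])

by_attribute = toy.gpa          # attribute style
by_key = toy["gpa"]             # dictionary style, same column
spaced = toy["first name"]      # only dictionary style works here

attribute_form_parses = is_valid_expression("toy.first name")
dictionary_form_parses = is_valid_expression('toy["first name"]')
```

In this sketch both styles return the same object for gpa, while the attribute form of "first name" is rejected by the Python parser before any lookup takes place.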

Example Prerequisites

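>>> # Assumes a connection to Vantage already exists (for example, one created with create_context).
>>> from teradataml import DataFrame, load_example_data
>>> from teradataml.dataframe.sql_functions import case
>>> load_example_data("GLM", ["admissions_train"])
>>> df = DataFrame("admissions_train")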
>>> print(df)
   masters   gpa     stats programming  admitted
id
5       no  3.44    Novice      Novice         0
3       no  3.70    Novice    Beginner         1
1      yes  3.95  Beginner    Beginner         0
20     yes  3.90  Advanced    Advanced         1
8       no  3.60  Beginner    Advanced         1
25      no  3.96  Advanced    Advanced         1
18     yes  3.81  Advanced    Advanced         1
24      no  1.87  Advanced      Novice         1
26     yes  3.57  Advanced    Advanced         1
38     yes  2.65  Advanced    Beginner         1

Example: Access ColumnExpression as an attribute and use it as a filter predicate

>>> gpa = df.gpa
>>> good_df = df[case([(gpa > 3.0, 'good'),
                       (gpa > 2.0, 'average')],
                       else_='bad') == 'good']
>>> print(good_df)
   masters   gpa     stats programming  admitted
id
13      no  4.00  Advanced      Novice         1
11      no  3.13  Advanced    Advanced         1
9       no  3.82  Advanced    Advanced         1
26     yes  3.57  Advanced    Advanced         1
3       no  3.70    Novice    Beginner         1
1      yes  3.95  Beginner    Beginner         0
20     yes  3.90  Advanced    Advanced         1
18     yes  3.81  Advanced    Advanced         1
5       no  3.44    Novice      Novice         0
32     yes  3.46  Advanced    Beginner         0
>>> print(good_df.shape)
(35, 6)

Example: Access ColumnExpression like a dictionary key and use it to create a new DataFrame

This example accesses the ColumnExpression like a dictionary key and uses it in an assign operation to create a new DataFrame with an additional 'rating' column, based on the same case construct as the previous example.

>>> gpa = df['gpa']
>>> whens_df = df.assign(rating = case([(gpa > 3.0, 'good'),
                                        (gpa > 2.0, 'average')],
                                        else_='bad'))
>>> print(whens_df)
   masters   gpa     stats programming  admitted   rating
id
5       no  3.44    Novice      Novice         0     good
3       no  3.70    Novice    Beginner         1     good
1      yes  3.95  Beginner    Beginner         0     good
20     yes  3.90  Advanced    Advanced         1     good
8       no  3.60  Beginner    Advanced         1     good
25      no  3.96  Advanced    Advanced         1     good
18     yes  3.81  Advanced    Advanced         1     good
24      no  1.87  Advanced      Novice         1      bad
26     yes  3.57  Advanced    Advanced         1     good
38     yes  2.65  Advanced    Beginner         1  average
>>> print(whens_df.shape)
(40, 7)
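The rating values above follow from the order of the conditions in the case construct. The following is a minimal plain-Python sketch of that evaluation order, assuming standard SQL CASE semantics in which the first matching condition wins. The function rate is a hypothetical helper written for this illustration; it is not part of teradataml, and the actual evaluation takes place in the database rather than in Python.

```python
def rate(gpa):
    """Plain-Python analogue of the case construct: the first match wins."""
    if gpa > 3.0:
        return "good"
    if gpa > 2.0:
        return "average"
    return "bad"  # corresponds to else_='bad'


# gpa values of the ten rows printed in the example output above
gpas = [3.44, 3.70, 3.95, 3.90, 3.60, 3.96, 3.81, 1.87, 3.57, 2.65]
ratings = [rate(g) for g in gpas]

for gpa, rating in zip(gpas, ratings):
    print(f"{gpa:.2f} -> {rating}")
```

Because the conditions are checked in order, a gpa of 3.44 is rated 'good' even though it also satisfies gpa > 2.0. Listing the gpa > 2.0 condition first would rate it 'average' instead.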