groupby() Method | Teradata Python Package - 17.00 - groupby() Method - Teradata Package for Python

Teradata® Package for Python User Guide

Product
Teradata Package for Python
Release Number
17.00
Release Date
November 2021
Content Type
User Guide
Publication ID
B700-4006-070K
Language
English (United States)

Use the groupby() method to group one or more columns of a DataFrame.

The method takes a column name or a list of column names to group by.

  • You can still apply teradataml DataFrame methods (filters/sort/etc) on top of the result of this one.
  • Consecutive operations of grouping, i.e., groupby_time(), resample() and groupby() are not permitted. An exception will be raised. Following are some cases where exception will be raised as "Invalid operation applied, check documentation for correct usage."
    • df.groupby().groupby()
    • df.groupby().resample()
    • df.groupby().groupby_time()

Examples Prerequisite

Assume a teradata DataFrame "df" is created based on the table "admissions_train", using the command:

>>> df = DataFrame("admissions_train")
>>> df
   masters   gpa     stats programming admitted
id                                            
13      no  4.00  Advanced      Novice        1
26     yes  3.57  Advanced    Advanced        1
5       no  3.44    Novice      Novice        0
19     yes  1.98  Advanced    Advanced        0
15     yes  4.00  Advanced    Advanced        1
40     yes  3.95    Novice    Beginner        0
7      yes  2.33    Novice      Novice        1
22     yes  3.46    Novice    Beginner        0
36      no  3.00  Advanced      Novice        0
38     yes  2.65  Advanced    Beginner        1

Example 1: Groups by one column and finds the min

The following example groups by column "masters" and finds the min for the groups in "masters":

>>> df2 = df.groupby("masters")
>>> df2.min()
  masters min_id  min_gpa min_stats min_programming min_admitted
0      no      3     1.87  advanced        advanced            0
1     yes      1     1.98  advanced        advanced            0

Example 2: Groups by two columns and finds the min and max

This example finds the min and max grouped by "masters" and "programming":

>>> df3 = df.groupby(["masters", "programming"])
>>> df3.min()
  masters programming min_id  min_gpa min_stats min_admitted
0      no    Advanced      8     3.13  Advanced            1
1     yes    Beginner      1     2.65  Advanced            0
2     yes      Novice      4     2.33  Advanced            0
3      no    Beginner      3     3.68    Novice            1
4     yes    Advanced      6     1.98  Advanced            0
5      no      Novice      5     1.87  Advanced            0
>>> df3.max()
  masters programming max_id  max_gpa max_stats max_admitted
0      no      Novice     37     4.00    Novice            1
1      no    Advanced     28     3.96  Beginner            1
2      no    Beginner     35     3.87    Novice            1
3     yes    Advanced     27     4.00  Beginner            1
4     yes      Novice     30     3.79    Novice            1
5     yes    Beginner     40     4.00    Novice            1

Example 3: Select multiple columns, followed by groupby and find the min

This example selects "id", "masters", "gpa", "stats", followed by a groupby on "masters", and finds the min.

>>> df1 = df.select(["id", "masters", "gpa", "stats"])
>>> df2 = df1.groupby("masters")
>>> df2.min()
  masters min_id  min_gpa min_stats
0      no      3     1.87  advanced
1     yes      1     1.98  advanced