groupby() Method

Teradata® Python Package User Guide

Brand: Teradata Vantage
Product: Teradata Python Package
Release: 16.20
Category: User Guide
Feature number: B700-4006-098K

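Use the groupby() method to group the rows of a DataFrame by one or more columns.

The method takes a single column name, or a list of column names, to group by. Calling an aggregate method such as min() or max() on the grouped result returns one row per group, and each aggregated column is named with the aggregate as a prefix (for example, min_gpa).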
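The grouping argument can be pictured with the following minimal sketch in plain Python. It is not teradataml code: `group_rows` is a hypothetical helper, and rows are assumed to be dictionaries. It treats a single column name like a one-element list and keys each group by the tuple of values in the grouping columns.

```python
# Hypothetical local sketch; not part of teradataml.
from typing import Any, Dict, List, Sequence, Tuple, Union

Row = Dict[str, Any]


def group_rows(rows: Sequence[Row], by: Union[str, List[str]]) -> Dict[Tuple[Any, ...], List[Row]]:
    """Group dictionary rows by one column name or a list of column names."""
    # Treat a single column name like a one-element list.
    columns = [by] if isinstance(by, str) else list(by)
    groups: Dict[Tuple[Any, ...], List[Row]] = {}
    for row in rows:
        # The group key is the tuple of this row's values in the grouping columns.
        key = tuple(row[c] for c in columns)
        groups.setdefault(key, []).append(row)
    return groups


# Three rows from the "admissions_train" table used in the examples (other columns omitted).
rows = [
    {"id": 5, "masters": "no", "programming": "novice"},
    {"id": 1, "masters": "yes", "programming": "beginner"},
    {"id": 3, "masters": "no", "programming": "beginner"},
]

print(sorted(group_rows(rows, "masters")))                   # [('no',), ('yes',)]
print(sorted(group_rows(rows, ["masters", "programming"])))  # [('no', 'beginner'), ('no', 'novice'), ('yes', 'beginner')]
```

A one-element tuple key keeps single-column and multi-column grouping on the same code path.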

Examples Prerequisite

Assume a teradataml DataFrame "df" has been created from the table "admissions_train" using the following command:

>>> df = DataFrame("admissions_train")
>>> df
   masters   gpa     stats programming admitted
id
5       no  3.44    novice      novice        0
3       no  3.70    novice    beginner        1
1      yes  3.95  beginner    beginner        0
17      no  3.83  advanced    advanced        1
13      no  4.00  advanced      novice        1
32     yes  3.46  advanced    beginner        0
11      no  3.13  advanced    advanced        1
9       no  3.82  advanced    advanced        1
19     yes  1.98  advanced    advanced        0
36      no  3.00  advanced      novice        0
15     yes  4.00  advanced    advanced        1
14     yes  3.45  advanced    advanced        0
31     yes  3.50  advanced    beginner        1
40     yes  3.95    novice    beginner        0
7      yes  2.33    novice      novice        1
22     yes  3.46    novice    beginner        0
39     yes  3.75  advanced    beginner        0
37      no  3.52    novice      novice        1
35      no  3.68    novice    beginner        1
30     yes  3.79  advanced      novice        0

Example: Group by one column and find the minimum

The following example groups by the column "masters" and finds the minimum of each remaining column within each group:

>>> df2 = df.groupby("masters")
>>> df2.min()
  masters min_id  min_gpa min_stats min_programming min_admitted
0      no      3     1.87  advanced        advanced            0
1     yes      1     1.98  advanced        advanced            0
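The sketch below, in plain Python, assumes a handful of rows copied from the listing above and a hypothetical helper `groupby_min`; it is not the teradataml implementation. It produces one row per group and names each aggregated column with a min_ prefix, as in the output above. Because it uses only five rows, its values (for example, min_gpa) differ from the output above, which reflects the whole table. In this sketch strings compare alphabetically, which is consistent with min_stats being "advanced" above.

```python
# Hypothetical local sketch of "group by one column, then take the minimum".
# It mimics the min_<column> naming shown above but is not teradataml code.
from typing import Any, Dict, List

Row = Dict[str, Any]

# Five rows copied from the "admissions_train" listing above.
rows: List[Row] = [
    {"id": 5, "masters": "no", "gpa": 3.44, "stats": "novice", "programming": "novice", "admitted": 0},
    {"id": 3, "masters": "no", "gpa": 3.70, "stats": "novice", "programming": "beginner", "admitted": 1},
    {"id": 1, "masters": "yes", "gpa": 3.95, "stats": "beginner", "programming": "beginner", "admitted": 0},
    {"id": 17, "masters": "no", "gpa": 3.83, "stats": "advanced", "programming": "advanced", "admitted": 1},
    {"id": 7, "masters": "yes", "gpa": 2.33, "stats": "novice", "programming": "novice", "admitted": 1},
]


def groupby_min(rows: List[Row], by: str) -> List[Row]:
    """Return one row per value of `by`, with the minimum of every other column."""
    groups: Dict[Any, List[Row]] = {}
    for row in rows:
        groups.setdefault(row[by], []).append(row)
    result: List[Row] = []
    for key in sorted(groups):
        members = groups[key]
        out: Row = {by: key}
        # Every non-grouping column gets a min_ prefix. Python compares strings
        # alphabetically, so "advanced" < "beginner" < "novice".
        for col in members[0]:
            if col != by:
                out["min_" + col] = min(r[col] for r in members)
        result.append(out)
    return result


for group in groupby_min(rows, "masters"):
    print(group)
```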

Example: Group by two columns and find the minimum and maximum

This example groups by the columns "masters" and "programming" and finds the minimum and maximum of each remaining column within each group:

>>> df3 = df.groupby(["masters", "programming"])
>>> df3.min()
  programming masters min_id  min_gpa min_stats min_admitted
0    beginner     yes      1     2.65  advanced            0
1      novice     yes      4     2.33  advanced            0
2    beginner      no      3     3.68    novice            1
3    advanced      no      8     3.13  advanced            1
4    advanced     yes      6     1.98  advanced            0
5      novice      no      5     1.87  advanced            0
>>> df3.max()
  programming masters max_id  max_gpa max_stats max_admitted
0      novice     yes     30     3.79    novice            1
1      novice      no     37     4.00    novice            1
2    advanced     yes     27     4.00  beginner            1
3    beginner      no     35     3.87    novice            1
4    advanced      no     28     3.96  beginner            1
5    beginner     yes     40     4.00    novice            1
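Grouping by two columns works the same way with a two-part key. The following plain-Python sketch uses a hypothetical helper `groupby_agg` that takes the aggregate function (the built-in min or max) and a column-name prefix; it illustrates the idea under those assumptions and is not teradataml's API. As in the output above, the grouping columns themselves are not aggregated, so no min_programming or max_programming column appears. The rows are copied from the listing above, so the values cover only those rows.

```python
# Hypothetical local sketch of grouping by two columns; not teradataml code.
from typing import Any, Callable, Dict, Iterable, List, Tuple

Row = Dict[str, Any]

# Rows copied from the "admissions_train" listing above.
rows: List[Row] = [
    {"id": 1, "masters": "yes", "gpa": 3.95, "stats": "beginner", "programming": "beginner", "admitted": 0},
    {"id": 32, "masters": "yes", "gpa": 3.46, "stats": "advanced", "programming": "beginner", "admitted": 0},
    {"id": 31, "masters": "yes", "gpa": 3.50, "stats": "advanced", "programming": "beginner", "admitted": 1},
    {"id": 40, "masters": "yes", "gpa": 3.95, "stats": "novice", "programming": "beginner", "admitted": 0},
    {"id": 5, "masters": "no", "gpa": 3.44, "stats": "novice", "programming": "novice", "admitted": 0},
    {"id": 13, "masters": "no", "gpa": 4.00, "stats": "advanced", "programming": "novice", "admitted": 1},
]


def groupby_agg(
    rows: List[Row], by: List[str], func: Callable[[Iterable[Any]], Any], prefix: str
) -> Dict[Tuple[Any, ...], Row]:
    """Aggregate every non-grouping column with `func`, keyed by the grouping values."""
    groups: Dict[Tuple[Any, ...], List[Row]] = {}
    for row in rows:
        groups.setdefault(tuple(row[c] for c in by), []).append(row)
    result: Dict[Tuple[Any, ...], Row] = {}
    for key, members in groups.items():
        out: Row = dict(zip(by, key))  # keep the grouping values as plain columns
        for col in members[0]:
            if col not in by:
                out[prefix + col] = func(r[col] for r in members)
        result[key] = out
    return result


mins = groupby_agg(rows, ["masters", "programming"], min, "min_")
maxs = groupby_agg(rows, ["masters", "programming"], max, "max_")
print(mins[("yes", "beginner")])
print(maxs[("yes", "beginner")])
```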

Example: Select multiple columns, then group by one column and find the minimum

This example selects the columns "id", "masters", "gpa", and "stats", groups the result by "masters", and finds the minimum for each group:

>>> df1 = df.select(["id", "masters", "gpa", "stats"])
>>> df2 = df1.groupby("masters")
>>> df2.min()
  masters min_id  min_gpa min_stats
0      no      3     1.87  advanced
1     yes      1     1.98  advanced
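Selecting columns first limits which columns are aggregated, which is why the output above has no min_programming or min_admitted column. The plain-Python sketch below assumes dictionary rows and two hypothetical helpers, `select_columns` and `groupby_min`; they only mirror the behavior shown above and are not teradataml functions. The rows are copied from the listing in the prerequisite, so the values cover only those rows.

```python
# Hypothetical local sketch of "select, then group by and take the minimum";
# not teradataml code.
from typing import Any, Dict, List

Row = Dict[str, Any]

# Rows copied from the "admissions_train" listing above.
rows: List[Row] = [
    {"id": 5, "masters": "no", "gpa": 3.44, "stats": "novice", "programming": "novice", "admitted": 0},
    {"id": 3, "masters": "no", "gpa": 3.70, "stats": "novice", "programming": "beginner", "admitted": 1},
    {"id": 1, "masters": "yes", "gpa": 3.95, "stats": "beginner", "programming": "beginner", "admitted": 0},
    {"id": 17, "masters": "no", "gpa": 3.83, "stats": "advanced", "programming": "advanced", "admitted": 1},
    {"id": 7, "masters": "yes", "gpa": 2.33, "stats": "novice", "programming": "novice", "admitted": 1},
]


def select_columns(rows: List[Row], columns: List[str]) -> List[Row]:
    """Keep only the named columns, in the given order."""
    return [{c: row[c] for c in columns} for row in rows]


def groupby_min(rows: List[Row], by: str) -> List[Row]:
    """Return one row per value of `by`, with the minimum of every other column."""
    groups: Dict[Any, List[Row]] = {}
    for row in rows:
        groups.setdefault(row[by], []).append(row)
    result: List[Row] = []
    for key in sorted(groups):
        members = groups[key]
        out: Row = {by: key}
        for col in members[0]:  # only the selected columns remain here
            if col != by:
                out["min_" + col] = min(r[col] for r in members)
        result.append(out)
    return result


projected = select_columns(rows, ["id", "masters", "gpa", "stats"])
for group in groupby_min(projected, "masters"):
    print(group)
```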