groupby() Method | Teradata Package for Python - groupby() Method - Teradata Package for Python

Teradata® Package for Python User Guide

Deployment
VantageCloud
VantageCore
Edition
VMware
Enterprise
IntelliFlex
Product
Teradata Package for Python
Release Number
20.00
Published
March 2025
ft:locale
en-US
ft:lastEdition
2026-01-07
dita:mapPath
nvi1706202040305.ditamap
dita:ditavalPath
plt1683835213376.ditaval
dita:id
rkb1531260709148
Product Category
Teradata Vantage

Use the groupby() method to group one or more columns of a DataFrame.

This method does not support operations on array columns.

The method takes a column name or a list of column names to group by.

  • You can still apply teradataml DataFrame methods (filters/sort/etc) on top of the result of this one.
  • Consecutive operations of grouping, i.e., groupby_time(), resample() and groupby() are not permitted. An exception will be raised. Following are some cases where exception will be raised as "Invalid operation applied, check documentation for correct usage."
    • df.groupby().groupby()
    • df.groupby().resample()
    • df.groupby().groupby_time()

Required Parameter

column_expr
Specifies the column names to group by.

Optional Parameter

kwargs
Specifies keyword arguments.
option
Specifies the groupby option.

Permitted values: "CUBE", "ROLLUP", None.

include_grouping_columns
Specifies whether to include aggregations on the grouping columns or not. When set to True, the resultant DataFrame will have the aggregations on the columns mentioned in "columns_expr". Otherwise, resultant DataFrame will not have aggregations on the columns mentioned in "columns_expr".

Default value: False

Example setup

Assume a teradata DataFrame "df" is created based on the table "admissions_train", using the command:

>>> df = DataFrame("admissions_train")
>>> df
   masters   gpa     stats programming admitted
id                                            
13      no  4.00  Advanced      Novice        1
26     yes  3.57  Advanced    Advanced        1
5       no  3.44    Novice      Novice        0
19     yes  1.98  Advanced    Advanced        0
15     yes  4.00  Advanced    Advanced        1
40     yes  3.95    Novice    Beginner        0
7      yes  2.33    Novice      Novice        1
22     yes  3.46    Novice    Beginner        0
36      no  3.00  Advanced      Novice        0
38     yes  2.65  Advanced    Beginner        1

Example 1: Groups by one column and finds the min

The following example groups by column "masters" and finds the min for the groups in "masters":

>>> df2 = df.groupby("masters")
>>> df2.min()
  masters min_id  min_gpa min_stats min_programming min_admitted
0      no      3     1.87  advanced        advanced            0
1     yes      1     1.98  advanced        advanced            0

Example 2: Groups by two columns and finds the min and max

This example finds the min and max grouped by "masters" and "programming":

>>> df3 = df.groupby(["masters", "programming"])
>>> df3.min()
  masters programming min_id  min_gpa min_stats min_admitted
0      no    Advanced      8     3.13  Advanced            1
1     yes    Beginner      1     2.65  Advanced            0
2     yes      Novice      4     2.33  Advanced            0
3      no    Beginner      3     3.68    Novice            1
4     yes    Advanced      6     1.98  Advanced            0
5      no      Novice      5     1.87  Advanced            0
>>> df3.max()
  masters programming max_id  max_gpa max_stats max_admitted
0      no      Novice     37     4.00    Novice            1
1      no    Advanced     28     3.96  Beginner            1
2      no    Beginner     35     3.87    Novice            1
3     yes    Advanced     27     4.00  Beginner            1
4     yes      Novice     30     3.79    Novice            1
5     yes    Beginner     40     4.00    Novice            1

Example 3: Select multiple columns, followed by groupby and find the min

This example selects "id", "masters", "gpa", "stats", followed by a groupby on "masters", and finds the min.

>>> df1 = df.select(["id", "masters", "gpa", "stats"])
>>> df2 = df1.groupby("masters")
>>> df2.min()
  masters min_id  min_gpa min_stats
0      no      3     1.87  advanced
1     yes      1     1.98  advanced