Use the groupby() method to group one or more columns of a DataFrame.
The method takes a column name or a list of column names to group by.
- You can still apply teradataml DataFrame methods (filters/sort/etc) on top of the result of this one.
- Consecutive operations of grouping, i.e., groupby_time(), resample() and groupby() are not permitted. An exception will be raised. Following are some cases where exception will be raised as "Invalid operation applied, check documentation for correct usage."
- df.groupby().groupby()
- df.groupby().resample()
- df.groupby().groupby_time()
Examples Prerequisite
Assume a teradata DataFrame "df" is created based on the table "admissions_train", using the command:
>>> df = DataFrame("admissions_train")
>>> df masters gpa stats programming admitted id 13 no 4.00 Advanced Novice 1 26 yes 3.57 Advanced Advanced 1 5 no 3.44 Novice Novice 0 19 yes 1.98 Advanced Advanced 0 15 yes 4.00 Advanced Advanced 1 40 yes 3.95 Novice Beginner 0 7 yes 2.33 Novice Novice 1 22 yes 3.46 Novice Beginner 0 36 no 3.00 Advanced Novice 0 38 yes 2.65 Advanced Beginner 1
Example 1: Groups by one column and finds the min
The following example groups by column "masters" and finds the min for the groups in "masters":
>>> df2 = df.groupby("masters")
>>> df2.min() masters min_id min_gpa min_stats min_programming min_admitted 0 no 3 1.87 advanced advanced 0 1 yes 1 1.98 advanced advanced 0
Example 2: Groups by two columns and finds the min and max
This example finds the min and max grouped by "masters" and "programming":
>>> df3 = df.groupby(["masters", "programming"])
>>> df3.min() masters programming min_id min_gpa min_stats min_admitted 0 no Advanced 8 3.13 Advanced 1 1 yes Beginner 1 2.65 Advanced 0 2 yes Novice 4 2.33 Advanced 0 3 no Beginner 3 3.68 Novice 1 4 yes Advanced 6 1.98 Advanced 0 5 no Novice 5 1.87 Advanced 0
>>> df3.max() masters programming max_id max_gpa max_stats max_admitted 0 no Novice 37 4.00 Novice 1 1 no Advanced 28 3.96 Beginner 1 2 no Beginner 35 3.87 Novice 1 3 yes Advanced 27 4.00 Beginner 1 4 yes Novice 30 3.79 Novice 1 5 yes Beginner 40 4.00 Novice 1
Example 3: Select multiple columns, followed by groupby and find the min
This example selects "id", "masters", "gpa", "stats", followed by a groupby on "masters", and finds the min.
>>> df1 = df.select(["id", "masters", "gpa", "stats"])
>>> df2 = df1.groupby("masters")
>>> df2.min() masters min_id min_gpa min_stats 0 no 3 1.87 advanced 1 yes 1 1.98 advanced