GROUP BY Clause

Purpose

Groups result rows by the values in one or more columns or by various extended GROUP BY operations on specified column expressions.

Syntax

where:

Syntax Element …	Specifies …
GROUP BY	a reference to one or more expressions in the select expression list.
ordinary_grouping_set	a column expression by which the rows returned by the statement are grouped. You cannot specify BLOB, CLOB, Period, ARRAY, VARRAY, XML, or JSON columns in the grouping expression. ordinary_grouping_set falls into three general categories. column_name column_position column_expression See “Ordinary Grouping Set Expressions” on page 146 for the definitions of these expressions.
empty_grouping _set	a contiguous LEFT PARENTHESIS, RIGHT PARENTHESIS pair without an argument. This syntax is used to request a grand total. The term grand total here refers to a summation of all the individual group totals, not a summation of the nonaggregate data.
rollup_list	a ROLLUP expression that reports result rows in a single dimension with one or more levels of detail. See “ROLLUP Grouping Set Option” on page 160 for more information. The expression cannot group result rows that have a BLOB, CLOB, ARRAY, or VARRAY type.
cube_list	a CUBE expression that reports result rows in multiple dimensions with one or more levels of detail. See “CUBE Grouping Set Option” on page 154 for more information. The expression cannot group result rows that have a BLOB, CLOB, ARRAY, or VARRAY type.
grouping_sets_ specification	a GROUPING SETS expression that reports result rows in one of two ways: As a single dimension, but without a full ROLLUP. As multiple dimensions, but without a full CUBE. See “GROUPING SETS Option” on page 157 for more information.

ANSI Compliance

The GROUP BY clause is ANSI SQL:2011-compliant with extensions.

GROUP BY Clause Terminology

A GROUP BY clause is said to be simple if it does not contain any of the following elements:

rollup_list

cube_list

grouping_sets_specification

A GROUP BY clause is said to be primitive if it does not contain any of the following elements:

rollup_list

cube_list

grouping_sets_specification

a grouping_expression and an empty_grouping_set.

See “GROUPING SETS Option” on page 157 for definitions of rollup_list, cube_list, grouping_sets_specification, and empty_grouping_set.

See the following topics for examples of non‑simple, non‑primitive GROUP BY clauses:

“ROLLUP Grouping Set Option” on page 160

“CUBE Grouping Set Option” on page 154

“GROUPING SETS Option” on page 157

Ordinary Grouping Set Expressions

The following table provides the definitions for the valid ordinary grouping set expressions you can specify in a GROUP BY clause.

Ordinary Grouping Set Expression	Definition
column_name	a set of column names drawn from the list of tables specified in the FROM clause of the SELECT statement that is used in the GROUP BY clause to specify the columns by which data is to be grouped. The maximum number of columns you can specify is 64. You cannot include LOB columns in the grouping expression. You can specify a column_name_alias instead of column_name as long as it does not have the same name as a physical column in the table definition. In this case, you must specify column_position, not column_name_alias.
column_position	the sequential numeric position of columns within the column_list clause of the SELECT statement that is used in the GROUP BY clause to specify the order by which data is to be grouped. The value you specify must be a positive constant integer literal with a value between 1 and the number of columns specified in the select list, inclusive. Note that Teradata Database treats macro and procedure parameters as expressions, not as the specification of a column position. You cannot include LOB columns in the grouping expression. This is a Teradata extension to the ANSI SQL:2011 standard.
column_expression	any list of valid SQL expressions specified for the GROUP BY clause. You can specify column_name, column_position, and expression either as single entries or as a list. You can specify a scalar subquery as an ordinary grouping set expression. You can also specify a scalar UDF as an ordinary grouping set expression. You cannot include LOB columns in the ordinary grouping set. Use of column_expression is a Teradata extension to the ANSI SQL:2011 standard.

Using the GROUP BY Clause

The rules and restrictions are:

When an aggregate operation (for example, SUM, AVERAGE, MAX, MIN, or COUNT) is specified, GROUP BY can be used to return a summary row for each group.

Aggregate operators can be used only in the SELECT expression list or in the optional HAVING clause.

Nonaggregated variables in SELECT, ORDER BY, and HAVING need to appear in the group by list.

All nonaggregate groups in a SELECT expression list or HAVING expression list must be included in the GROUP BY clause.

You can specify a correlated subquery using a scalar UDF in the same way that you would specify a column name or parameterized value in the GROUP BY clause.

For built‑in ordered analytic functions specific to Teradata Database, such as CSUM and MAVG, GROUP BY determines the partitions over which the function executes. For an example, see “Example 3: Specifying GROUP BY Using an Ordered Analytic Function” on page 150.

For window functions, such as SUM and AVG, GROUP BY collapses all rows with the same value for the group-by columns into a single row. The GROUP BY clause must include all the columns specified in the:

Select list of the SELECT clause

Window functions in the select list of a SELECT clause

Window functions in the search condition of a QUALIFY clause

The condition in a RESET WHEN clause

For examples and more information on GROUP BY and ordered analytical functions, see SQL Functions, Operators, Expressions, and Predicates.

You cannot specify BLOB or CLOB columns in a grouping expression, rollup list, cube list, or grouping sets specification.

You cannot specify a GROUP BY clause in a recursive statement of a recursive query. However, you can specify a GROUP BY clause in a nonrecursive seed statement in a recursive query.

If you specify an ORDER BY clause, then any group contained in the ORDER BY clause must also be included in the GROUP BY clause.

WHERE, GROUP BY, and HAVING clauses in a SELECT statement are evaluated in this order:

a WHERE

b GROUP BY

c HAVING

Reason for Unexpected Row Length Errors: Sorting Rows for Grouping

Before performing the sort operation that groups the rows to be returned to the requestor, Teradata Database creates a sort key and appends it to the rows to be sorted. If the length of this temporary data structure exceeds the system row length limit of 64 KB, the operation returns an error to the requestor. Depending on the situation, the message text is one of the following:

A data row is too long.

Maximum row length exceeded in database_object_name.

For explanations of these messages, see Messages.

The BYNET only looks at the first 4096 bytes of the sort key created to sort the specified fields, so if the field the sort key is based on is greater than 4096 bytes, the key is truncated and the data might or might not come back in the desired order.

Examples

These examples show the use of the simple or primitive GROUP BY clause.

Examples for the ROLLUP, CUBE, and GROUPING SETS extended GROUP BY operators are provided in “ROLLUP Grouping Set Option” on page 160, “CUBE Grouping Set Option” on page 154, and “GROUPING SETS Option” on page 157, respectively.

Example : Simple GROUP BY Operation

Generate a report of salary totals by department, the result might look something like the report that follows.

     SELECT dept_no, SUM(salary)

     FROM employee

     GROUP BY dept_no;

The result might be similar to the report that follows.

`dept_no` `-------`	`SUM(salary)` `-----------`
`100`	`180,500.00`
`300`	`143,000.00`
`500`	`268,000.00`
`600`	`146,600.00`
`700`	`113,000.00`

Example : Specifying a GROUP BY Clause on Nonaggregate Expressions When the Select List Includes an Aggregate

If you specify an aggregate in the select expression list of a query, then you must also specify a GROUP BY clause that includes all nonaggregate expressions from the select list. Otherwise, the system returns the following message.

     Selected non‑aggregate values must be part of the associated group.

The system returns error message 3504 whenever an aggregate query includes a nonaggregate expression in its SELECT list, WHERE clause, ORDER BY clause, or HAVING clause, but not in a GROUP BY clause.

The system also returns this error when ORDER BY and WITH clauses specify aggregates, but the select list for the query does not.

For example, Teradata Database aborts the following query because it does not specify a GROUP BY clause that groups on the only nonaggregate expression in the select list, which is department_number:

     SELECT department_number, SUM(salary_amount)

     FROM employee

     WHERE department_number IN (100, 200, 300);

To work as intended, the query must be rewritten with an appropriate GROUP BY clause:

     SELECT department_number, SUM(salary_amount)

     FROM employee

     WHERE department_number IN (100, 200, 300)

     GROUP BY department_number;

The following statement aborts and returns an error because it does not specify all of the nonaggregate columns from the select list in its GROUP BY clause.

     SELECT employee.dept_no, department.dept_name, AVG(salary)

     FROM employee, department

     WHERE employee.dept_no = department.dept_no

     GROUP BY employee.dept_no;

In this case, the qualified nonaggregate column department.dept_name is missing from the GROUP BY clause.

The following statement aborts and returns an error because the nonaggregate grouping column specified in the ORDER BY clause is not also specified in the GROUP BY clause.

     SELECT employee.dept_no, AVG(salary)

     FROM employee, department

     WHERE employee.dept_no = department.dept_no

     ORDER BY department.dept_name

     GROUP BY employee.dept_no;

The following statement, based on the table definitions shown, aborts and returns an error because the aggregate query includes a nonaggregate expression, d1, in its HAVING clause, but not in the GROUP BY clause.

     CREATE TABLE t1(a1 int, b1 int, c1 int, d1 int);

     CREATE TABLE t2(a2 int, b2 int, c2 int, d2 int);

     SELECT min(a1) as i, max(b1) as j from t1

     GROUP BY c1

     HAVING 30 >= (sel count(*) from t2 where t1.d1=5);

The correct form of the query includes the nonaggregate expression, d1, in its HAVING clause and in the GROUP BY clause.

     SELECT min(a1) as i, max(b1) as j from t1

     GROUP BY c1, d1

     HAVING 30 >= (sel count(*) from t2 where t1.d1=5);

Example : Specifying GROUP BY Using an Ordered Analytic Function

The following statement specifies a GROUP BY clause with an ordered analytical function to generate report breaks where the function resets and computes a new value for the next grouping.

The example groups all items into percentile by profitability for each store and then returns only the items of interest, which, in this case, are the lowest percentile for each store.

     SELECT store, item, profit, QUANTILE(100, profit) AS percentile

     FROM (SELECT items.item, SUM(sales) -

                  (COUNT(sales)*items.item_cost) AS profit

           FROM daily_sales, items

           WHERE daily_sales.item = items.item

           GROUP BY items.item,items.itemcost) AS item_profit

     GROUP BY store, item, profit, percentile

     QUALIFY percentile = 99;

The result of this query looks like the following table:

store -----	item ----	profit ------	percentile ----------
Eastside	Golf balls	100.19	99
Westside	Tennis balls	-110.00	99
Central	Bowling balls	-986.81	99
South	Codfish balls	- 1,891.89	99
North	Bocce balls	1,180.88	99

Example : SELECT Statement With a Scalar Subquery in Its GROUP BY Clause

The following example specifies a scalar subquery in its GROUP BY clause:

     SELECT sale_date, SUM(amount)

     FROM sales_table AS s

     GROUP BY sale_date, (SELECT prod_name

                          FROM prod_table AS p

                          WHERE p.prod_no = s.prod_no);

Example : GROUP BY and PERIOD Value Expressions

The following example shows how the GROUP BY clause operates on PERIOD value expressions, where period_of_stay is the PERIOD value expression.

     SELECT emp_no, period_of_stay

     FROM employee

     GROUP BY emp_no, period_of_stay;

For More Information

For more information related to the GROUP BY clause, see:

“Resolving Extended GROUP BY Operator Specifications” on page 152

“CUBE Grouping Set Option” on page 154

“GROUPING SETS Option” on page 157

“ROLLUP Grouping Set Option” on page 160

“ORDER BY Clause” on page 221