# Partial GROUP BY Block Optimization | Join Planning/Optimization | Vantage - 17.10 - Partial GROUP BY Block Optimization - Advanced SQL Engine - Teradata Database

## Teradata Vantageā¢ - SQL Request and Transaction Processing

Product
Release Number
17.10
Release Date
July 2021
Content Type
Programming Reference
User Guide
Publication ID
B035-1142-171K
Language
English (United States)

## About Partial GROUP BY Block Optimization by the Optimizer

Where appropriate, aggregations over joins are split, with part being done before the join, and part being done after the join. This means less work for the join as well as less aggregation work after the join.

In a join query plan for a query with a GROUP BY (or DISTINCT) clause, Partial GROUP BY (PGB) operations are used to reduce the cardinalities of base tables, join indexes, and intermediate spools, thereby reducing processing required in performing the join query plan. To further enhance efficiency, certain of the partial group operations, including an intermediate Partial GROUP BY operation, a final GROUP BY operation, or both can be eliminated in certain predefined conditions.

The Partial GROUP BY optimization can be done almost at any stage of the join plan. An early GROUP BY reduces the rows of the base relations or the intermediate join relations during the optimization process. However, GROUP BY and merge join operators have a tendency to compete with one other for use in the lowest cost plan because both apply significant row reduction. Therefore, it is extremely important for the Optimizer to cost the alternatives.

The Partial GROUP BY optimization can reduce the processing and I/O times for typical requests. The more duplicate grouping values a table has, the greater the reduction in I/O processing which, in some cases, can be as much as a factor of 100 times.

## Examples of Partial GROUP BY Optimization

Consider the following typical query:

```SELECT b1, c1, SUM(t1.float_1), SUM(t2.float_2), SUM(t3.float_3)
FROM t1,t2,t3
WHERE b1=b2 AND b2=b3
GROUP BY b1,c1;```
Without Partial GROUP BY enabled, the Optimizer selects one of the following join plans:
• [(t1 X t2) X t3 ]'
• [(t2 X t3) X t1]'
• [(t1 X t3) X t2]'

where X represents the join operator and [ ]' denotes the GROUP BY operator applied to the relation.

With Partial GROUP BY enabled, some additional join plans are considered, such as the following:
• [([t2]' X t3) X t1]'
• [(t1 X [t3]') X t2]'
• [([t2]' X [t3]') X t1]]'
• [([t1 X t3]' X [t2]'
• ...

This provides the Optimizer with an opportunity to find a lower cost join plan among a larger set of possible join plans. Even though there are more join plans to be considered, the additional search effort consumes negligible CPU time and returns a significant reduction time in the execution of the request.

Consider the following example:

```SELECT ps_partkey, SUM(l_quantity), SUM (ps_supplycost)
FROM partsupp, lineitem
WHERE l_partkey = ps_partkey
GROUP BY 1;```

Without Partial GROUP BY enabled, the Optimizer redistributes lineitem on the join column l_partkey, performs the join with partsupp, and then aggregates the result of the join. With a one terabyte database, this redistributes 6 billion rows and performs the join with partsupp to produce a result set of 6 billion rows on which the aggregation is performed.

With Partial GROUP BY enabled, the Optimizer first separately aggregates the relations partsupp and lineitem and then joins the results of those aggregations.

Without the transformation, approximately 85 billion rows must be read and written. This is largely because the join of the 6 billion row lineitem relation is joined to the 800 million row partsupp relation to produce a 24 billion row join result spool for aggregation.

With the transformation, the lineitem relation is aggregated as part of the sort and redistribution operation to produce a 200 million row spool. The partsupp relation is locally aggregated to produce another 200 million row spool. The two spools are then joined to produce a 200 million row result, and there is an overall reduction of about 3 times in the number of rows read and written.

## Identifying Partial GROUP BY Operations in EXPLAIN Report Text

They are two ways to coalesce rows using Partial GROUP BY:
• Partial SUM steps
• Sort/Group steps

## Early GROUP BY With a Partial SUM Step

This section demonstrates the EXPLAIN phrase partial SUM.

Consider the following query:

```SELECT l_partkey,SUM(l_quantity),SUM(ps_supplycost)
FROM partsupp, lineitem
WHERE l_partkey = ps_partkey
GROUP BY 1;```

Without Partial GROUP BY enabled, the join plan would be [partsupp X lineitem]'. With Partial GROUP BY enabled, the join plan is [partsupp]' X [lineitem]'.

The following EXPLAIN text compares two explanations of this query: the first with Partial GROUP BY disabled and the second with Partial GROUP BY enabled.

This is partial EXPLAIN text with Partial GROUP BY disabled.

```...
4) We do an all-AMPs RETRIEVE step from TPCD50G.lineitem by way of an all-rows scan
with no residual conditions into Spool 4 (all_amps), which is redistributed by hash
code to all AMPs. Then we do a SORT to order Spool 4 by row hash. The result spool
file will not be cached in memory. The size of Spool 4 is estimated with high
confidence to be 300,005,811 rows. The estimated time for this step is 2 hours and
51 minutes.
5) We do an all-AMPs JOIN step from TPCD50G.partsupp by way of a RowHash match scan with
no residual conditions, which is joined to Spool 4 (Last Use). TPCD50G.partsupp and
Spool 4 are joined using a merge join, with a join condition of
("L_PARTKEY = TPCD50G.partsupp.PS_PARTKEY").
The input table TPCD50G.partsupp will not be cached in memory, but it is eligible for
synchronized scanning. The result goes into Spool 3 (all_amps), which is built-
locally on the AMPs. The result spool file will not be cached in memory.
The size of Spool 3 is estimated with low confidence to be 1,200,023,244 rows.
The estimated time for this step is 2 hours and 46 minutes.
6) We do an all-AMPs SUM step to aggregate from Spool 3 (Last Use) by way of an all-rows
scan, and the grouping identifier in field 2. Aggregate Intermediate Results are
computed locally, then placed in Spool 5. The aggregate spool file will not be
cached in memory. The size of Spool 5 is estimated with low confidence to be
1,200,023,244 rows. The estimated time for this step is 10 hours and 6 minutes.```

This is partial EXPLAIN text with Partial GROUP BY enabled.

```...
4) We do an all-AMPs partial SUM step to aggregate from TPCD50G.partsupp by way of
an all-rows scan with no residual conditions, and the grouping identifier in field 1.
Aggregate Intermediate Results are computed locally, then placed in Spool 6.
The input table will not be cached in memory, but it is eligible for synchronized
scanning.  The aggregate spool file will not be cached in memory.  The size of Spool 6
is estimated with low confidence to be 10,000,000 rows.  The estimated time for
this step is 12 minutes and 36 seconds.
5) We execute the following steps in parallel.
1) We do an all-AMPs RETRIEVE step from Spool 6 (Last Use) by way of an all-rows scan
into Spool 5 (all_amps), which is built locally on the AMPs.  Then we do a SORT
to order Spool 5 by row hash.  The result spool file will not be cached in
memory. The size of Spool 5 is estimated with low confidence to be
10,000,000 rows.
2) We do an all-AMPs partial SUM step to aggregate from TPCD50G.lineitem by
way of an all-rows scan with no residual conditions, and the grouping identifier
in field 1.  Aggregate Intermediate Results are computed globally, then placed
in Spool 10.  The input table will not be cached in memory, but it is eligible
for synchronized scanning. The size of Spool 10 is estimated with low confidence
to be 10,000,000 rows. The estimated time for this step is is 3 hours and
39 minutes.
6) We do an all-AMPs RETRIEVE step from Spool 10 (Last Use) by way an all-rows scan
into Spool 9 (all_amps), which is redistributed by hash code to all AMPs.  Then we do
a SORT to oder SPOOL 9 by row hash.  The result spool file will not be cached in
memory. The size of Spool 9 is estimated with no confidence to be 10,000,000 rows.
7) We do an all-AMPs JOIN step from Spool 5 (Last Use) by way of a RowHash match scan,
which is joined to Spool 9 (Last Use).  Spool 5 and Spool 9 are joined using a
merge join, with a join condition of ("L_PARTKEY = PS_PASRTKEY").
The result goes into Spool 3, (all_amps), which is built locally on the AMPs.
The result spool file will not be cached in memory.  The size of Spool 3 is
estimated with low confidence to be 993,739,248 rows.  The estimated time for this
step is 2 hours and 16 minutes.
```

With Partial GROUP BY disabled, the SUM step is performed in step 6 where the cardinalities are reduced. In contrast, when Partial GROUP BY is enabled, the partial SUM step is performed in steps 4 and 5.1.

Because the Partial GROUP BY optimization is performed before the join, the EXPLAIN text adds the phrase partial before the SUM step for easy identification of the optimization.

Note that it is possible for the Optimizer to select a join plan that does not use this optimization, even though Partial GROUP BY is enabled. This might happen, for example, if the cost of aggregation makes it not worth doing because of the number of small groups.

## SORT/GROUP Step and Partial GROUP BY Optimization

This section introduces the phrase SORT/GROUP as an identification of GROUP BY. The following request has a nested subquery, with the inner query specifying an OUTER JOIN. Both the inner and outer queries have GROUP BY clauses.

```SELECT c_count, COUNT(*) AS custdist
FROM (SELECT c_custkey, COUNT(o_orderkey)
FROM customer LEFT OUTER JOIN ordertbl
ON  c_custkey = o_custkey
AND o_comment NOT LIKE '%special%requests%'
GROUP BY c_custkey) AS c_orders (c_custkey, c_count)
GROUP BY c_count
ORDER BY custdist desc, c_count DESC;```

This is partial EXPLAIN text with Partial GROUP BY enabled.

```...
4) We do an all-AMPs RETRIEVE step from TPCD50G.ORDERTBL by way of an all-rows scan
with a condition of ("NOT(TPCD50G.ORDERTBL.O_COMMENT LIKE '%special%requests%')")
into Spool 5 (all_amps), which is redistributed by hash code to all AMPs.
Then we do a SORT/GROUP to order Spool 5 by row hash and non-aggregate
fields grouping duplicate rows. The result spool file will not be cached in memory.
The size of Spool 5 is estimated with no confidence to be 67,500,000 rows.
5) We do an all-AMPs JOIN step from TPCD50G.CUSTOMER by way of a RowHash match scan
with no residual conditions, which is joined to Spool 5 (Last Use).
TPCD50G.CUSTOMER and Spool 5 are left outer joined using a merge join,
with a join condition of ("TPCD50G.CUSTOMER.C_CUSTKEY = O_CUSTKEY").
The input table TPCD50G.CUSTOMER will not be cached in memory. The result goes
into Spool 3 (all_amps), which is built locally on the AMPs. The result spool file
will not be cached in memory. The size of Spool 3 is estimated with low confidence
to be 74,954,952 rows. The estimated time for this step is 15 minutes and
32 seconds.
6) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of an all-rows scan
into Spool 1 (all_amps), which is built locally on the AMPs. The result spool file
will not be cached in memory. The size of Spool 1 is estimated with low confidence
to be 74,954,952 rows. The estimated time for this step is 11 minutes and 6 seconds.
7) We do a SUM step to aggregate from Spool 1 (Last Use) by way of an all-rows scan,
and the grouping identifier in field 3. Aggregate Intermediate Results are computed
globally, then placed in Spool 10. The size of Spool 10 is estimated with no
confidence to be 8,658 rows. The estimated time for this step is 1 minute and
26 seconds.```

In this explanation, the phrase SORT/GROUP appears in Step 4. That means the Partial GROUP BY is used early to reduce cardinalities.

## Eliminating the Last Partial GROUP BY Aggregation for Better Performance

The following are examples of join plans that do not require a final GROUP BY operation.

Assume you submit the following query for the TPC-H benchmark.

```SELECT l_suppkey,SUM(l_quantity),SUM(ps_availqty)
FROM lineitem,partsupp
WHERE l_suppkey=ps_suppkey
GROUP BY 1;```

This query uses the join plan [lineitem]' X [partsupp]'.

Consider the following request:

```SELECT b1, c1, SUM(t1.float_1), SUM(t2.float_2), SUM(t3.float_3)
FROM t1, t2, t3
WHERE b1=b2
AND   b2=b3
GROUP BY 1,2;```

Without Partial GROUP BY, this query would use the join plan [[t2xt3]'X t1']', with a final GROUP BY operation. Because the columns cover join columns, the last GROUP BY can be skipped, and the join plan can be optimized to [t2xt3']'xt1'.

The last GROUP BY operation need not be performed for the following request:

```SELECT c_custkey, c_name,
SUM(l_extendedprice*(1-l_discount)(FLOAT))(DECIMAL(18,2))
AS revenue, c_acctbal, n_name,
FROM customer, ordertbl, lineitem, nation
WHERE c_custkey = o_custkey
AND   l_orderkey = o_orderkey
AND   o_orderdate >= '1993-10-01'
AND   o_orderdate < DATE '1993-10-01' + INTERVAL '3' MONTH
AND   l_returnflag = 'R'
AND   c_nationkey = n_nationkey
GROUP BY c_custkey, c_name, c_acctbal, c_phone, n_name, c_address,
c_comment
ORDER BY revenue DESC;```
This query might have two possible join plans depending on the system configuration. Neither plan requires the last GROUP BY operation. The two join plans are as follows:
• ((lineitem X ordertbl)' customer) X nation
• ((ordertbl X customer) X lineitem)' X nation

Suppose you submit the following request against the TPC-H performance benchmark database:

```SELECT c_custkey,c_name,SUM(l_extendedprice*(1-l_discount)(FLOAT))
(DECIMAL(18,2)) AS revenue, c_acctbal, n_name, c_address,
c_phone,c_comment
FROM customer, ordertbl, lineitem, nation
WHERE c_custkey = o_custkey
AND   l_orderkey = o_orderkey
AND   o_orderdate >= '1993-10-01'
AND   o_orderdate < DATE '1993-10-01' + INTERVAL '3' MONTH
AND   l_returnflag = 'R'
AND   c_nationkey = n_nationkey
GROUP BY c_custkey, c_name, c_acctbal, c_phone, n_name, c_address,
c_comment
ORDER BY revenue DESC;```

When the definition of the nation table is changed to define the primary index on n_key rather than n_name, the join plan without Partial GROUP BY would include the last GROUP BY as follows: (((lineitem X ordertbl)' X customer) X nation)'. Note the 'GROUP BY operator at the end of the join plan specification.

The Partial GROUP BY optimization compensates for this change in the primary index definition by removing the last GROUP BY, resulting in the plan ((lineitem X ordertbl)' X customer) X nation.

## Collecting Statistics to Enable Partial GROUP BY Optimization

For the Optimizer to know when to apply GROUP BY operators effectively, you should collect statistics on all join and GROUP BY columns in your typical requests.

For the following query, for example, you should collect statistics on l_orderkey, o_orderkey, o_orderdate, c_custkey, c_nationkey, and n_nationkey:

```SELECT c_custkey, c_name,
SUM(l_extendedprice*(1-l_discount)(FLOAT))(DECIMAL(18,2))
AS revenue, c_acctbal, n_name,
FROM customer, ordertbl, lineitem, nation
WHERE c_custkey=o_custkey
AND   l_orderkey=o_orderkey
AND   o_orderdate>='1993-10-01'
AND   o_orderdate<DATE '1993-10-01' + INTERVAL '3' MONTH
AND   l_returnflag='R'
AND   c_nationkey=n_nationkey
GROUP BY c_custkey, c_name, c_acctbal, c_phone, n_name, c_address,
c_comment
ORDER BY revenue DESC;```

Using another example request, you should collect statistics on l_partkey and ps_partkey for the following query:

```SELECT l_partkey, SUM(l_quantity), SUM(ps_supplycost)
FROM partsupp, lineitem
WHERE l_partkey=ps_partkey
GROUP BY 1;```