Example: Aggregating Local Results - Advanced SQL Engine - Teradata Database

DATASET Data Type

Product
Advanced SQL Engine
Teradata Database
Release Number
17.05
17.00
Published
June 2020
Language
English (United States)
Last Update
2021-01-23
dita:mapPath
des1556232910526.ditamap
dita:ditavalPath
lze1555437562152.ditaval
dita:id
B035-1198
lifecycle
previous
Product Category
Teradata Vantageā„¢

To do a final union from all AMPs, aggregate the local results from each AMP. To do so, nest a second call to DATASET_PUBLISH. The inner call to DATASET_PUBLISH performs the local aggregation on each AMP, and the outer call does a final aggregation of the local aggregations and produce a single result row that represents all input values.

Use the PARTITION BY clause with a constant value (for example, 1) to perform the final aggregation on a single AMP. By specifying a constant value, like the number 1, the locally aggregated rows from each AMP are in the same partition and redistributed to the same AMP for the final aggregation. The outer query partitions by this constant, shown as "1 as p" in the example. A single output row is returned. You must specify the UNIQUE_NAMES custom clause so that there are no conflicts when combining schemas.

Additionally, reference the aggregated result using dot notation where the recursive descent operator references the inner DATASET_PUBLISH query result, followed by an array reference composed of the * wildcard. This retrieves one array composed of one record per input row. The final query looks similar to:

select data.getSchema(), data..record[*] FROM DATASET_PUBLISH 
( 
	ON (SELECT data as record, 1 as p FROM DATASET_PUBLISH
	(
		ON (SELECT * FROM employeeTable)
		USING UNIQUE_NAMES('Y')
	)as L
) partition by p
)G;
data.getSchema()
data..record[*]
{
	"type": "array",
	"items": {
		"type": "record",
		"name": "rec_0",
		"fields": [{
			"name": "record",
			"type": {
				"type": "record",
				"name": "rec_0_1448384735",
				"fields": [{
					"name": "empID",
					"type": "int"
				},
				{
					"name": "empName",
					"type": "string"
				},
				{
					"name": "empDept",
					"type": "string"
				}]
			}
		},
		{
			"name": "p",
			"type": "int"
		}]
	}
}
[{
	"empID": 5,
	"empName": "Kyle Newman",
	"empDept": "Engineering"
},
{
	"empID": 3,
	"empName": "Steven Mazzo",
	"empDept": "Engineering"
},
{
	"empID": 1,
	"empName": "George Smith",
	"empDept": "Accounting"
},
{
	"empID": 2,
	"empName": "Pauline Kramer",
	"empDept": "HR"
},
{
	"empID": 6,
	"empName": "Pamela Giles",
	"empDept": "Sales"
},
{
	"empID": 4,
	"empName": "Jose Hernandez",
	"empDept": "Engineering"
}]

You can aggregate all values with a single call to DATASET_PUBLISH by partitioning by a constant value. That re-distributes all rows to a single AMP to perform the aggregation, which does not take advantage of the parallel processing capability in the Teradata Database. By using two calls to DATASET_PUBLISH, the AMPs perform local aggregations in parallel and only the final aggregation is performed on a single AMP.