Example: Aggregating Local Results - Analytics Database - Teradata Vantage

DATASET Data Type

Deployment
VantageCloud
VantageCore
Edition
Enterprise
IntelliFlex
VMware
Product
Analytics Database
Teradata Vantage
Release Number
17.20
Published
June 2022
ft:locale
en-US
ft:lastEdition
2023-10-30
dita:mapPath
bka1628112240653.ditamap
dita:ditavalPath
qkf1628213546010.ditaval
dita:id
ekk1458586304878
lifecycle
latest
Product Category
Teradata Vantageā„¢

To do a final union from all AMPs, aggregate the local results from each AMP. To do so, nest a second call to DATASET_PUBLISH. The inner call to DATASET_PUBLISH performs the local aggregation on each AMP, and the outer call does a final aggregation of the local aggregations and produce a single result row that represents all input values.

Use the PARTITION BY clause with a constant value (for example, 1) to perform the final aggregation on a single AMP. By specifying a constant value, like the number 1, the locally aggregated rows from each AMP are in the same partition and redistributed to the same AMP for the final aggregation. The outer query partitions by this constant, shown as "1 as p" in the example. A single output row is returned. You must specify the UNIQUE_NAMES custom clause so that there are no conflicts when combining schemas.

Additionally, reference the aggregated result using dot notation where the recursive descent operator references the inner DATASET_PUBLISH query result, followed by an array reference composed of the * wildcard. This retrieves one array composed of one record per input row. The final query looks similar to:

select data.getSchema(), data..record[*] FROM DATASET_PUBLISH 
( 
	ON (SELECT data as record, 1 as p FROM DATASET_PUBLISH
	(
		ON (SELECT * FROM employeeTable)
		USING UNIQUE_NAMES('Y')
	)as L
) partition by p
)G;
data.getSchema()
data..record[*]
{
	"type": "array",
	"items": {
		"type": "record",
		"name": "rec_0",
		"fields": [{
			"name": "record",
			"type": {
				"type": "record",
				"name": "rec_0_1448384735",
				"fields": [{
					"name": "empID",
					"type": "int"
				},
				{
					"name": "empName",
					"type": "string"
				},
				{
					"name": "empDept",
					"type": "string"
				}]
			}
		},
		{
			"name": "p",
			"type": "int"
		}]
	}
}
[{
	"empID": 5,
	"empName": "Kyle Newman",
	"empDept": "Engineering"
},
{
	"empID": 3,
	"empName": "Steven Mazzo",
	"empDept": "Engineering"
},
{
	"empID": 1,
	"empName": "George Smith",
	"empDept": "Accounting"
},
{
	"empID": 2,
	"empName": "Pauline Kramer",
	"empDept": "HR"
},
{
	"empID": 6,
	"empName": "Pamela Giles",
	"empDept": "Sales"
},
{
	"empID": 4,
	"empName": "Jose Hernandez",
	"empDept": "Engineering"
}]

You can aggregate all values with a single call to DATASET_PUBLISH by partitioning by a constant value. That re-distributes all rows to a single AMP to perform the aggregation, which does not take advantage of the parallel processing capability in the database. By using two calls to DATASET_PUBLISH, the AMPs perform local aggregations in parallel and only the final aggregation is performed on a single AMP.