Example: Aggregating Local Results - Teradata Vantage NewSQL Engine - 16.20

Teradata Vantageā„¢ DATASET Data Type

prodname
Teradata Database
Teradata Vantage NewSQL Engine
vrm_release
16.20
category
Programming Reference
featnum
B035-1198-162K

To do a final union from all AMPs, aggregate the local results from each AMP. To do so, nest a second call to DATASET_PUBLISH. The inner call to DATASET_PUBLISH performs the local aggregation on each AMP, and the outer call does a final aggregation of the local aggregations and produce a single result row that represents all input values.

Use the PARTITION BY clause with a constant value (for example, 1) to perform the final aggregation on a single AMP. By specifying a constant value, like the number 1, the locally aggregated rows from each AMP are in the same partition and redistributed to the same AMP for the final aggregation. The outer query partitions by this constant, shown as "1 as p" in the example. A single output row is returned. You must specify the UNIQUE_NAMES custom clause so that there are no conflicts when combining schemas.

Additionally, reference the aggregated result using dot notation where the recursive descent operator references the inner DATASET_PUBLISH query result, followed by an array reference composed of the * wildcard. This retrieves one array composed of one record per input row. The final query looks similar to:

select data.getSchema(), data..record[*] FROM DATASET_PUBLISH 
( 
	ON (SELECT data as record, 1 as p FROM DATASET_PUBLISH
	(
		ON (SELECT * FROM employeeTable)
		USING UNIQUE_NAMES('Y')
	)as L
) partition by p
)G;
data.getSchema()
data..record[*]
{
	"type": "array",
	"items": {
		"type": "record",
		"name": "rec_0",
		"fields": [{
			"name": "record",
			"type": {
				"type": "record",
				"name": "rec_0_1448384735",
				"fields": [{
					"name": "empID",
					"type": "int"
				},
				{
					"name": "empName",
					"type": "string"
				},
				{
					"name": "empDept",
					"type": "string"
				}]
			}
		},
		{
			"name": "p",
			"type": "int"
		}]
	}
}
[{
	"empID": 5,
	"empName": "Kyle Newman",
	"empDept": "Engineering"
},
{
	"empID": 3,
	"empName": "Steven Mazzo",
	"empDept": "Engineering"
},
{
	"empID": 1,
	"empName": "George Smith",
	"empDept": "Accounting"
},
{
	"empID": 2,
	"empName": "Pauline Kramer",
	"empDept": "HR"
},
{
	"empID": 6,
	"empName": "Pamela Giles",
	"empDept": "Sales"
},
{
	"empID": 4,
	"empName": "Jose Hernandez",
	"empDept": "Engineering"
}]

You can aggregate all values with a single call to DATASET_PUBLISH by partitioning by a constant value. That re-distributes all rows to a single AMP to perform the aggregation, which does not take advantage of the parallel processing capability in the Teradata Database. By using two calls to DATASET_PUBLISH, the AMPs perform local aggregations in parallel and only the final aggregation is performed on a single AMP.