UDAF - Aster Execution Engine

Teradata Aster® Developer Guide

Product
Aster Execution Engine
Release Number
7.00.02
Published
July 2017
Language
English (United States)
Last Update
2018-04-13
dita:mapPath
xnl1494366523182.ditamap
dita:ditavalPath
Generic_no_ie_no_tempfilter.ditaval
dita:id
ffu1489104705746
lifecycle
previous
Product Category
Software

A UDAF is much more complex. Before going further, it is important to define these terms:

  • decomposable
  • non-decomposable

A UDAF function is decomposable if it can be divided into smaller operations, at least some of which can be run independently (therefore in parallel) in at least some situations.

Calculating a maximum value is a good example of a simple decomposable algorithm. If you break your data into multiple subsets (for example, by distributing a fact table across vworkers), each subset can be analyzed, and a maximum from that subset can be chosen.

The analysis can be done in parallel; each vworker can look for its maximum in parallel with all of the other vworkers that are looking for their maximums. The maximums from each of those subsets can then be put into a new set, and the maximum from that new set is chosen, which gives you final maximum value. You can picture this as a tree, in which each sub-tree returns the maximum from that sub-tree.

A non-decomposable function cannot be executed by breaking it into pieces that can be run in parallel.

Finding the median value of a set is a good example of a non-decomposable function. The median value is the value that would be in the center of the list if the list were sorted. The function for finding the median is not decomposable. If you split your set into 3 (or N) subsets, and you pick the median from each of those subsets and put that median into a new set, the median of that new set is not necessarily the median of all of the values in the original set.

The only way to find the median is to bring all the values together in one place, sort them, and then choose the median.

The Aster Server, which uses multiple processors and multiple nodes, generally has high performance when processing decomposable algorithms.

For a non-decomposable function, the approach is generally to send all of the relevant data of the same group through data redistribution to a single node, and then do the processing on that node. A full degree of parallelism is not attained if a certain few groups contain much more data than the others. Particularly, when there is not GROUP BY, there is only one group. In such a case there is no parallelism, and the performance is comparable to that of a single-node system.

Decomposable tasks can be performed in a non-decomposable fashion. For example, you could implement the MAX() function, which is decomposable, by using a non-decomposable algorithm (that is, send all of the data to a single node and then have that node find the max).

Teradata Aster's implementation of UDAFs allows you to write your UDAF in either of the following ways:

  • As a non-decomposable aggregate function (implementing reset, aggregateRow, and getFinalValue).
  • As a decomposable aggregate function (implementing the same three basic interface functions above, in addition to getPartialRow and aggregatePartialRow), in which case the server chooses what approach to use based on other factors in the query.

Suppose that you already have a function or library, and you want to re-write it as a Teradata Aster UDAF, and you are trying to figure out whether to write it as a non-decomposable function or as a decomposable function.

In general, if you choose a decomposable algorithm, you get a higher performance. However, sometimes it may be hard to figure out whether the code can be written in a decomposable way.

If your operation is associative (by associative, it refers to the mathematical or logical sense that the result is the same if the operations are done in the same order but with different grouping—for example, (a + b) + c = a + (b + c), then it is likely to be decomposable.