The following table describes the flow of a window aggregate UDF for a group.
Phase | Value of phase.getPhase() | Description |
---|---|---|
1 | Phase.AGR_INIT | This phase is triggered once per partition at the start of a new aggregation group or first row. The method must:
|
2 | Phase.AGR_DETAIL | This phase is triggered every time the forward row progresses. Vantage calls the method once for each row to be aggregated for each group. The method must combine the UDF argument input data with the intermediate storage defined for the group. |
3 | Phase.AGR_MOVINGTRAIL | This phase is applicable for the moving window type (noncumulative, nonreporting window type). This phase is triggered only by the last few rows of a moving window when the forward pointer to the window reaches the end of the group or end of the file. The phase does not provide any row or value to the method, but it is mainly used to indicate to the method that we are reaching the end of the group or file. The method can use this phase to adjust the necessary internal count or related values to reflect the actual size as the window diminishes towards the end of the group or file. |
4 | Phase.AGR_FINAL | This phase is invoked at the time the final evaluated result needs to be moved into the result row. No more input is expected for the group, and the method produces the final result for the group. |
5 | Phase.AGR_NODATA | This phase is only presented when there is absolutely no data to aggregate. |
The Phase.AGR_COMBINE phase is not applicable for window aggregate UDFs.
- pre_window size: This value is specified as part of the PRECEDING clause. The value of this field is negative if this points to a row that precedes the current row. This field is not applicable for the cumulative and reporting window types. It is initialized to zero in those cases.
- post_window size: This value is specified as part of the FOLLOWING clause. The value of this field is negative if this points to a row that follows the current row. This field is not applicable for the cumulative and reporting window types. It is initialized to zero in those cases.
- window_size: For the cumulative window type, this value is -1. For the reporting window type, the value is -2. For the regular window type, the value is set as post_window size -pre_window size +1 (+1 for the current row).
You must maintain a cache of rows corresponding to the window size. It may be useful to maintain a counter value that indicates the total rows read so far. For example, if C represents the counter, then the window of rows for evaluation would be:
if (C < window_size), window size would be C else window size would be as indicated in function context.
The base of the window would be C - window_size and the end of the window would correspond to C. The complexities of the various window combinations (PRECEDING/FOLLOWING/CURRENT, etc) is handled by Vantage. You only need to maintain the window cache of rows and implement the semantics of the function on this cache of rows.
To see sample Java code that implements a window aggregate function, see Java Window Aggregate Function.