Enabling Input/Output Column Projection - Aster Execution Engine

Teradata Aster® Developer Guide

Product
Aster Execution Engine
Release Number
7.00.02
Published
July 2017
Language
English (United States)
Last Update
2018-04-13
dita:mapPath
xnl1494366523182.ditamap
dita:ditavalPath
Generic_no_ie_no_tempfilter.ditaval
dita:id
ffu1489104705746
lifecycle
previous
Product Category
Software

With SQL-MapReduce Collaborative Planning, you can add logic to your SQL-MapReduce function that enables column projection:

  • Output column projection—The Planner provides the list of output columns needed by the query. The function can use this information to produce only the columns that the query needs, which reduces the amount of data processing performed by the function and allows it to avoid too-wide data streams.
  • Input column projection—The function can determine which input columns need to be processed, depending on the output column projection. When the function provides this information to the Planner, the Planner ensures that only the needed columns are input to the function. This also reduces the amount of data passed to the function, especially if the size of the needed columns is significantly smaller than the size of columns being projected out.

For example, consider the following query, which retrieves the sessionized data for all users. In this query, the clickstream table is defined as containing the columns userid, ts, productname, pagetype, referrer, productprice, and the corresponding column names in the output produced by the function are custid, tstamp, prodname, pgtype, ref, price.

SELECT custid, tstamp, sessionid
FROM Sessionize(ON clickstream
  PARTITION BY userid
  ORDER BY ts
  TIMECOLUMN ('ts')
  TIMEOUT (60));

This query is only interested in the columns: custid, tstamp, and sessionid. However, the input to the Sessionize function includes all of the input columns, which the Sessionize will have to pass them through, even though only three columns are needed.

To prevent unnecessary processing of data, the Planner tells the function that it is only interested in three columns.The function responds by telling the Planner whether it will provide the only required columns. This handshake is necessary so that both parties agree on the schema.

In addition, based on the output column projection proposed by the Planner, the function can tell it that it only needs two input columns (userid, and ts) and the rest of the input columns are not needed. The column userid maps to custid and ts to tstamp in the output. The column sessionid is generated by the Sessionize function. In response, the Planner changes the input plan such that the function receives only the needed columns.

Without SQL-MapReduce Collaborative Planning:

  1. Execute the following query and store the output data in the temporary table tmp1:
    SESSIONIZE (SELECT * FROM clickstream ORDER BY userid, ts)

    All of the columns in the input table are passed to the function (userid, ts, productname, pagetype, referrer, productprice, ...). The function passes the columns through.

  2. Project the following output columns from the list of columns generated by the function (custid, tstamp, prodname, pgtype, ref, price, ...):
  3. Transfer the projected data to the Queen.
  4. Return the output data to the application from the Queen.

However, with SQL-MapReduce Collaborative Planning, the function receives only two columns and the projection step (see step 2 above) is eliminated, resulting in a significant optimization, as shown in the following figure.

Column projection example