SQL-MapReduce Example 2: Sessionization - Aster Execution Engine

Teradata Aster® Developer Guide

Product
Aster Execution Engine
Release Number
7.00.02
Published
July 2017
Language
English (United States)
Last Update
2018-04-13
dita:mapPath
xnl1494366523182.ditamap
dita:ditavalPath
Generic_no_ie_no_tempfilter.ditaval
dita:id
ffu1489104705746
lifecycle
previous
Product Category
Software

Sessionization is the process of mapping each click in a clickstream to a unique session identifier. We define a session as a sequence of clicks by a particular user where no more than n seconds pass between successive clicks (that is, if we don't see a click from a user for n seconds, we start a new session). Sessionization can be easily done with the Sessionize SQL-MapReduce function. Sample code is included with the Aster SQL-MapReduce Java API.

We can sessionize a table called clickstream, which consists of a userid and timestamp attribute, like this:

SELECT xtimestamp, userid, sessionid
FROM   Sessionize   (
       ON           clickstream
       PARTITION BY userid
       ORDER BY     xtimestamp
       TIMECOLUMN   ('xtimestamp')
       TIMEOUT      (60)
    );

The first parameter to the Sessionize SQL-MapReduce function is the name of the timestamp attribute, while the second is n, the number of seconds between clicks that results in the starting of a new session. The clickstream table is partitioned by userid, and within each partition, rows are sequenced by timestamp. The sessionize SQL-MapReduce function is then invoked against each of these ordered partitions, emitting the input rows with the appropriate sessionid added.