Sessionization is the process of mapping each click in a clickstream to a unique session identifier. We define a session as a sequence of clicks by a particular user where no more than n seconds pass between successive clicks (that is, if we don't see a click from a user for n seconds, we start a new session). Sessionization can be easily done with the Sessionize SQL-MapReduce function. Sample code is included with the Aster SQL-MapReduce Java API.
We can sessionize a table called clickstream, which consists of a userid and timestamp attribute, like this:
SELECT xtimestamp, userid, sessionid FROM Sessionize ( ON clickstream PARTITION BY userid ORDER BY xtimestamp TIMECOLUMN ('xtimestamp') TIMEOUT (60) );
The first parameter to the Sessionize SQL-MapReduce function is the name of the timestamp attribute, while the second is n, the number of seconds between clicks that results in the starting of a new session. The clickstream table is partitioned by userid, and within each partition, rows are sequenced by timestamp. The sessionize SQL-MapReduce function is then invoked against each of these ordered partitions, emitting the input rows with the appropriate sessionid added.