Using clickstream data from an online store, this example finds the sessions where the user visited the checkout page within 10 minutes of visiting the home page. Because there is no way to know in advance how many rows might appear between the home page and the checkout page, the example cannot use a LAG or LEAD expression. Therefore, it uses the Filter syntax element.
Input
userid | sessionid | clicktime | pagetype |
---|---|---|---|
1 | 1 | 10-10-2012 10:15 | home |
1 | 1 | 10-10-2012 10:16 | view |
1 | 1 | 10-10-2012 10:17 | view |
1 | 1 | 10-10-2012 10:20 | checkout |
1 | 1 | 10-10-2012 10:30 | checkout |
1 | 1 | 10-10-2012 10:35 | view |
1 | 1 | 10-10-2012 10:45 | view |
2 | 2 | 10-10-2012 13:15 | home |
2 | 2 | 10-10-2012 13:16 | view |
2 | 2 | 10-10-2012 13:43 | checkout |
2 | 2 | 10-10-2012 13:35 | view |
2 | 2 | 10-10-2012 13:45 | view |
SQL Call
SELECT * FROM nPath@coprocessor ( ON clickstream PARTITION BY userid ORDER BY clicktime USING Symbols (pagetype='home' AS home, pagetype!='home' AND pagetype!='checkout' AS view, pagetype='checkout' AS checkout) Pattern ('home.view*.checkout') Result (FIRST(userid of ANY(home, checkout, view)) AS userid, FIRST (sessionid of ANY(home, checkout, view)) AS sessionid, COUNT (* of any(home, checkout, view)) AS cnt, FIRST (clicktime of ANY(home)) AS firsthome, LAST (clicktime of ANY(checkout)) AS lastcheckout) Filter (FIRST (clicktime + '10 minutes' ::interval OF ANY (home)) > FIRST (clicktime of any(checkout))) Mode (NONOVERLAPPING) ) AS dt;
Output
userid | sessionid | cnt | firsthome | lastcheckout |
---|---|---|---|---|
1 | 1 | 4 | 2012-10-10 10:15:00 | 2012-10-10 10:20:00 |
Download a zip file of all examples and a SQL script file that creates their input tables.