Using clickstream data from an online store, this example finds the sessions where the user visited the checkout page within 10 minutes of visiting the home page. Because there is no way to know in advance how many rows might appear between the home page and the checkout page, the example cannot use a LAG or LEAD expression. Therefore, it uses the Filter syntax element.
Input
clickstreamuserid |
sessionid |
clicktime |
pagetype |
1 |
1 |
10-10-2012 10:15 |
home |
1 |
1 |
10-10-2012 10:16 |
view |
1 |
1 |
10-10-2012 10:17 |
view |
1 |
1 |
10-10-2012 10:20 |
checkout |
1 |
1 |
10-10-2012 10:30 |
checkout |
1 |
1 |
10-10-2012 10:35 |
view |
1 |
1 |
10-10-2012 10:45 |
view |
2 |
2 |
10-10-2012 13:15 |
home |
2 |
2 |
10-10-2012 13:16 |
view |
2 |
2 |
10-10-2012 13:43 |
checkout |
2 |
2 |
10-10-2012 13:35 |
view |
2 |
2 |
10-10-2012 13:45 |
view |
SQL Call
SELECT * FROM Npath (
ON clickstream PARTITION BY userid ORDER BY clicktime
USING
Symbols (
pagetype='home' AS home,
pagetype <> 'home' AND pagetype <> 'checkout' AS clickview,
pagetype='checkout' AS checkout
)
Pattern ('home.clickview*.checkout')
Result (
FIRST(userid of ANY(home, checkout, clickview)) AS userid,
FIRST (sessionid of ANY(home, checkout, clickview)) AS sessioinid,
COUNT (* of any(home, checkout, clickview)) AS cnt,
FIRST (clicktime of ANY(home)) AS firsthome,
LAST (clicktime of ANY(checkout)) AS lastcheckout
)
Filter (
FIRST (clicktime + interval '10' minute OF ANY (home)) >
FIRST (clicktime of any(checkout))
)
Mode (NONOVERLAPPING)
);
Output
userid |
sessionid |
cnt |
firsthome |
lastcheckout |
1 |
1 |
4 |
2012-10-10 10:15:00 |
2012-10-10 10:20:00 |