An e-commerce store wants to count the advertising impressions that lead to a user clicking an online advertisement. The example counts the online advertisements that the user viewed and the television advertisements that the user might have viewed.
Input
impressionsuserid |
ts |
imp |
1 |
2012-01-01 |
ad1 |
1 |
2012-01-02 |
ad1 |
1 |
2012-01-03 |
ad1 |
1 |
2012-01-04 |
ad1 |
1 |
2012-01-05 |
ad1 |
1 |
2012-01-06 |
ad1 |
1 |
2012-01-07 |
ad1 |
2 |
2012-01-08 |
ad2 |
2 |
2012-01-09 |
ad2 |
2 |
2012-01-10 |
ad2 |
2 |
2012-01-11 |
ad2 |
... |
... |
... |
clicks2userid |
ts |
click |
1 |
2012-01-01 |
ad1 |
2 |
2012-01-08 |
ad2 |
3 |
2012-01-16 |
ad3 |
4 |
2012-01-23 |
ad4 |
5 |
2012-02-01 |
ad5 |
6 |
2012-02-08 |
ad6 |
7 |
2012-02-14 |
ad7 |
8 |
2012-02-24 |
ad8 |
9 |
2012-03-02 |
ad9 |
10 |
2012-03-10 |
ad10 |
11 |
2012-03-18 |
ad11 |
12 |
2012-03-25 |
ad12 |
13 |
2012-03-30 |
ad13 |
14 |
2012-04-02 |
ad14 |
15 |
2012-04-06 |
ad15 |
tv_spotsts |
tv_imp |
2012-01-01 |
ad2 |
2012-01-02 |
ad2 |
2012-01-03 |
ad3 |
2012-01-04 |
ad4 |
2012-01-05 |
ad5 |
2012-01-06 |
ad6 |
2012-01-07 |
ad7 |
2012-01-08 |
ad8 |
2012-01-09 |
ad9 |
2012-01-10 |
ad10 |
2012-01-11 |
ad11 |
2012-01-12 |
ad12 |
2012-01-13 |
ad13 |
2012-01-14 |
ad14 |
2012-01-15 |
ad15 |
SQL-MapReduce Call
The tables impressions and clicks have a user_id column, but the table tv_spots is only a record of television advertisements shown, which any user might have seen. Therefore, tv_spots must be a dimension table.
SELECT * FROM nPath (
ON impressions PARTITION BY userid ORDER BY ts
ON clicks2 PARTITION BY userid ORDER BY ts
ON tv_spots DIMENSION ORDER BY ts
USING
Mode (NONOVERLAPPING)
Symbols (TRUE AS imp, TRUE AS click, TRUE AS tv_imp)
Pattern ('(imp|tv_imp)*.click')
Result (
COUNT(* of imp) AS imp_cnt,
COUNT (* of tv_imp) AS tv_imp_cnt
)
) AS dt ORDER BY dt.imp_cnt;
Output
dt.imp_cnt |
tv_imp_cnt |
18 |
0 |
19 |
0 |
19 |
0 |
20 |
0 |
21 |
0 |
22 |
0 |
22 |
0 |
22 |
0 |
22 |
0 |
22 |
0 |
23 |
0 |
23 |
0 |
23 |
0 |
24 |
0 |
25 |
0 |