nPath Function ClickStream Data Examples | Teradata Vantage - nPath ClickStream Data Examples - Teradata® Database

Database Analytic Functions

Product
Teradata® Database
Release Number
17.10
Published
July 2021
Language
English (United States)
Last Update
2021-07-28
dita:mapPath
Teradata_Vantage™___Advanced_SQL_Engine_Analytic_Functions.withLogo_upload_July2021/wnd1589838592459.ditamap
dita:ditavalPath
Teradata_Vantage™___Advanced_SQL_Engine_Analytic_Functions.withLogo_upload_July2021/ayr1485454803741.ditaval
dita:id
B035-1206
lifecycle
previous
Product Category
Teradata Vantage™

Input

This statement creates the input table of clickstream data that the examples use:
CREATE MULTISET TABLE clicks1 (
  userid INTEGER,
  sessionid INTEGER,
  pageid INTEGER,
  category INTEGER,
  ts TIMESTAMP FORMAT 'YYYY-MM-DDbHH:MI:SS',
  referrer VARCHAR (256),
  val FLOAT
) PRIMARY INDEX ( userid );
This statement gets the pageid for each row and the pageid for the next row in sequence:
SELECT dt.sessionid, dt.pageid, dt.next_pageid FROM nPath (
  ON clicks1 PARTITION BY sessionid ORDER BY ts
  USING
  Mode (OVERLAPPING)
  Pattern ('A.B')
  Symbols (TRUE AS A, TRUE AS B)
  Result (
    FIRST(sessionid OF A) AS sessionid,
    FIRST (pageid OF A) AS pageid,
    FIRST (pageid OF B) AS next_pageid
  )
) AS dt;

Example: Counting Preceding Rows in a Sequence

For each row, this invocation counts the number of preceding rows in a given sequence (including the current row). The ORDER BY clause specifies DESC because the pattern must be matched over the rows preceding the start row, while the semantics dictate that the pattern be matched over the rows following the start row.
SELECT dt.sessionid, dt.pageid, dt.countrank FROM nPath (
  ON clicks1 PARTITION BY sessionid ORDER BY ts DESC
  USING
  Mode (OVERLAPPING)
  Pattern ('A*')
  Symbols (TRUE AS A)
  Result (
    FIRST (sessionid OF A) AS sessionid,
    FIRST (pageid OF A) AS pageid,
    COUNT (* OF A) AS countrank
  )
) AS dt;

Example: Complex Path Query

This query finds the user click-paths that start at pageid 50 and proceed either to pageid 80 or to pages in category 9 or category 10, finds the pageid of the last page in the path, counts the visits to page 80, and returns the maximum count for each last page, by which it sorts the output. The query ignores paths of fewer than five pages and pages for which category is less than zero.
SELECT dt.last_pageid, MAX(dt.count_page80) FROM nPath (
  ON (SELECT * FROM clicks1 WHERE category >= 0)
    PARTITION BY sessionid ORDER BY ts
  USING
  Pattern ('A.(B|C)*')
  Mode (OVERLAPPING)
  Symbols (
    pageid = 50 AS A,
    pageid = 80 AS B,
    pageid <> 80 AND category IN (9,10) AS C
  )
  Result (
    LAST(pageid OF ANY (A,B,C)) AS last_pageid,
    COUNT (* OF B) AS count_page80,
    COUNT (* OF ANY (A,B,C)) AS count_any
  )
) AS dt WHERE dt.count_any >= 5
GROUP BY dt.last_pageid
ORDER BY MAX(dt.count_page80);