Churn Analysis Using Sessionize and NPath with Teradata Package for Python - Churn Analysis Using Sessionize and NPath with Teradata Package for Python - Teradata Package for Python

Teradata® Package for Python User Guide

Product
Teradata Package for Python
Release Number
17.00
Published
November 2021
Language
English (United States)
Last Update
2022-01-14
dita:mapPath
bol1585763678431.ditamap
dita:ditavalPath
ayr1485454803741.ditaval
dita:id
B700-4006
lifecycle
previous
Product Category
Teradata Vantage
This example uses the retail dataset of product purchase and returns to perform churn analysis.
  1. Import the required modules.
    from teradataml.analytics.sqle.Sessionize import Sessionize
    from teradataml.analytics.sqle.NPath import NPath
    from teradataml.dataframe.dataframe import DataFrame, in_schema
    from teradataml.data.load_example_data import load_example_data
  2. Load the retail_churn data.
    load_example_data("sessionize", "retail_churn_table")
  3. Create the DataFrame on the loaded dataset 'retail_churn_table'.
    df1 = DataFrame(in_schema('alice', 'retail_churn_table'))  
    # in_schema(databaseName, tableName) - Assuming table has been created in 'alice' database.
    print("***** Let's take a peek at the data *****")
    print(df1)
    print("\n***** Retails Churn Table Column Types *****")
    print(df1.dtypes)
    print("\n***** Row Count and column count for the retail data *****")
    print(df1.shape)
  4. Sessionize.
    The Sessionize function maps each click in a session to a unique session identifier. A session is defined as a sequence of clicks by one user that are separated by at most n seconds.
    1. Example 1: Call the Sessionize function – NO CHURN CUSTOMER.
      The non-churn group (denoted by the churn flag value of 'N'). Filter rows where churn_flag = 'N'.
      input_retail_data = df1[df1.churn_flag == 'N']
      print("***** No churn data *****")
      print(input_retail_data)
      
      # Sessionize the non-churn data
      daily_session_nochurn = Sessionize(data = input_retail_data, 
                                  time_out = 86400.0, 
                                  data_order_column = 'datestamp', 
                                  data_partition_column = 'customer_id', 
                                  time_column = 'datestamp')
      # Data checking
      # SELECT * FROM xxxx ORDER BY 2 ASC WHERE customer_id=1531;
      # Here xxxx is the table/view created from daily_session_nochurn.result.
      res = daily_session_nochurn.result
      res = res[res.customer_id == 1531].sort(['datestamp'],ascending = True)
      print(res)
    2. Example 2: Call the Sessionize function – CHURN CUSTOMER.
      The churn group (denoted by the churn flag value of 'Y'). Filter rows where churn_flag = 'Y'.
      input_retail_data2 = df1[df1.churn_flag == 'Y']
      print("***** Churn data *****")
      print(input_retail_data2)
      
      # Sessionize the churn data
      daily_session_churn = Sessionize(data = input_retail_data2, 
                                       time_out = 86400.0, 
                                       data_order_column = 'datestamp',
                                       data_partition_column = 'customer_id', 
                                       time_column = 'datestamp')
      
      # Data checking
      # SELECT * FROM  xxxx ORDER BY 2 ASC WHERE customer_id=64497;
      # Here xxxx is the table/view created from daily_session_churn.result.
      res2 = daily_session_churn.result
      res2 = res2[res2.customer_id == 64497].sort(['datestamp'], ascending=True)
      res2
      print(res2)
  5. Use NPath function on the outcome of Sessionize to perform Pathing.
    Pathing is the process of discovering a sequence of antecedent actions that occur prior to a specific event of interest. For example, a buyer could browse a site, call a company, read online reviews, test out the product, and then purchase.
    Pathing discovers the most salient patterns across a group of individuals or entities based on which further actions are considered.
    1. Call the NPath function – NO CHURN CUSTOMERS.
      npath_nochurn = NPath(data1 = daily_session_nochurn.result,
                mode = "NONOVERLAPPING",
                pattern ='E*.C',
                symbols = ["EVENT = 'Purchase'  AS C", "EVENT <> 'Purchase' AS E"],
                result = ["FIRST(customer_id OF ANY(E,C)) AS customer_id",
                          "FIRST(datestamp OF ANY(E,C)) AS DS_START",
                          "LAST(datestamp OF ANY(E,C)) AS DS_END",
                          "COUNT(* OF E) AS EVENT_CNT",
                          "ACCUMULATE(EVENT OF ANY(E,C)) AS PATH"],
                data1_partition_column = ['customer_id', 'SESSIONID'],
                data1_order_column = 'datestamp')
      
      # Shape of the resultant dataframe
      npath_nochurn.result.shape
      
      # Print result
      npath_nochurn.result.to_pandas().head(10)
      
      # Path for a particular customer, 1895
      res3 = npath_nochurn.result
      res3 = res3[res3.customer_id == 1895]
      res3.to_pandas()
    2. Call the NPath function – CHURN CUSTOMERS.
      npath_churn = NPath(data1 = daily_session_churn.result,
                mode = "NONOVERLAPPING",
                pattern ='E*.C',
                symbols = ["EVENT = 'Product Return'  AS C",
                           "EVENT <> 'Product Return' AS E"],
                result = ["FIRST(customer_id OF ANY(E,C)) AS customer_id",
                          "FIRST(datestamp OF ANY(E,C)) AS DS_START",
                          "LAST(datestamp OF ANY(E,C)) AS DS_END",
                          "COUNT(* OF E) AS EVENT_CNT",
                          "ACCUMULATE(EVENT OF ANY(E,C)) AS PATH"],
                 data1_partition_column = ['customer_id', 'SESSIONID'],
                 data1_order_column = 'datestamp')
      
      # Shape of the resultant dataframe
      npath_churn.result.shape
      
      # Print result
      npath_churn.result.to_pandas().head(10)
      
      # Path for a particular customer, 64115
      res4 = npath_churn.result
      res4 = res4[res4.customer_id == 64115]
      res4.to_pandas()