Churn Analysis Using Sessionize and NPath with Teradata Python Package - Teradata Python Package

Teradata® Python Package User Guide

Product
Teradata Python Package
Release Number
16.20
Published
February 2020
Language
English (United States)
Last Update
2020-02-29
dita:mapPath
rkb1531260709148.ditamap
dita:ditavalPath
Generic_no_ie_no_tempfilter.ditaval
dita:id
B700-4006
lifecycle
previous
Product Category
Teradata Vantage
This example uses the retail dataset of product purchase and returns to perform churn analysis.
  1. Import the required modules.
    from teradataml.analytics.sqle.Sessionize import Sessionize
    from teradataml.analytics.sqle.NPath import NPath
    from teradataml.dataframe.dataframe import DataFrame, in_schema
    from teradataml.data.load_example_data import load_example_data
  2. Load the retail_churn data.
    load_example_data("sessionize", "retail_churn_table")
  3. Create the DataFrame on the loaded dataset 'retail_churn_table'.
    df1 = DataFrame(in_schema('alice', 'retail_churn_table'))  
    # in_schema(databaseName, tableName) - Assuming table has been created in 'alice' database.
    print("***** Let's take a peek at the data *****")
    print(df1)
    print("\n***** Retails Churn Table Column Types *****")
    print(df1.dtypes)
    print("\n***** Row Count and column count for the retail data *****")
    print(df1.shape)
  4. Sessionize.
    The Sessionize function maps each click in a session to a unique session identifier. A session is defined as a sequence of clicks by one user that are separated by at most n seconds.
    1. Example 1: Call the Sessionize function – NO CHURN CUSTOMER.
      The non-churn group (denoted by the churn flag value of 'N'). Filter rows where churn_flag = 'N'.
      input_retail_data = df1[df1.churn_flag == 'N']
      print("***** No churn data *****")
      print(input_retail_data)
      
      # Sessionize the non-churn data
      daily_session_nochurn = Sessionize(data = input_retail_data, 
                                  time_out = 86400.0, 
                                  data_order_column = 'datestamp', 
                                  data_partition_column = 'customer_id', 
                                  time_column = 'datestamp')
      # Data checking
      # SELECT * FROM xxxx ORDER BY 2 ASC WHERE customer_id=1531;
      # Here xxxx is the table/view created from daily_session_nochurn.result.
      res = daily_session_nochurn.result
      res = res[res.customer_id == 1531].sort(['datestamp'],ascending = True)
      print(res)
    2. Example 2: Call the Sessionize function – CHURN CUSTOMER.
      The churn group (denoted by the churn flag value of 'Y'). Filter rows where churn_flag = 'Y'.
      input_retail_data2 = df1[df1.churn_flag == 'Y']
      print("***** Churn data *****")
      print(input_retail_data2)
      
      # Sessionize the churn data
      daily_session_churn = Sessionize(data = input_retail_data2, 
                                       time_out = 86400.0, 
                                       data_order_column = 'datestamp',
                                       data_partition_column = 'customer_id', 
                                       time_column = 'datestamp')
      
      # Data checking
      # SELECT * FROM  xxxx ORDER BY 2 ASC WHERE customer_id=64497;
      # Here xxxx is the table/view created from daily_session_churn.result.
      res2 = daily_session_churn.result
      res2 = res2[res2.customer_id == 64497].sort(['datestamp'], ascending=True)
      res2
      print(res)
  5. Use NPath function on the outcome of Sessionize to perform Pathing.
    Pathing is the process of discovering a sequence of antecedent actions that occur prior to a specific event of interest. For example, a buyer could browse a site, call a company, read online reviews, test out the product, and then purchase.
    Pathing discovers the most salient patterns across a group of individuals or entities based on which further actions are considered.
    1. Call the NPath function – NO CHURN CUSTOMERS.
      npath_nochurn = NPath(data1 = daily_session_nochurn.result,
                mode = "NONOVERLAPPING",
                pattern ='E*.C',
                symbols = ["EVENT = 'Purchase'  AS C", "EVENT <> 'Purchase' AS E"],
                result = ["FIRST(customer_id OF ANY(E,C)) AS customer_id",
                          "FIRST(datestamp OF ANY(E,C)) AS DS_START",
                          "LAST(datestamp OF ANY(E,C)) AS DS_END",
                          "COUNT(* OF E) AS EVENT_CNT",
                          "ACCUMULATE(EVENT OF ANY(E,C)) AS PATH"],
                data1_partition_column = ['customer_id', 'SESSIONID'],
                data1_order_column = 'datestamp')
      
      # Shape of the resultant dataframe
      npath_nochurn.result.shape
      
      # Print result
      npath_nochurn.result.to_pandas().head(10)
      
      # Path for a particular customer, 1895
      res3 = npath_nochurn.result
      res3 = res3[res3.customer_id == 1895]
      res3.to_pandas()
    2. Call the NPath function – CHURN CUSTOMERS.
      npath_churn = NPath(data1 = daily_session_churn.result,
                mode = "NONOVERLAPPING",
                pattern ='E*.C',
                symbols = ["EVENT = 'Product Return'  AS C",
                           "EVENT <> 'Product Return' AS E"],
                result = ["FIRST(customer_id OF ANY(E,C)) AS customer_id",
                          "FIRST(datestamp OF ANY(E,C)) AS DS_START",
                          "LAST(datestamp OF ANY(E,C)) AS DS_END",
                          "COUNT(* OF E) AS EVENT_CNT",
                          "ACCUMULATE(EVENT OF ANY(E,C)) AS PATH"],
                 data1_partition_column = ['customer_id', 'SESSIONID'],
                 data1_order_column = 'datestamp')
      
      # Shape of the resultant dataframe
      npath_churn.result.shape
      
      # Print result
      npath_churn.result.to_pandas().head(10)
      
      # Path for a particular customer, 64115
      res4 = npath_churn.result
      res4 = res4[res4.customer_id == 64115]
      res4.to_pandas()