Churn Analysis Using Sessionize and NPath with Teradata Package for Python - Churn Analysis Using Sessionize and NPath with Teradata Package for Python - Teradata Vantage

Teradata® VantageCloud Lake

Deployment
VantageCloud
Edition
Lake
Product
Teradata Vantage
Published
January 2023
ft:locale
en-US
ft:lastEdition
2024-12-11
dita:mapPath
phg1621910019905.ditamap
dita:ditavalPath
pny1626732985837.ditaval
dita:id
phg1621910019905
This example uses the retail dataset of product purchase and returns to perform churn analysis.
  1. Import the required modules.
    from teradataml.analytics.sqle.Sessionize import Sessionize
    from teradataml.analytics.sqle.NPath import NPath
    from teradataml.dataframe.dataframe import DataFrame, in_schema
    from teradataml.data.load_example_data import load_example_data
  2. Load the retail churn data.
    load_example_data("sessionize", "retail_churn_table")
  3. Create the DataFrame on the loaded dataset 'retail_churn_table'.
    df1 = DataFrame(in_schema('alice', 'retail_churn_table'))  
    # in_schema(databaseName, tableName) - Assuming table has been created in 'alice' database.
    print("***** Let's take a peek at the data *****")
    print(df1)
    print("\n***** Retails Churn Table Column Types *****")
    print(df1.dtypes)
    print("\n***** Row Count and column count for the retail data *****")
    print(df1.shape)
  4. Sessionize.
    The Sessionize function maps each click in a session to a unique session identifier. A session is defined as a sequence of clicks by one user that are separated by at most n seconds.
    1. Example 1: Call the Sessionize function – NO CHURN CUSTOMER.
      The non-churn group (denoted by the churn flag value of 'N'). Filter rows where churn_flag = 'N'.
      input_retail_data = df1[df1.churn_flag == 'N']
      print("***** No churn data *****")
      print(input_retail_data)
      
      # Sessionize the non-churn data
      daily_session_nochurn = Sessionize(data = input_retail_data, 
                                  time_out = 86400.0, 
                                  data_order_column = 'datestamp', 
                                  data_partition_column = 'customer_id', 
                                  time_column = 'datestamp')
      # Data checking
      # SELECT * FROM xxxx ORDER BY 2 ASC WHERE customer_id=1531;
      # Here xxxx is the table/view created from daily_session_nochurn.result.
      res = daily_session_nochurn.result
      res = res[res.customer_id == 1531].sort(['datestamp'],ascending = True)
      print(res)
    2. Example 2: Call the Sessionize function – CHURN CUSTOMER.
      The churn group (denoted by the churn flag value of 'Y'). Filter rows where churn_flag = 'Y'.
      input_retail_data2 = df1[df1.churn_flag == 'Y']
      print("***** Churn data *****")
      print(input_retail_data2)
      
      # Sessionize the churn data
      daily_session_churn = Sessionize(data = input_retail_data2, 
                                       time_out = 86400.0, 
                                       data_order_column = 'datestamp',
                                       data_partition_column = 'customer_id', 
                                       time_column = 'datestamp')
      
      # Data checking
      # SELECT * FROM  xxxx ORDER BY 2 ASC WHERE customer_id=64497;
      # Here xxxx is the table/view created from daily_session_churn.result.
      res2 = daily_session_churn.result
      res2 = res2[res2.customer_id == 64497].sort(['datestamp'], ascending=True)
      res2
      print(res2)
  5. Use NPath function on the outcome of Sessionize to perform Pathing.
    Pathing is the process of discovering a sequence of antecedent actions that occur prior to a specific event of interest. For example, a buyer could browse a site, call a company, read online reviews, test out the product, and then purchase.
    Pathing discovers the most salient patterns across a group of individuals or entities based on which further actions are considered.
    1. Call the NPath function – NO CHURN CUSTOMERS.
      npath_nochurn = NPath(data1 = daily_session_nochurn.result,
                mode = "NONOVERLAPPING",
                pattern ='E*.C',
                symbols = ["EVENT = 'Purchase'  AS C", "EVENT <> 'Purchase' AS E"],
                result = ["FIRST(customer_id OF ANY(E,C)) AS customer_id",
                          "FIRST(datestamp OF ANY(E,C)) AS DS_START",
                          "LAST(datestamp OF ANY(E,C)) AS DS_END",
                          "COUNT(* OF E) AS EVENT_CNT",
                          "ACCUMULATE(EVENT OF ANY(E,C)) AS PATH"],
                data1_partition_column = ['customer_id', 'SESSIONID'],
                data1_order_column = 'datestamp')
      
      # Shape of the resultant dataframe
      npath_nochurn.result.shape
      
      # Print result
      npath_nochurn.result.to_pandas().head(10)
      
      # Path for a particular customer, 1895
      res3 = npath_nochurn.result
      res3 = res3[res3.customer_id == 1895]
      res3.to_pandas()
    2. Call the NPath function – CHURN CUSTOMERS.
      npath_churn = NPath(data1 = daily_session_churn.result,
                mode = "NONOVERLAPPING",
                pattern ='E*.C',
                symbols = ["EVENT = 'Product Return'  AS C",
                           "EVENT <> 'Product Return' AS E"],
                result = ["FIRST(customer_id OF ANY(E,C)) AS customer_id",
                          "FIRST(datestamp OF ANY(E,C)) AS DS_START",
                          "LAST(datestamp OF ANY(E,C)) AS DS_END",
                          "COUNT(* OF E) AS EVENT_CNT",
                          "ACCUMULATE(EVENT OF ANY(E,C)) AS PATH"],
                 data1_partition_column = ['customer_id', 'SESSIONID'],
                 data1_order_column = 'datestamp')
      
      # Shape of the resultant dataframe
      npath_churn.result.shape
      
      # Print result
      npath_churn.result.to_pandas().head(10)
      
      # Path for a particular customer, 64115
      res4 = npath_churn.result
      res4 = res4[res4.customer_id == 64115]
      res4.to_pandas()