Churn Analysis Using Sessionize and NPath with Teradata Package for Python - Churn Analysis Using Sessionize and NPath with Teradata Package for Python

Churn Analysis Using Sessionize and NPath with Teradata Package for Python - Churn Analysis Using Sessionize and NPath with Teradata Package for Python - Teradata Package for Python

Teradata® Package for Python User Guide

Product

Teradata Package for Python

Release Number

17.00

Published

November 2021

Language

English (United States)

Last Update

2022-01-14

dita:mapPath

bol1585763678431.ditamap

dita:ditavalPath

ayr1485454803741.ditaval

dita:id

B700-4006

lifecycle

Product Category

Teradata Vantage

This example uses the retail dataset of product purchase and returns to perform churn analysis.

Import the required modules.

from teradataml.analytics.sqle.Sessionize import Sessionize
from teradataml.analytics.sqle.NPath import NPath
from teradataml.dataframe.dataframe import DataFrame, in_schema
from teradataml.data.load_example_data import load_example_data

Load the retail_churn data.

load_example_data("sessionize", "retail_churn_table")

Create the DataFrame on the loaded dataset 'retail_churn_table'.

df1 = DataFrame(in_schema('alice', 'retail_churn_table'))  
# in_schema(databaseName, tableName) - Assuming table has been created in 'alice' database.
print("***** Let's take a peek at the data *****")
print(df1)
print("\n***** Retails Churn Table Column Types *****")
print(df1.dtypes)
print("\n***** Row Count and column count for the retail data *****")
print(df1.shape)

Sessionize.

The Sessionize function maps each click in a session to a unique session identifier. A session is defined as a sequence of clicks by one user that are separated by at most n seconds.

Example 1: Call the Sessionize function – NO CHURN CUSTOMER.

The non-churn group (denoted by the churn flag value of 'N'). Filter rows where churn_flag = 'N'.

input_retail_data = df1[df1.churn_flag == 'N']
print("***** No churn data *****")
print(input_retail_data)

# Sessionize the non-churn data
daily_session_nochurn = Sessionize(data = input_retail_data, 
                            time_out = 86400.0, 
                            data_order_column = 'datestamp', 
                            data_partition_column = 'customer_id', 
                            time_column = 'datestamp')

# Data checking
# SELECT * FROM xxxx ORDER BY 2 ASC WHERE customer_id=1531;
# Here xxxx is the table/view created from daily_session_nochurn.result.
res = daily_session_nochurn.result
res = res[res.customer_id == 1531].sort(['datestamp'],ascending = True)
print(res)

Example 2: Call the Sessionize function – CHURN CUSTOMER.

The churn group (denoted by the churn flag value of 'Y'). Filter rows where churn_flag = 'Y'.

input_retail_data2 = df1[df1.churn_flag == 'Y']
print("***** Churn data *****")
print(input_retail_data2)

# Sessionize the churn data
daily_session_churn = Sessionize(data = input_retail_data2, 
                                 time_out = 86400.0, 
                                 data_order_column = 'datestamp',
                                 data_partition_column = 'customer_id', 
                                 time_column = 'datestamp')

# Data checking
# SELECT * FROM  xxxx ORDER BY 2 ASC WHERE customer_id=64497;
# Here xxxx is the table/view created from daily_session_churn.result.
res2 = daily_session_churn.result
res2 = res2[res2.customer_id == 64497].sort(['datestamp'], ascending=True)
res2
print(res2)

Use NPath function on the outcome of Sessionize to perform Pathing.

Pathing is the process of discovering a sequence of antecedent actions that occur prior to a specific event of interest. For example, a buyer could browse a site, call a company, read online reviews, test out the product, and then purchase.

Pathing discovers the most salient patterns across a group of individuals or entities based on which further actions are considered.

Call the NPath function – NO CHURN CUSTOMERS.

npath_nochurn = NPath(data1 = daily_session_nochurn.result,
          mode = "NONOVERLAPPING",
          pattern ='E*.C',
          symbols = ["EVENT = 'Purchase'  AS C", "EVENT <> 'Purchase' AS E"],
          result = ["FIRST(customer_id OF ANY(E,C)) AS customer_id",
                    "FIRST(datestamp OF ANY(E,C)) AS DS_START",
                    "LAST(datestamp OF ANY(E,C)) AS DS_END",
                    "COUNT(* OF E) AS EVENT_CNT",
                    "ACCUMULATE(EVENT OF ANY(E,C)) AS PATH"],
          data1_partition_column = ['customer_id', 'SESSIONID'],
          data1_order_column = 'datestamp')

# Shape of the resultant dataframe
npath_nochurn.result.shape

# Print result
npath_nochurn.result.to_pandas().head(10)

# Path for a particular customer, 1895
res3 = npath_nochurn.result
res3 = res3[res3.customer_id == 1895]
res3.to_pandas()

Call the NPath function – CHURN CUSTOMERS.

npath_churn = NPath(data1 = daily_session_churn.result,
          mode = "NONOVERLAPPING",
          pattern ='E*.C',
          symbols = ["EVENT = 'Product Return'  AS C",
                     "EVENT <> 'Product Return' AS E"],
          result = ["FIRST(customer_id OF ANY(E,C)) AS customer_id",
                    "FIRST(datestamp OF ANY(E,C)) AS DS_START",
                    "LAST(datestamp OF ANY(E,C)) AS DS_END",
                    "COUNT(* OF E) AS EVENT_CNT",
                    "ACCUMULATE(EVENT OF ANY(E,C)) AS PATH"],
           data1_partition_column = ['customer_id', 'SESSIONID'],
           data1_order_column = 'datestamp')

# Shape of the resultant dataframe
npath_churn.result.shape

# Print result
npath_churn.result.to_pandas().head(10)

# Path for a particular customer, 64115
res4 = npath_churn.result
res4 = res4[res4.customer_id == 64115]
res4.to_pandas()