This example uses the retail dataset of product purchase and returns to perform churn analysis.
- Import the required modules.
from teradataml.analytics.sqle.Sessionize import Sessionize from teradataml.analytics.sqle.NPath import NPath from teradataml.dataframe.dataframe import DataFrame, in_schema from teradataml.data.load_example_data import load_example_data
- Load the retail churn data.
load_example_data("sessionize", "retail_churn_table")
- Create the DataFrame on the loaded dataset 'retail_churn_table'.
df1 = DataFrame(in_schema('alice', 'retail_churn_table')) # in_schema(databaseName, tableName) - Assuming table has been created in 'alice' database. print("***** Let's take a peek at the data *****") print(df1) print("\n***** Retails Churn Table Column Types *****") print(df1.dtypes) print("\n***** Row Count and column count for the retail data *****") print(df1.shape)
- Sessionize.The Sessionize function maps each click in a session to a unique session identifier. A session is defined as a sequence of clicks by one user that are separated by at most n seconds.
- Example 1: Call the Sessionize function – NO CHURN CUSTOMER.The non-churn group (denoted by the churn flag value of 'N'). Filter rows where churn_flag = 'N'.
input_retail_data = df1[df1.churn_flag == 'N'] print("***** No churn data *****") print(input_retail_data)
# Sessionize the non-churn data daily_session_nochurn = Sessionize(data = input_retail_data, time_out = 86400.0, data_order_column = 'datestamp', data_partition_column = 'customer_id', time_column = 'datestamp')
# Data checking # SELECT * FROM xxxx ORDER BY 2 ASC WHERE customer_id=1531; # Here xxxx is the table/view created from daily_session_nochurn.result. res = daily_session_nochurn.result res = res[res.customer_id == 1531].sort(['datestamp'],ascending = True) print(res)
- Example 2: Call the Sessionize function – CHURN CUSTOMER.The churn group (denoted by the churn flag value of 'Y'). Filter rows where churn_flag = 'Y'.
input_retail_data2 = df1[df1.churn_flag == 'Y'] print("***** Churn data *****") print(input_retail_data2)
# Sessionize the churn data daily_session_churn = Sessionize(data = input_retail_data2, time_out = 86400.0, data_order_column = 'datestamp', data_partition_column = 'customer_id', time_column = 'datestamp')
# Data checking # SELECT * FROM xxxx ORDER BY 2 ASC WHERE customer_id=64497; # Here xxxx is the table/view created from daily_session_churn.result. res2 = daily_session_churn.result res2 = res2[res2.customer_id == 64497].sort(['datestamp'], ascending=True) res2 print(res2)
- Example 1: Call the Sessionize function – NO CHURN CUSTOMER.
- Use NPath function on the outcome of Sessionize to perform Pathing.Pathing is the process of discovering a sequence of antecedent actions that occur prior to a specific event of interest. For example, a buyer could browse a site, call a company, read online reviews, test out the product, and then purchase.Pathing discovers the most salient patterns across a group of individuals or entities based on which further actions are considered.
- Call the NPath function – NO CHURN CUSTOMERS.
npath_nochurn = NPath(data1 = daily_session_nochurn.result, mode = "NONOVERLAPPING", pattern ='E*.C', symbols = ["EVENT = 'Purchase' AS C", "EVENT <> 'Purchase' AS E"], result = ["FIRST(customer_id OF ANY(E,C)) AS customer_id", "FIRST(datestamp OF ANY(E,C)) AS DS_START", "LAST(datestamp OF ANY(E,C)) AS DS_END", "COUNT(* OF E) AS EVENT_CNT", "ACCUMULATE(EVENT OF ANY(E,C)) AS PATH"], data1_partition_column = ['customer_id', 'SESSIONID'], data1_order_column = 'datestamp')
# Shape of the resultant dataframe npath_nochurn.result.shape
# Print result npath_nochurn.result.to_pandas().head(10)
# Path for a particular customer, 1895 res3 = npath_nochurn.result res3 = res3[res3.customer_id == 1895] res3.to_pandas()
- Call the NPath function – CHURN CUSTOMERS.
npath_churn = NPath(data1 = daily_session_churn.result, mode = "NONOVERLAPPING", pattern ='E*.C', symbols = ["EVENT = 'Product Return' AS C", "EVENT <> 'Product Return' AS E"], result = ["FIRST(customer_id OF ANY(E,C)) AS customer_id", "FIRST(datestamp OF ANY(E,C)) AS DS_START", "LAST(datestamp OF ANY(E,C)) AS DS_END", "COUNT(* OF E) AS EVENT_CNT", "ACCUMULATE(EVENT OF ANY(E,C)) AS PATH"], data1_partition_column = ['customer_id', 'SESSIONID'], data1_order_column = 'datestamp')
# Shape of the resultant dataframe npath_churn.result.shape
# Print result npath_churn.result.to_pandas().head(10)
# Path for a particular customer, 64115 res4 = npath_churn.result res4 = res4[res4.customer_id == 64115] res4.to_pandas()
- Call the NPath function – NO CHURN CUSTOMERS.