Teradata Package for Python Function Reference - Attribution

teradataml.analytics.mle.Attribution = class Attribution(builtins.object)

Methods defined here:

__init__(self, data=None, data_optional=None, conversion_events=None, excluding_data=None, optional_data=None, model1_type=None, model2_type=None, model1_name=None, model2_name=None, event_column=None, timestamp_column=None, window_size=None, conversion_data=None, optional_events=None, exclude_events=None, data_sequence_column=None, data_optional_sequence_column=None, conversion_data_sequence_column=None, excluding_data_sequence_column=None, optional_data_sequence_column=None, model1_type_sequence_column=None, model2_type_sequence_column=None, data_partition_column=None, data_optional_partition_column=None, data_order_column=None, data_optional_order_column=None, conversion_data_order_column=None, excluding_data_order_column=None, optional_data_order_column=None, model1_type_order_column=None, model2_type_order_column=None)
    DESCRIPTION:
        The Attribution function is used in web page analysis, where it lets
        companies assign weights to pages before certain events, such as
        buying a product. The function calculates attributions with a choice
        of distribution models and has two versions:
            • Multiple-input: Accepts one or more input tables and gets many
              parameters from other dimension tables.
            • Single-input: Accepts only one input table and gets all
              parameters from arguments.

        Note: This function is available only when teradataml is connected
        to Vantage 1.1 or later versions.

    PARAMETERS:
        data:
            Required Argument.
            Specifies the teradataml DataFrame that contains the clickstream
            data, which the function uses to compute attributions.

        data_partition_column:
            Required Argument.
            Specifies Partition By columns for data.
            Values to this argument can be provided as a list, if multiple
            columns are used for partition.
            Types: str OR list of Strings (str)

        data_order_column:
            Required Argument.
            Specifies Order By columns for data.
            Values to this argument can be provided as a list, if multiple
            columns are used for ordering.
            Types: str OR list of Strings (str)

        data_optional:
            Optional Argument.
            Specifies the teradataml DataFrame that contains additional
            clickstream data. The function cogroups the attributes of all the
            specified teradataml DataFrames.

        data_optional_partition_column:
            Optional Argument.
            Required when 'data_optional' is used.
            Specifies Partition By columns for data_optional.
            Values to this argument can be provided as a list, if multiple
            columns are used for partition.
            Types: str OR list of Strings (str)

        data_optional_order_column:
            Optional Argument.
            Required when 'data_optional' is used.
            Specifies Order By columns for data_optional.
            Values to this argument can be provided as a list, if multiple
            columns are used for ordering.
            Types: str OR list of Strings (str)

        conversion_events:
            Optional Argument.
            Required if "conversion_data" is not provided.
            Specifies the conversion event values. Each conversion event is a
            string or integer.
            Types: str OR list of Strings (str)

        excluding_data:
            Optional Argument.
            Specifies the teradataml DataFrame that contains one varchar
            column (excluding_events) containing excluding cause event values.

        excluding_data_order_column:
            Optional Argument.
            Specifies Order By columns for excluding_data.
            Values to this argument can be provided as a list, if multiple
            columns are used for ordering.
            Types: str OR list of Strings (str)

        optional_data:
            Optional Argument.
            Specifies the teradataml DataFrame that contains one varchar
            column (optional_events) containing optional cause event values.

        optional_data_order_column:
            Optional Argument.
            Specifies Order By columns for optional_data.
            Values to this argument can be provided as a list, if multiple
            columns are used for ordering.
            Types: str OR list of Strings (str)

        model1_type:
            Optional Argument.
            Required if "model1_name" is not provided.
            Specifies the teradataml DataFrame that defines the type and
            specification of the first model.
            For example, model1 data:
                ("EVENT_REGULAR", "email:0.19:LAST_CLICK:NA",
                 "impression:0.81:WEIGHTED:0.4,0.3,0.2,0.1")

        model1_type_order_column:
            Optional Argument.
            Specifies Order By columns for model1_type.
            Values to this argument can be provided as a list, if multiple
            columns are used for ordering.
            Types: str OR list of Strings (str)

        model2_type:
            Optional Argument.
            Specifies the teradataml DataFrame that defines the type and
            distributions of the second model.
            For example, model2 data:
                ("EVENT_OPTIONAL", "OrganicSearch:0.5:UNIFORM:NA",
                 "Direct:0.3:UNIFORM:NA", "Referral:0.2:UNIFORM:NA")

        model2_type_order_column:
            Optional Argument.
            Specifies Order By columns for model2_type.
            Values to this argument can be provided as a list, if multiple
            columns are used for ordering.
            Types: str OR list of Strings (str)

        model1_name:
            Optional Argument.
            Required if "model1_type" is not provided.
            Specifies the type and specification of the first model.
            For example, Model1:
                ('EVENT_REGULAR', 'email:0.19:LAST_CLICK:NA',
                 'impression:0.81:WEIGHTED:0.4,0.3,0.2,0.1')
            Types: str OR list of Strings (str)

        model2_name:
            Optional Argument.
            Specifies the type and distributions of the second model.
            For example, Model2:
                ('EVENT_OPTIONAL', 'OrganicSearch:0.5:UNIFORM:NA',
                 'Direct:0.3:UNIFORM:NA', 'Referral:0.2:UNIFORM:NA')
            Types: str OR list of Strings (str)

        event_column:
            Required Argument.
            Specifies the name of the input teradataml DataFrame column that
            contains the clickstream events.
            Types: str

        timestamp_column:
            Required Argument.
            Specifies the name of the input teradataml DataFrame column that
            contains the timestamps of the clickstream events.
            Types: str

        window_size:
            Required Argument.
            Specifies how to determine the maximum window size for the
            attribution calculation:
                • rows:K: Limits the number of events to be attributed, not
                  counting events of the types to be excluded
                  ("excluding_data" or "exclude_events"); that is,
                  attributions are assigned to at most K effective events
                  before the current impact event.
                • seconds:K: Limits the maximum time difference between the
                  current impact event and the earliest effective event to
                  be attributed.
                • rows:K&seconds:K2: Applies both constraints and complies
                  with the stricter one.
            Types: str

        conversion_data:
            Optional Argument.
            Required if "conversion_events" is not provided.
            Specifies the teradataml DataFrame that contains one varchar
            column (conversion_events) containing conversion event values.

        conversion_data_order_column:
            Optional Argument.
            Specifies Order By columns for conversion_data.
            Values to this argument can be provided as a list, if multiple
            columns are used for ordering.
            Types: str OR list of Strings (str)

        optional_events:
            Optional Argument.
            Specifies the optional events. Each optional event is a string or
            integer. An optional event cannot be a conversion event or an
            exclude event. The function attributes a conversion event to an
            optional event only if it cannot attribute it to a regular event.
            Types: str OR list of Strings (str)

        exclude_events:
            Optional Argument.
            Specifies the events to exclude from the attribution calculation.
            Each exclude event is a string or integer. An exclude event
            cannot be a conversion event.
            Types: str OR list of Strings (str)

        data_sequence_column:
            Optional Argument.
            Specifies the list of column(s) that uniquely identifies each row
            of the input argument "data". The argument is used to ensure
            deterministic results for functions which produce results that
            vary from run to run.
            Types: str OR list of Strings (str)

        data_optional_sequence_column:
            Optional Argument.
            Specifies the list of column(s) that uniquely identifies each row
            of the input argument "data_optional". The argument is used to
            ensure deterministic results for functions which produce results
            that vary from run to run.
            Types: str OR list of Strings (str)

        conversion_data_sequence_column:
            Optional Argument.
            Specifies the list of column(s) that uniquely identifies each row
            of the input argument "conversion_data". The argument is used to
            ensure deterministic results for functions which produce results
            that vary from run to run.
            Types: str OR list of Strings (str)

        excluding_data_sequence_column:
            Optional Argument.
            Specifies the list of column(s) that uniquely identifies each row
            of the input argument "excluding_data". The argument is used to
            ensure deterministic results for functions which produce results
            that vary from run to run.
            Types: str OR list of Strings (str)

        optional_data_sequence_column:
            Optional Argument.
            Specifies the list of column(s) that uniquely identifies each row
            of the input argument "optional_data". The argument is used to
            ensure deterministic results for functions which produce results
            that vary from run to run.
            Types: str OR list of Strings (str)

        model1_type_sequence_column:
            Optional Argument.
            Specifies the list of column(s) that uniquely identifies each row
            of the input argument "model1_type". The argument is used to
            ensure deterministic results for functions which produce results
            that vary from run to run.
            Types: str OR list of Strings (str)

        model2_type_sequence_column:
            Optional Argument.
            Specifies the list of column(s) that uniquely identifies each row
            of the input argument "model2_type". The argument is used to
            ensure deterministic results for functions which produce results
            that vary from run to run.
            Types: str OR list of Strings (str)

        Note:
            • The Multiple-input Attribution takes data from multiple
              teradataml DataFrames ("data_optional", "conversion_data",
              "excluding_data", "optional_data", "model1_type" and
              "model2_type").
              For Multiple-input Attribution, the inputs "data",
              "conversion_data" and "model1_type" are required, whereas the
              other inputs are optional. The arguments "data_optional",
              "conversion_data", "excluding_data", "optional_data",
              "model1_type" and "model2_type" belong to the Multiple-input
              syntax and must not be used with the arguments
              "conversion_events", "model1_name", "optional_events",
              "exclude_events" and "model2_name".
              For example,
                  attribution = Attribution(data=< table | view | (query) >,
                                            data_partition_column='partition_column',
                                            data_order_column='order_column',
                                            data_optional=< table | view | (query) >,
                                            data_optional_partition_column='partition_column',
                                            data_optional_order_column='order_column',
                                            conversion_data=< table | view | (query) >,
                                            excluding_data=< table | view | (query) >,
                                            optional_data=< table | view | (query) >,
                                            model1_type=< table | view | (query) >,
                                            model2_type=< table | view | (query) >,
                                            event_column='event_column',
                                            timestamp_column='timestamp_column',
                                            window_size='rows:K | seconds:K | rows:K&seconds:K2'
                                            )
            • The Single-input Attribution takes data from a single
              teradataml DataFrame ("data") and gets its parameters from
              arguments ("conversion_events", "model1_name",
              "optional_events", "exclude_events" and "model2_name"), not
              from input teradataml DataFrames.
              For Single-input Attribution, the arguments "conversion_events"
              and "model1_name" are required, whereas the other Single-input
              arguments are optional. The arguments "conversion_events",
              "model1_name", "optional_events", "exclude_events" and
              "model2_name" belong to the Single-input syntax and must not be
              used with the arguments "data_optional", "conversion_data",
              "excluding_data", "optional_data", "model1_type" and
              "model2_type".
              For example,
                  attribution = Attribution(data=< table | view | (query) >,
                                            data_partition_column='partition_column',
                                            conversion_events=['conversion_event', ...],
                                            timestamp_column='timestamp_column',
                                            model1_name=['type', 'K' | 'EVENT:WEIGHT:MODEL:PARAMETERS', ...],
                                            model2_name=['type', 'K' | 'EVENT:WEIGHT:MODEL:PARAMETERS', ...],
                                            event_column='event_column',
                                            window_size='rows:K | seconds:K | rows:K&seconds:K2',
                                            optional_events=["organicsearch", "direct", "referral"],
                                            data_order_column='order_by_column'
                                            )

    RETURNS:
        Instance of Attribution.
        Output teradataml DataFrames can be accessed using attribute
        references, such as AttributionObj.<attribute_name>.
        Output teradataml DataFrame attribute name is:
            result

    RAISES:
        TeradataMlException

    EXAMPLES:
        # The examples assume an active connection to Vantage, for example
        # one created with create_context().
        from teradataml import DataFrame, load_example_data
        from teradataml.analytics.mle import Attribution

        # Load example data.
        load_example_data("attribution", ["attribution_sample_table",
                                          "attribution_sample_table1",
                                          "attribution_sample_table2",
                                          "conversion_event_table",
                                          "optional_event_table",
                                          "excluding_event_table",
                                          "model1_table",
                                          "model2_table"])

        # Create teradataml DataFrame objects.
        attribution_sample_table = DataFrame.from_table("attribution_sample_table")
        attribution_sample_table1 = DataFrame.from_table("attribution_sample_table1")
        attribution_sample_table2 = DataFrame.from_table("attribution_sample_table2")
        conversion_event_table = DataFrame.from_table("conversion_event_table")
        optional_event_table = DataFrame.from_table("optional_event_table")
        model1_table = DataFrame.from_table("model1_table")
        model2_table = DataFrame.from_table("model2_table")
        excluding_event_table = DataFrame.from_table("excluding_event_table")

        # Example 1 - One Regular Model, Multiple Optional Models.
        # This example specifies one distribution model for regular events
        # and one distribution model for each type of optional event.
        attribution_out1 = Attribution(data=attribution_sample_table,
                                       data_partition_column='user_id',
                                       conversion_events=["socialnetwork", "paidsearch"],
                                       timestamp_column='time_stamp',
                                       model1_name=["EVENT_REGULAR",
                                                    "email:0.19:LAST_CLICK:NA",
                                                    "impression:0.81:UNIFORM:NA"],
                                       model2_name=["EVENT_OPTIONAL",
                                                    "organicsearch:0.5:UNIFORM:NA",
                                                    "direct:0.3:UNIFORM:NA",
                                                    "referral:0.2:UNIFORM:NA"],
                                       event_column="event",
                                       window_size="rows:10&seconds:20",
                                       optional_events=["organicsearch", "direct", "referral"],
                                       data_order_column='time_stamp'
                                       )

        # Print the result.
        print(attribution_out1.result)

        # Example 2 - Multiple Regular Models, One Optional Model.
        # This example specifies one distribution model for each type of
        # regular event and one distribution model for optional events.
        attribution_out2 = Attribution(data=attribution_sample_table,
                                       data_partition_column='user_id',
                                       conversion_events=["socialnetwork", "paidsearch"],
                                       timestamp_column='time_stamp',
                                       model1_name=["EVENT_REGULAR",
                                                    "email:0.19:LAST_CLICK:NA",
                                                    "impression:0.81:UNIFORM:NA"],
                                       model2_name=["EVENT_OPTIONAL",
                                                    "ALL:1:EXPONENTIAL:0.5,ROW"],
                                       event_column="event",
                                       window_size="rows:10&seconds:20",
                                       optional_events=["organicsearch", "direct", "referral"],
                                       data_order_column='time_stamp'
                                       )

        # Print the result.
        print(attribution_out2.result)

        # Example 3 - This example uses Dynamic Weighted Distribution
        # Models Input.
        attribution_out3 = Attribution(data=attribution_sample_table,
                                       data_partition_column='user_id',
                                       conversion_events=["socialnetwork", "paidsearch"],
                                       timestamp_column='time_stamp',
                                       model1_name=["EVENT_REGULAR",
                                                    "email:0.19:LAST_CLICK:NA",
                                                    "impression:0.81:WEIGHTED:0.4,0.3,0.2,0.1"],
                                       model2_name=["EVENT_OPTIONAL",
                                                    "ALL:1:WEIGHTED:0.4,0.3,0.2,0.1"],
                                       event_column="event",
                                       window_size="rows:10&seconds:20",
                                       optional_events=["organicsearch", "direct", "referral"],
                                       data_order_column='time_stamp'
                                       )

        # Print the result.
        print(attribution_out3.result)

        # Example 4 - This example uses Window Models.
        attribution_out4 = Attribution(data=attribution_sample_table,
                                       data_partition_column='user_id',
                                       conversion_events=["socialnetwork", "paidsearch"],
                                       timestamp_column='time_stamp',
                                       model1_name=["SEGMENT_ROWS",
                                                    "3:0.5:EXPONENTIAL:0.5,ROW",
                                                    "4:0.3:WEIGHTED:0.4,0.3,0.2,0.1",
                                                    "3:0.2:FIRST_CLICK:NA"],
                                       model2_name=["SEGMENT_SECONDS",
                                                    "6:0.5:UNIFORM:NA",
                                                    "8:0.3:LAST_CLICK:NA",
                                                    "6:0.2:FIRST_CLICK:NA"],
                                       event_column="event",
                                       window_size="rows:10&seconds:20",
                                       optional_events=["organicsearch", "direct", "referral"],
                                       exclude_events=["email"],
                                       data_order_column='time_stamp'
                                       )

        # Print the result.
        print(attribution_out4.result)

        # Example 5 - This example uses Single-Window Model.
        attribution_out5 = Attribution(data=attribution_sample_table,
                                       data_partition_column='user_id',
                                       conversion_events=["socialnetwork", "paidsearch"],
                                       timestamp_column='time_stamp',
                                       model1_name=["SIMPLE", "UNIFORM:NA"],
                                       event_column="event",
                                       window_size="rows:10&seconds:20",
                                       exclude_events=["email"],
                                       data_order_column='time_stamp'
                                       )

        # Print the result.
        print(attribution_out5.result)

        # Example 6 - This example uses Unused Segment Windows.
        attribution_out6 = Attribution(data=attribution_sample_table,
                                       data_partition_column='user_id',
                                       conversion_events=["socialnetwork", "paidsearch"],
                                       timestamp_column='time_stamp',
                                       model1_name=["SEGMENT_ROWS",
                                                    "3:0.5:EXPONENTIAL:0.5,ROW",
                                                    "4:0.3:WEIGHTED:0.4,0.3,0.2,0.1",
                                                    "3:0.2:FIRST_CLICK:NA"],
                                       model2_name=["SEGMENT_SECONDS",
                                                    "6:0.5:UNIFORM:NA",
                                                    "8:0.3:LAST_CLICK:NA",
                                                    "6:0.2:FIRST_CLICK:NA"],
                                       event_column="event",
                                       window_size="rows:10&seconds:20",
                                       data_order_column='time_stamp'
                                       )

        # Print the result.
        print(attribution_out6.result)

        # Example 7 - This example uses multiple inputs, taking data and
        # parameters from multiple tables, and outputs attributions.
        attribution_out7 = Attribution(data=attribution_sample_table1,
                                       data_partition_column='user_id',
                                       data_order_column='time_stamp',
                                       data_optional=attribution_sample_table2,
                                       data_optional_partition_column='user_id',
                                       data_optional_order_column='time_stamp',
                                       conversion_data=conversion_event_table,
                                       excluding_data=excluding_event_table,
                                       optional_data=optional_event_table,
                                       model1_type=model1_table,
                                       model2_type=model2_table,
                                       event_column='event',
                                       timestamp_column='time_stamp',
                                       window_size='rows:10&seconds:20'
                                       )

        # Print the result.
        print(attribution_out7.result)
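The model lists passed to "model1_name" and "model2_name" combine a type with entries shaped like 'EVENT:WEIGHT:MODEL:PARAMETERS'. In the SEGMENT_ROWS and SEGMENT_SECONDS examples the first field appears to be a segment size rather than an event name, and SIMPLE takes a single 'MODEL:PARAMETERS' entry. The following is a minimal pure-Python sketch, not a teradataml API, that splits such a list so it can be sanity-checked before calling Attribution. The names parse_model_spec and ModelEntry are hypothetical. The check that the weights in one model sum to 1 is an assumption based on the examples above, all of which satisfy it. The bare 'K' form shown in the syntax template is not handled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ModelEntry:
    # Event name (EVENT_* types) or segment size (SEGMENT_* types);
    # None for a SIMPLE model, which has a single entry.
    key: Optional[str]
    weight: float
    model: str       # distribution model, e.g. UNIFORM, LAST_CLICK, WEIGHTED
    parameters: str  # model parameters, e.g. "NA", "0.4,0.3,0.2,0.1", "0.5,ROW"


def parse_model_spec(spec: List[str]) -> Tuple[str, List[ModelEntry]]:
    """Split a model1_name/model2_name list into its type and entries.

    Hypothetical illustrative helper; it is not part of teradataml.
    """
    if not spec:
        raise ValueError("model specification is empty")
    model_type, *entries = spec

    if model_type == "SIMPLE":
        # SIMPLE carries one 'MODEL:PARAMETERS' entry that gets all the weight.
        if len(entries) != 1:
            raise ValueError("SIMPLE expects exactly one MODEL:PARAMETERS entry")
        model, parameters = entries[0].split(":", 1)
        return model_type, [ModelEntry(None, 1.0, model, parameters)]

    parsed = []
    for entry in entries:
        # maxsplit=3 keeps comma-separated parameters such as "0.5,ROW" intact.
        fields = entry.split(":", 3)
        if len(fields) != 4:
            raise ValueError(f"expected KEY:WEIGHT:MODEL:PARAMETERS, got {entry!r}")
        key, weight, model, parameters = fields
        parsed.append(ModelEntry(key, float(weight), model, parameters))

    # Assumption: weights within one model add up to 1, as in every example.
    total = sum(e.weight for e in parsed)
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"{model_type} weights sum to {total}, expected 1")
    return model_type, parsed


model_type, entries = parse_model_spec(
    ["EVENT_REGULAR", "email:0.19:LAST_CLICK:NA",
     "impression:0.81:WEIGHTED:0.4,0.3,0.2,0.1"])
print(model_type, [(e.key, e.weight, e.model) for e in entries])
# prints: EVENT_REGULAR [('email', 0.19, 'LAST_CLICK'), ('impression', 0.81, 'WEIGHTED')]
```

Splitting with a maximum of three separators is the design choice that matters here: model parameters such as "0.4,0.3,0.2,0.1" never contain a colon in the examples, so everything after the third colon is kept as one parameter string.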
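The three "window_size" forms can be illustrated with a minimal pure-Python sketch. It is not how Vantage computes attributions internally; it only encodes one reading of the documented rules. Excluded events are skipped and do not count toward the rows limit, the seconds limit keeps events at most K seconds before the conversion (treating that bound as inclusive is an assumption), and applying both limits together keeps the stricter one. The UNIFORM, FIRST_CLICK and LAST_CLICK splits use the usual meaning of those model names; WEIGHTED and EXPONENTIAL are deliberately left out. The functions parse_window_size, effective_events and distribute are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Event = Tuple[int, str]  # (timestamp in seconds, event name)


def parse_window_size(window_size: str) -> Dict[str, int]:
    """Parse 'rows:K', 'seconds:K' or 'rows:K&seconds:K2' (hypothetical helper)."""
    limits: Dict[str, int] = {}
    for part in window_size.split("&"):
        unit, value = part.split(":")
        if unit not in ("rows", "seconds"):
            raise ValueError(f"unknown window unit {unit!r}")
        limits[unit] = int(value)
    return limits


def effective_events(history: List[Event], conversion_time: int,
                     window_size: str,
                     exclude_events: Iterable[str] = ()) -> List[Event]:
    """Return the events a conversion could be attributed to.

    history holds the events before the conversion, oldest first.
    """
    limits = parse_window_size(window_size)
    excluded = set(exclude_events)
    # Excluded events are never effective events, so they do not use up rows.
    candidates = [e for e in history if e[1] not in excluded]
    if "seconds" in limits:
        # Assumption: the seconds bound is inclusive.
        candidates = [e for e in candidates
                      if conversion_time - e[0] <= limits["seconds"]]
    if "rows" in limits:
        # Keep at most K of the most recent remaining events.
        candidates = candidates[max(0, len(candidates) - limits["rows"]):]
    return candidates


def distribute(events: List[Event], model: str) -> List[float]:
    """Split one unit of credit across the events for a few simple models."""
    n = len(events)
    if n == 0:
        return []
    if model == "UNIFORM":
        return [1.0 / n] * n
    if model == "FIRST_CLICK":
        return [1.0] + [0.0] * (n - 1)
    if model == "LAST_CLICK":
        return [0.0] * (n - 1) + [1.0]
    raise NotImplementedError(f"{model} is not covered by this sketch")


history = [(0, "impression"), (5, "email"), (12, "impression"), (15, "direct")]
window = effective_events(history, conversion_time=20,
                          window_size="rows:2&seconds:20",
                          exclude_events=["email"])
print(window, distribute(window, "UNIFORM"))
# prints: [(12, 'impression'), (15, 'direct')] [0.5, 0.5]
```

Because both limits keep a suffix of the time-ordered history, applying them one after the other yields the shorter suffix, which is what "comply with the stricter one" amounts to in this sketch.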
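The Note on Multiple-input and Single-input syntax amounts to a set of rules about which arguments may appear together. A minimal pure-Python sketch of those rules follows, useful as a pre-flight check before building the call. The function attribution_syntax and the two argument sets are hypothetical names; the function does not call teradataml, and plain strings stand in for the teradataml DataFrames.

```python
from typing import Any

MULTIPLE_INPUT_ARGS = {"data_optional", "conversion_data", "excluding_data",
                       "optional_data", "model1_type", "model2_type"}
SINGLE_INPUT_ARGS = {"conversion_events", "model1_name", "optional_events",
                     "exclude_events", "model2_name"}
ALWAYS_REQUIRED = {"data", "data_partition_column", "data_order_column",
                   "event_column", "timestamp_column", "window_size"}


def attribution_syntax(**kwargs: Any) -> str:
    """Return 'multiple' or 'single' for a set of Attribution keyword arguments.

    Hypothetical check that mirrors the documented rules only.
    """
    given = {name for name, value in kwargs.items() if value is not None}
    multiple = given & MULTIPLE_INPUT_ARGS
    single = given & SINGLE_INPUT_ARGS
    if multiple and single:
        raise ValueError(f"cannot mix {sorted(multiple)} with {sorted(single)}")

    # data_optional needs its own Partition By and Order By columns.
    if "data_optional" in given:
        missing = {"data_optional_partition_column",
                   "data_optional_order_column"} - given
        if missing:
            raise ValueError(f"data_optional also needs {sorted(missing)}")

    if multiple:
        mode, required = "multiple", {"conversion_data", "model1_type"}
    else:
        mode, required = "single", {"conversion_events", "model1_name"}
    missing = (required | ALWAYS_REQUIRED) - given
    if missing:
        raise ValueError(f"{mode}-input syntax requires {sorted(missing)}")
    return mode


# Strings stand in for teradataml DataFrames in this illustration.
common = dict(data="attribution_sample_table", data_partition_column="user_id",
              data_order_column="time_stamp", event_column="event",
              timestamp_column="time_stamp", window_size="rows:10&seconds:20")
print(attribution_syntax(conversion_events=["socialnetwork", "paidsearch"],
                         model1_name=["SIMPLE", "UNIFORM:NA"], **common))
# prints: single
```

Treating None as "not given" matches the constructor signature, where every argument defaults to None.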

__repr__(self)
    Returns the string representation for an Attribution class instance.

get_build_time(self)
    Function to return the build time of the algorithm in seconds.
    When the model object is created using retrieve_model(), the value
    returned is the one saved in the Model Catalog.

get_prediction_type(self)
    Function to return the Prediction type of the algorithm.
    When the model object is created using retrieve_model(), the value
    returned is the one saved in the Model Catalog.

get_target_column(self)
    Function to return the Target Column of the algorithm.
    When the model object is created using retrieve_model(), the value
    returned is the one saved in the Model Catalog.

show_query(self)
    Function to return the underlying SQL query.
    When the model object is created using retrieve_model(), None is
    returned.