Methods defined here:
- __init__(self, data=None, data_optional=None, conversion_events=None, excluding_data=None, optional_data=None, model1_type=None, model2_type=None, model1_name=None, model2_name=None, event_column=None, timestamp_column=None, window_size=None, conversion_data=None, optional_events=None, exclude_events=None, data_sequence_column=None, data_optional_sequence_column=None, conversion_data_sequence_column=None, excluding_data_sequence_column=None, optional_data_sequence_column=None, model1_type_sequence_column=None, model2_type_sequence_column=None, data_partition_column=None, data_optional_partition_column=None, data_order_column=None, data_optional_order_column=None, conversion_data_order_column=None, excluding_data_order_column=None, optional_data_order_column=None, model1_type_order_column=None, model2_type_order_column=None)
- DESCRIPTION:
The Attribution function is used in web page analysis, where it lets
companies assign weights to pages before certain events, such as
buying a product.
The function calculates attributions with a choice of distribution
models and has two versions:
• Multiple-input: Accepts one or more input tables and gets many
parameters from other dimension tables.
• Single-input: Accepts only one input table and gets all parameters
from arguments.
Note: This function is available only when teradataml is connected to
Vantage 1.1 or later versions.
PARAMETERS:
data:
Required Argument.
Specifies the teradataml DataFrame that contains the click stream
data, which the function uses to compute attributions.
data_partition_column:
Required Argument.
Specifies Partition By columns for data.
Values to this argument can be provided as a list, if multiple
columns are used for partition.
Types: str OR list of Strings (str)
data_order_column:
Required Argument.
Specifies Order By columns for data.
Values to this argument can be provided as a list, if multiple
columns are used for ordering.
Types: str OR list of Strings (str)
data_optional:
Optional Argument.
Specifies the teradataml DataFrame that contains additional click
stream data. The function cogroups attributes from all specified
teradataml DataFrames.
data_optional_partition_column:
Optional Argument. Required when 'data_optional' is used.
Specifies Partition By columns for data_optional.
Values to this argument can be provided as a list, if multiple
columns are used for partition.
Types: str OR list of Strings (str)
data_optional_order_column:
Optional Argument. Required when 'data_optional' is used.
Specifies Order By columns for data_optional.
Values to this argument can be provided as a list, if multiple
columns are used for ordering.
Types: str OR list of Strings (str)
conversion_events:
Optional Argument. "conversion_events" is a required argument if
"conversion_data" is not provided.
Specifies the conversion event value. Each conversion_event is
a string or integer.
Types: str OR list of Strings (str)
excluding_data:
Optional Argument.
Specifies the teradataml DataFrame that contains one varchar
column (excluding_events) containing excluding cause event values.
excluding_data_order_column:
Optional Argument.
Specifies Order By columns for excluding_data.
Values to this argument can be provided as a list, if multiple
columns are used for ordering.
Types: str OR list of Strings (str)
optional_data:
Optional Argument.
Specifies the teradataml DataFrame that contains one varchar
column (optional_events) containing optional cause event values.
optional_data_order_column:
Optional Argument.
Specifies Order By columns for optional_data.
Values to this argument can be provided as a list, if multiple
columns are used for ordering.
Types: str OR list of Strings (str)
model1_type:
Optional Argument. "model1_type" is a required argument if
"model1_name" is not provided.
Specifies the teradataml DataFrame that defines the type and
specification of the first model.
For example:
model1 data ("EVENT_REGULAR", "email:0.19:LAST_CLICK:NA",
"impression:0.81:WEIGHTED:0.4,0.3,0.2,0.1")
model1_type_order_column:
Optional Argument.
Specifies Order By columns for model1_type.
Values to this argument can be provided as a list, if multiple
columns are used for ordering.
Types: str OR list of Strings (str)
model2_type:
Optional Argument.
Specifies the teradataml DataFrame that defines the type and
distributions of the second model.
For example:
model2 data ("EVENT_OPTIONAL", "OrganicSearch:0.5:UNIFORM:NA",
"Direct:0.3:UNIFORM:NA", "Referral:0.2:UNIFORM:NA")
model2_type_order_column:
Optional Argument.
Specifies Order By columns for model2_type.
Values to this argument can be provided as a list, if multiple
columns are used for ordering.
Types: str OR list of Strings (str)
model1_name:
Optional Argument. "model1_name" is a required argument if
"model1_type" is not provided.
Specifies the type and specification of the first model.
For example:
Model1 ('EVENT_REGULAR', 'email:0.19:LAST_CLICK:NA',
'impression:0.81:WEIGHTED:0.4,0.3,0.2,0.1')
Types: str OR list of Strings (str)
model2_name:
Optional Argument.
Specifies the type and distributions of the second model.
For example:
Model2 ('EVENT_OPTIONAL', 'OrganicSearch:0.5:UNIFORM:NA',
'Direct:0.3:UNIFORM:NA', 'Referral:0.2:UNIFORM:NA')
Types: str OR list of Strings (str)
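The model specification strings shown above follow a colon-separated layout. As a minimal sketch, and not part of teradataml, the hypothetical helper below splits the four-field 'EVENT:WEIGHT:MODEL:PARAMETERS' form (the same shape is used with a K value in place of the event name for segment models). It assumes that 'NA' means "no parameters" and does not handle the two-field form used with the SIMPLE type.

```python
from typing import List, NamedTuple, Sequence, Tuple


class ModelEntry(NamedTuple):
    """One parsed 'KEY:WEIGHT:MODEL:PARAMETERS' entry (hypothetical type)."""
    key: str               # event name, or K for segment models
    weight: float
    model: str             # e.g. LAST_CLICK, UNIFORM, WEIGHTED
    parameters: List[str]  # empty when the spec says 'NA'


def parse_model_spec(spec: Sequence[str]) -> Tuple[str, List[ModelEntry]]:
    """Split a model1_name / model2_name style list into its parts."""
    model_type, *entries = spec
    parsed = []
    for entry in entries:
        # Split on the first three colons only; parameters may contain commas.
        fields = entry.split(":", 3)
        if len(fields) != 4:
            raise ValueError(f"expected 4 colon-separated fields: {entry!r}")
        key, weight, model, params = fields
        param_list = [] if params == "NA" else params.split(",")
        parsed.append(ModelEntry(key, float(weight), model, param_list))
    return model_type, parsed


model_type, entries = parse_model_spec(
    ["EVENT_REGULAR", "email:0.19:LAST_CLICK:NA",
     "impression:0.81:WEIGHTED:0.4,0.3,0.2,0.1"]
)
print(model_type, entries)
```

In the specifications used throughout this documentation, the weights within one model add up to 1, which a helper like this makes easy to check before calling the function.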
event_column:
Required Argument.
Specifies the name of an input teradataml DataFrame column that
contains the clickstream events.
Types: str
timestamp_column:
Required Argument.
Specifies the name of an input teradataml DataFrame column that
contains the timestamps of the clickstream events.
Types: str
window_size:
Required Argument.
Specifies how to determine the maximum window size for the
attribution calculation:
• rows:K: Consider the maximum number of events to be attributed,
excluding events of types specified in excluding_event_table,
which means assigning attributions to at most K effective
events before the current impact event.
• seconds:K: Consider the maximum time difference between the
current impact event and the earliest effective event to
be attributed.
• rows:K&seconds:K2: Consider both constraints and comply with
the stricter one.
Types: str
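To make the three window_size forms concrete, here is a rough sketch in plain Python. The helpers parse_window_size and events_in_window are hypothetical and only illustrate one reading of the description above; they are not how the function is implemented in Vantage. The sketch assumes integer K values and timestamps expressed in seconds.

```python
import re
from typing import Dict, List, Tuple

_WINDOW_PART = re.compile(r"^(rows|seconds):(\d+)$")


def parse_window_size(window_size: str) -> Dict[str, int]:
    """Split 'rows:K', 'seconds:K' or 'rows:K&seconds:K2' into a dict."""
    limits: Dict[str, int] = {}
    for part in window_size.split("&"):
        match = _WINDOW_PART.match(part.strip())
        if match is None:
            raise ValueError(f"unrecognised window_size part: {part!r}")
        unit, value = match.group(1), int(match.group(2))
        if unit in limits:
            raise ValueError(f"duplicate unit in window_size: {unit!r}")
        limits[unit] = value
    return limits


def events_in_window(prior_events: List[Tuple[int, str]],
                     conversion_ts: int,
                     limits: Dict[str, int]) -> List[Tuple[int, str]]:
    """Keep the prior effective events that satisfy every limit.

    prior_events holds (timestamp_in_seconds, event) pairs, oldest first,
    all earlier than the conversion event. Applying both filters in turn
    yields the stricter of the two constraints.
    """
    selected = list(prior_events)
    if "seconds" in limits:
        selected = [(ts, ev) for ts, ev in selected
                    if conversion_ts - ts <= limits["seconds"]]
    if "rows" in limits:
        rows = limits["rows"]
        selected = selected[-rows:] if rows > 0 else []
    return selected


limits = parse_window_size("rows:2&seconds:20")
history = [(0, "impression"), (15, "email"), (25, "impression"), (28, "email")]
print(events_in_window(history, 30, limits))
```

With the sample history, the seconds limit drops the event at timestamp 0 and the rows limit then keeps only the two most recent remaining events.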
conversion_data:
Optional Argument. "conversion_data" is a required argument if
"conversion_events" is not provided.
Specifies the teradataml DataFrame that contains one varchar
column (conversion_events) containing conversion event values.
conversion_data_order_column:
Optional Argument.
Specifies Order By columns for conversion_data.
Values to this argument can be provided as a list, if multiple
columns are used for ordering.
Types: str OR list of Strings (str)
optional_events:
Optional Argument.
Specifies the optional events. Each optional_event is a string or
integer. An optional_event cannot be a conversion_event or
exclude_event. The function attributes a conversion event to an
optional event only if it cannot attribute it to a regular event.
Types: str OR list of Strings (str)
exclude_events:
Optional Argument.
Specifies the events to exclude from the attribution calculation.
Each exclude_event is a string or integer. An exclude_event
cannot be a conversion_event.
Types: str OR list of Strings (str)
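The constraints stated for "optional_events" and "exclude_events" can be checked on the client before the function is called. The following is a small sketch using a hypothetical helper, check_event_lists, which is not part of teradataml; it only reports overlaps between the three event lists.

```python
from typing import Iterable, List, Optional


def check_event_lists(conversion_events: Iterable[str],
                      optional_events: Optional[Iterable[str]] = None,
                      exclude_events: Optional[Iterable[str]] = None) -> List[str]:
    """Return a list of human-readable problems; empty means no overlap."""
    conversion = set(conversion_events)
    optional = set(optional_events or [])
    excluded = set(exclude_events or [])

    problems = []
    # An optional event cannot also be a conversion event.
    if optional & conversion:
        problems.append(
            f"optional and conversion overlap: {sorted(optional & conversion)}")
    # An optional event cannot also be an excluded event.
    if optional & excluded:
        problems.append(
            f"optional and exclude overlap: {sorted(optional & excluded)}")
    # An excluded event cannot also be a conversion event.
    if excluded & conversion:
        problems.append(
            f"exclude and conversion overlap: {sorted(excluded & conversion)}")
    return problems


print(check_event_lists(
    conversion_events=["socialnetwork", "paidsearch"],
    optional_events=["organicsearch", "direct", "referral"],
    exclude_events=["email"],
))
```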
data_sequence_column:
Optional Argument.
Specifies the list of column(s) that uniquely identifies each row of
the input argument "data". The argument is used to ensure
deterministic results for functions which produce results that vary
from run to run.
Types: str OR list of Strings (str)
data_optional_sequence_column:
Optional Argument.
Specifies the list of column(s) that uniquely identifies each row of
the input argument "data_optional". The argument is used to ensure
deterministic results for functions which produce results that vary
from run to run.
Types: str OR list of Strings (str)
conversion_data_sequence_column:
Optional Argument.
Specifies the list of column(s) that uniquely identifies each row of
the input argument "conversion_data". The argument is used to ensure
deterministic results for functions which produce results that vary
from run to run.
Types: str OR list of Strings (str)
excluding_data_sequence_column:
Optional Argument.
Specifies the list of column(s) that uniquely identifies each row of
the input argument "excluding_data". The argument is used to ensure
deterministic results for functions which produce results that vary
from run to run.
Types: str OR list of Strings (str)
optional_data_sequence_column:
Optional Argument.
Specifies the list of column(s) that uniquely identifies each row of
the input argument "optional_data". The argument is used to ensure
deterministic results for functions which produce results that vary
from run to run.
Types: str OR list of Strings (str)
model1_type_sequence_column:
Optional Argument.
Specifies the list of column(s) that uniquely identifies each row of
the input argument "model1_type". The argument is used to ensure
deterministic results for functions which produce results that vary
from run to run.
Types: str OR list of Strings (str)
model2_type_sequence_column:
Optional Argument.
Specifies the list of column(s) that uniquely identifies each row of
the input argument "model2_type". The argument is used to ensure
deterministic results for functions which produce results that vary
from run to run.
Types: str OR list of Strings (str)
Note:
• The Multiple-input Attribution takes data from multiple teradataml
DataFrames ("data_optional", "conversion_data", "excluding_data",
"optional_data", "model1_type" and "model2_type").
For Multiple-input Attribution, inputs "data", "conversion_data" and
"model1_type" are required, whereas other inputs are optional.
The arguments "data_optional", "conversion_data", "excluding_data",
"optional_data", "model1_type" and "model2_type" should be mentioned
together and should not be used with arguments "conversion_events",
"model1_name", "optional_events", "exclude_events" and "model2_name".
For example,
attribution = Attribution(data=< table | view | (query) >,
data_partition_column='partition_column',
data_order_column='order_column',
data_optional=< table | view | (query) >,
conversion_data=< table | view | (query) >,
excluding_data=< table | view | (query) >,
optional_data=< table | view | (query) >,
model1_type=< table | view | (query) >,
model2_type=< table | view | (query) >,
event_column='event_column',
timestamp_column='timestamp_column',
window_size='rows:K | seconds:K | rows:K&seconds:K'
)
• The Single-input Attribution takes data from a single teradataml
DataFrame ("data") and parameters come from arguments ("conversion_events",
"model1_name", "optional_events", "exclude_events" and "model2_name"),
not input teradataml DataFrames.
For Single-input Attribution, arguments "conversion_events" and "model1_name"
are required, whereas other single-input syntax arguments are optional.
The arguments "conversion_events", "model1_name", "optional_events",
"exclude_events" and "model2_name" should be used together and
should not be used with arguments "data_optional", "conversion_data",
"excluding_data", "optional_data", "model1_type" and "model2_type".
For example,
attribution = Attribution(data=< table | view | (query) >,
conversion_events = ['conversion_event', ...],
timestamp_column='timestamp_column',
model1_name = ['type', 'K' | 'EVENT:WEIGHT:MODEL:PARAMETERS', ...],
model2_name = ['type', 'K' | 'EVENT:WEIGHT:MODEL:PARAMETERS', ...],
event_column = "event_column",
window_size = 'rows:K | seconds:K | rows:K&seconds:K',
optional_events = ["organicsearch", "direct", "referral"],
data_order_column='order_by_column'
)
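The two argument groups described in the note are mutually exclusive. As an illustration only, the hypothetical helper below (attribution_syntax, not a teradataml function) classifies a set of keyword arguments as multiple-input or single-input and rejects mixtures, following the rules stated above. Placeholder strings stand in for teradataml DataFrames.

```python
MULTI_INPUT_ARGS = {"data_optional", "conversion_data", "excluding_data",
                    "optional_data", "model1_type", "model2_type"}
SINGLE_INPUT_ARGS = {"conversion_events", "model1_name", "optional_events",
                     "exclude_events", "model2_name"}


def attribution_syntax(**kwargs) -> str:
    """Return 'multiple-input' or 'single-input' for the given arguments."""
    supplied = {name for name, value in kwargs.items() if value is not None}
    multi = supplied & MULTI_INPUT_ARGS
    single = supplied & SINGLE_INPUT_ARGS

    if multi and single:
        raise ValueError(
            f"cannot mix {sorted(multi)} with {sorted(single)}")
    if multi:
        # Multiple-input syntax requires conversion_data and model1_type.
        missing = {"conversion_data", "model1_type"} - multi
        if missing:
            raise ValueError(f"missing required inputs: {sorted(missing)}")
        return "multiple-input"
    if single:
        # Single-input syntax requires conversion_events and model1_name.
        missing = {"conversion_events", "model1_name"} - single
        if missing:
            raise ValueError(f"missing required arguments: {sorted(missing)}")
        return "single-input"
    raise ValueError("no conversion or model arguments were supplied")


print(attribution_syntax(data="clicks",
                         conversion_events=["paidsearch"],
                         model1_name=["SIMPLE", "UNIFORM:NA"]))
```

A check like this can surface a mixed call early, before any query is sent to Vantage.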
RETURNS:
Instance of Attribution.
Output teradataml DataFrames can be accessed using attribute
references, such as AttributionObj.<attribute_name>.
Output teradataml DataFrame attribute name is:
result
RAISES:
TeradataMlException
EXAMPLES:
# Load example data.
load_example_data("attribution", ["attribution_sample_table",
"attribution_sample_table1", "attribution_sample_table2" ,
"conversion_event_table", "optional_event_table", "excluding_event_table",
"model1_table", "model2_table"])
# Create teradataml DataFrame objects.
attribution_sample_table = DataFrame.from_table("attribution_sample_table")
attribution_sample_table1 = DataFrame.from_table("attribution_sample_table1")
attribution_sample_table2 = DataFrame.from_table("attribution_sample_table2")
conversion_event_table = DataFrame.from_table("conversion_event_table")
optional_event_table = DataFrame.from_table("optional_event_table")
model1_table = DataFrame.from_table("model1_table")
model2_table = DataFrame.from_table("model2_table")
excluding_event_table = DataFrame.from_table("excluding_event_table")
# Example 1 - One Regular Model, Multiple Optional Models.
# This example specifies one distribution model for regular events
# and one distribution model for each type of optional event.
attribution_out1 = Attribution(data=attribution_sample_table,
data_partition_column='user_id',
conversion_events = ["socialnetwork", "paidsearch"],
timestamp_column='time_stamp',
model1_name = ["EVENT_REGULAR",
"email:0.19:LAST_CLICK:NA","impression:0.81:UNIFORM:NA"],
model2_name = ["EVENT_OPTIONAL",
"organicsearch:0.5:UNIFORM:NA","direct:0.3:UNIFORM:NA",
"referral:0.2:UNIFORM:NA"],
event_column = "event",
window_size = "rows:10&seconds:20",
optional_events = ["organicsearch", "direct", "referral"],
data_order_column='time_stamp'
)
# Print the result
print(attribution_out1.result)
# Example 2 - Multiple Regular Models, One Optional Model.
# This example specifies one distribution model for each type of regular
# event and one distribution model for optional events.
attribution_out2 = Attribution(data=attribution_sample_table,
data_partition_column='user_id',
conversion_events = ["socialnetwork", "paidsearch"],
timestamp_column='time_stamp',
model1_name = ["EVENT_REGULAR",
"email:0.19:LAST_CLICK:NA","impression:0.81:UNIFORM:NA"],
model2_name = ["EVENT_OPTIONAL", "ALL:1:EXPONENTIAL:0.5,ROW"],
event_column = "event",
window_size = "rows:10&seconds:20",
optional_events = ["organicsearch", "direct", "referral"],
data_order_column='time_stamp'
)
# Print the result
print(attribution_out2.result)
# Example 3 - This example uses Dynamic Weighted Distribution
# Models Input.
attribution_out3 = Attribution(data=attribution_sample_table,
data_partition_column='user_id',
conversion_events = ["socialnetwork", "paidsearch"],
timestamp_column='time_stamp',
model1_name = ["EVENT_REGULAR",
"email:0.19:LAST_CLICK:NA","impression:0.81:WEIGHTED:0.4,0.3,0.2,0.1"],
model2_name = ["EVENT_OPTIONAL", "ALL:1:WEIGHTED:0.4,0.3,0.2,0.1"],
event_column = "event",
window_size = "rows:10&seconds:20",
optional_events = ["organicsearch", "direct", "referral"],
data_order_column='time_stamp'
)
# Print the result
print(attribution_out3.result)
# Example 4 - This example uses Window Models.
attribution_out4 = Attribution(data=attribution_sample_table,
data_partition_column='user_id',
conversion_events = ["socialnetwork", "paidsearch"],
timestamp_column='time_stamp',
model1_name = ["SEGMENT_ROWS",
"3:0.5:EXPONENTIAL:0.5,ROW","4:0.3:WEIGHTED:0.4,0.3,0.2,0.1",
"3:0.2:FIRST_CLICK:NA"],
model2_name = ["SEGMENT_SECONDS", "6:0.5:UNIFORM:NA",
"8:0.3:LAST_CLICK:NA","6:0.2:FIRST_CLICK:NA"],
event_column = "event",
window_size = "rows:10&seconds:20",
optional_events = ["organicsearch", "direct", "referral"],
exclude_events = ["email"],
data_order_column='time_stamp'
)
# Print the result
print(attribution_out4.result)
# Example 5 - This example uses Single-Window Model.
attribution_out5 = Attribution(data=attribution_sample_table,
data_partition_column='user_id',
conversion_events = ["socialnetwork", "paidsearch"],
timestamp_column='time_stamp',
model1_name = ["SIMPLE", "UNIFORM:NA"],
event_column = "event",
window_size = "rows:10&seconds:20",
exclude_events = ["email"],
data_order_column='time_stamp'
)
# Print the result
print(attribution_out5.result)
# Example 6 - This example uses Unused Segment Windows.
attribution_out6 = Attribution(data=attribution_sample_table,
data_partition_column='user_id',
conversion_events = ["socialnetwork", "paidsearch"],
timestamp_column='time_stamp',
model1_name = ["SEGMENT_ROWS",
"3:0.5:EXPONENTIAL:0.5,ROW","4:0.3:WEIGHTED:0.4,0.3,0.2,0.1",
"3:0.2:FIRST_CLICK:NA"],
model2_name = ["SEGMENT_SECONDS",
"6:0.5:UNIFORM:NA","8:0.3:LAST_CLICK:NA", "6:0.2:FIRST_CLICK:NA"],
event_column = "event",
window_size = "rows:10&seconds:20",
data_order_column='time_stamp'
)
# Print the result
print(attribution_out6.result)
# Example 7 - This example uses Multiple Inputs which takes data
# and parameters from multiple tables and outputs attributions.
attribution_out7 = Attribution(data=attribution_sample_table1,
data_partition_column='user_id',
data_order_column='time_stamp',
data_optional=attribution_sample_table2,
data_optional_partition_column='user_id',
data_optional_order_column='time_stamp',
conversion_data=conversion_event_table,
excluding_data=excluding_event_table,
optional_data=optional_event_table,
model1_type=model1_table,
model2_type=model2_table,
event_column='event',
timestamp_column='time_stamp',
window_size='rows:10&seconds:20'
)
# Print the result
print(attribution_out7.result)
- __repr__(self)
- Returns the string representation for an Attribution class instance.
- get_build_time(self)
- Function to return the build time of the algorithm in seconds.
When model object is created using retrieve_model(), then the value returned is
as saved in the Model Catalog.
- get_prediction_type(self)
- Function to return the Prediction type of the algorithm.
When model object is created using retrieve_model(), then the value returned is
as saved in the Model Catalog.
- get_target_column(self)
- Function to return the Target Column of the algorithm.
When model object is created using retrieve_model(), then the value returned is
as saved in the Model Catalog.
- show_query(self)
- Function to return the underlying SQL query.
When model object is created using retrieve_model(), then None is returned.