Methods defined here:
- __init__(self, data=None, time_column=None, time_out=None, click_lag=None, emit_null=False, data_partition_column=None, data_order_column=None)
- DESCRIPTION:
The Sessionize function maps each click in a session to a unique
session identifier. A session is defined as a sequence of clicks by
one user that are separated by at most time_out seconds.
The function is useful both for sessionization and for detecting web
crawler (bot) activity. It is typically used to understand user browsing
behavior on a web site.
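The session definition above can be illustrated with a minimal pure-Python sketch. It is not the Sessionize implementation: the helper name assign_sessions is hypothetical, the click times are assumed to belong to one user and to be sorted in ascending order, and treating a gap exactly equal to time_out as part of the same session is an assumption.

```python
from typing import List, Optional, Sequence


def assign_sessions(click_times: Sequence[float], time_out: float) -> List[int]:
    """Return a session number for each click of a single user.

    Hypothetical helper for illustration only. click_times is assumed
    to be sorted ascending and expressed in seconds.
    """
    session_ids: List[int] = []
    current = 0
    previous: Optional[float] = None
    for t in click_times:
        # A gap longer than time_out starts a new session (assumption:
        # a gap exactly equal to time_out stays in the same session).
        if previous is not None and t - previous > time_out:
            current += 1
        session_ids.append(current)
        previous = t
    return session_ids


clicks = [0.0, 10.0, 100.0, 130.0, 300.0]
print(assign_sessions(clicks, time_out=60.0))  # [0, 0, 1, 1, 2]
```

In the sketch, the gaps of 90 and 170 seconds exceed the 60-second time-out, so each starts a new session.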
PARAMETERS:
data:
Required Argument.
Specifies the input teradataml DataFrame.
data_partition_column:
Required Argument.
Specifies Partition By columns for data.
Values to this argument can be provided as a list, if multiple columns
are used for partition.
Types: str OR list of Strings (str)
data_order_column:
Required Argument.
Specifies Order By columns for data.
Values to this argument can be provided as a list, if multiple columns
are used for ordering.
Types: str OR list of Strings (str)
time_column:
Required Argument.
Specifies the name of the input column that contains the click
times.
Note: The time_column must also be a data_order_column.
Types: str
time_out:
Required Argument.
Specifies the number of seconds at which the session times out. If
time_out seconds elapse after a click, then the next click
starts a new session.
Types: float
click_lag:
Optional Argument.
Specifies the minimum number of seconds between clicks for the
session user to be considered human. If clicks are more frequent,
indicating that the user is a "bot," the function ignores the
session. The click_lag must be less than time_out.
Types: float
emit_null:
Optional Argument.
Specifies whether to output rows that have None values in their
session id and rapid fire columns, even if their time_column has
a None value.
Default Value: False
Types: bool
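The constraints stated in the parameter descriptions above (time_column must be one of the data_order_column values, and click_lag must be less than time_out) can be sketched as a client-side check, together with a simple test for "rapid fire" clicks. Both helpers are hypothetical and written only for illustration; they are not part of teradataml, which performs its own validation and raises TeradataMlException.

```python
from typing import List, Optional, Sequence, Union


def check_sessionize_args(time_column: str,
                          data_order_column: Union[str, List[str]],
                          time_out: float,
                          click_lag: Optional[float] = None) -> None:
    """Hypothetical pre-check of the documented argument constraints."""
    # data_order_column may be a single name or a list of names.
    if isinstance(data_order_column, str):
        order_cols = [data_order_column]
    else:
        order_cols = list(data_order_column)
    if time_column not in order_cols:
        raise ValueError("time_column must also be a data_order_column")
    if click_lag is not None and not click_lag < time_out:
        raise ValueError("click_lag must be less than time_out")


def has_rapid_fire(click_times: Sequence[float], click_lag: float) -> bool:
    """Return True if any two consecutive clicks are closer than click_lag.

    Assumes click_times is sorted ascending and expressed in seconds.
    """
    return any(later - earlier < click_lag
               for earlier, later in zip(click_times, click_times[1:]))


check_sessionize_args("clicktime", ["clicktime"], time_out=60.0, click_lag=0.2)
print(has_rapid_fire([0.0, 0.1, 5.0], click_lag=0.2))  # True
print(has_rapid_fire([0.0, 1.0, 5.0], click_lag=0.2))  # False
```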
RETURNS:
Instance of Sessionize.
Output teradataml DataFrames can be accessed using attribute
references, such as SessionizeObj.<attribute_name>.
Output teradataml DataFrame attribute name is:
result
RAISES:
TeradataMlException
EXAMPLES:
# Load the data to run the example.
load_example_data("Sessionize","sessionize_table")
# Create teradataml DataFrame object.
sessionize_table = DataFrame.from_table("sessionize_table")
# Example 1 - Map each click in a session to a unique session identifier.
# The input table contains web clickstream data recorded as a user navigates
# through a web site. Each event (view, click, and so on) is recorded with a timestamp.
td_sessionize_out = Sessionize(data = sessionize_table,
data_partition_column = ["partition_id"],
data_order_column = ["clicktime"],
time_column = "clicktime",
time_out = 60.0,
click_lag = 0.2
)
# Print the result DataFrame
print(td_sessionize_out.result)
- __repr__(self)
- Returns the string representation for a Sessionize class instance.
- get_build_time(self)
- Function to return the build time of the algorithm in seconds.
When model object is created using retrieve_model(), then the value returned is
as saved in the Model Catalog.
- get_prediction_type(self)
- Function to return the Prediction type of the algorithm.
When model object is created using retrieve_model(), then the value returned is
as saved in the Model Catalog.
- get_target_column(self)
- Function to return the Target Column of the algorithm.
When model object is created using retrieve_model(), then the value returned is
as saved in the Model Catalog.
- show_query(self)
- Function to return the underlying SQL query.
When model object is created using retrieve_model(), then None is returned.