Teradata Python Package Function Reference - PathStart

teradataml.analytics.mle.PathStart = class PathStart(builtins.object)

Methods defined here:

__init__(self, object=None, count_column=None, delimiter=',', parent_column=None, partition_names=None, node_column=None, object_sequence_column=None, object_partition_column=None, object_order_column=None):
    DESCRIPTION:
        The PathStart function takes the output of the PathSummarizer
        function and returns, for each parent in the input teradataml
        DataFrame, the parent and children and the number of times that
        each of its sub-sequences was traveled.

    PARAMETERS:
        object:
            Required Argument.
            The name of the teradataml DataFrame containing the input data.

        object_partition_column:
            Required Argument.
            Specifies Partition By columns for object.
            Values to this argument can be provided as a list, if multiple
            columns are used for partition.
            Types: str OR list of Strings (str)

        object_order_column:
            Optional Argument.
            Specifies Order By columns for object.
            Values to this argument can be provided as a list, if multiple
            columns are used for ordering.
            Types: str OR list of Strings (str)

        count_column:
            Required Argument.
            Specifies the name of the input teradataml DataFrame column
            that contains the number of times a path was traveled.
            Types: str

        delimiter:
            Optional Argument.
            Specifies the single-character delimiter that separates
            symbols in the path string.
            Note: Do not use any of the following characters as delimiter
            (they cause the function to fail):
                Asterisk (*), Plus (+), Left parenthesis ((),
                Right parenthesis ()), Single quotation mark ('),
                Escaped single quotation mark (\'), Backslash (\).
            Default Value: ","
            Types: str

        parent_column:
            Required Argument.
            Specifies the name of the input teradataml DataFrame column
            that contains the parent nodes. The object_partition_column
            argument in the function call must include this column.
            Types: str

        partition_names:
            Required Argument.
            Lists the names of the columns that the
            object_partition_column argument specifies. The function uses
            these names for output teradataml DataFrame columns. This
            argument and the object_partition_column argument must specify
            the same names in the same order. One object_partition_column
            must be parent_column.
            Types: str OR list of Strings (str)

        node_column:
            Required Argument.
            Specifies the name of the input teradataml DataFrame column
            that contains the nodes.
            Types: str

        object_sequence_column:
            Optional Argument.
            Specifies the list of column(s) that uniquely identifies each
            row of the input argument "object". The argument is used to
            ensure deterministic results for functions which produce
            results that vary from run to run.
            Types: str OR list of Strings (str)

    RETURNS:
        Instance of PathStart.
        Output teradataml DataFrames can be accessed using attribute
        references, such as PathStartObj.<attribute_name>.
        Output teradataml DataFrame attribute name is:
            result

    RAISES:
        TeradataMlException

    EXAMPLES:
        # Load example data.
        load_example_data("pathgenerator", "clickstream1")

        # Create teradataml DataFrame objects.
        # The table contains clickstream data, where the "path" column
        # contains symbols for the pages that the customer clicked.
        clickstream1 = DataFrame.from_table("clickstream1")

        # Example 1 - PathStart uses the output of PathSummarizer.
        PathGeneratorOut = PathGenerator(data = clickstream1,
                                         seq_column = "path"
                                         )

        PathSummarizerOut = PathSummarizer(object = PathGeneratorOut,
                                           object_partition_column = ['prefix'],
                                           seq_column = 'sequence',
                                           partition_names = 'prefix',
                                           prefix_column = 'prefix'
                                           )

        PathStartOut1 = PathStart(object=PathSummarizerOut,
                                  object_partition_column='parent',
                                  node_column='node',
                                  parent_column='parent',
                                  count_column='cnt',
                                  partition_names='partitioned'
                                  )

        # Print the results
        print(PathStartOut1)

        # Example 2 - Alternatively, persist and use the output table of PathSummarizer.
        copy_to_sql(PathSummarizerOut.result, "generated_summarized_path_table")
        generated_summarized_path_table = DataFrame.from_table("generated_summarized_path_table")

        PathStartOut2 = PathStart(object=generated_summarized_path_table,
                                  object_partition_column='parent',
                                  node_column='node',
                                  parent_column='parent',
                                  count_column='cnt',
                                  partition_names='partitioned'
                                  )

        # Print the results
        print(PathStartOut2)
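The argument rules stated in the docstring above (single-character delimiter, forbidden delimiter characters, parent_column included in object_partition_column) can be checked on the client before calling PathStart. The following is a minimal sketch under stated assumptions: check_pathstart_args and FORBIDDEN_DELIMITERS are hypothetical names that are not part of teradataml, and the helper only compares the number of entries in partition_names and object_partition_column, because Example 1 passes different names for the two arguments.

```python
# Hypothetical helper, NOT part of teradataml: it only mirrors the argument
# rules written in the PathStart docstring so they can be checked up front.

# Characters the docstring says must not be used as the delimiter.
FORBIDDEN_DELIMITERS = frozenset("*+()'\\")


def _as_list(value):
    """Return value as a list, accepting either a str or a list of str."""
    return [value] if isinstance(value, str) else list(value)


def check_pathstart_args(object_partition_column, parent_column,
                         partition_names, delimiter=","):
    """Return a list of problem descriptions; an empty list means none found."""
    problems = []
    partition_cols = _as_list(object_partition_column)
    names = _as_list(partition_names)

    # The delimiter must be exactly one character and not a forbidden one.
    if len(delimiter) != 1:
        problems.append("delimiter must be a single character")
    elif delimiter in FORBIDDEN_DELIMITERS:
        problems.append("delimiter %r is not allowed" % delimiter)

    # One of the partition columns must be the parent column.
    if parent_column not in partition_cols:
        problems.append("object_partition_column must include parent_column")

    # Assumption: only the entry counts are compared, not the names themselves.
    if len(names) != len(partition_cols):
        problems.append("partition_names and object_partition_column "
                        "must have the same number of entries")

    return problems
```

For the arguments used in Example 1, check_pathstart_args('parent', 'parent', 'partitioned') returns an empty list. The helper reports problems as strings instead of raising, so a caller can decide whether to stop or only log them; the function itself may still raise TeradataMlException for conditions this sketch does not cover.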

__repr__(self): Returns the string representation for a PathStart class instance.