Teradata Package for Python Function Reference | 17.10 - Burst - Teradata Package for Python - Look here for syntax, methods and examples for the functions included in the Teradata Package for Python.

Teradata® Package for Python Function Reference

Product

Teradata Package for Python

Release Number

17.10

Published

April 2022

Language

English (United States)

Last Update

2022-08-19

Product Category

Teradata Vantage

teradataml.analytics.mle.Burst = class Burst(builtins.object)

Methods defined here:

__init__(self, data=None, time_data=None, time_column=None, value_columns=None, time_interval=None, time_datatype=None, value_datatype=None, start_time=None, end_time=None, num_points=None, values_before_first=None, values_after_last=None, split_criteria='nosplit', seed=None, accumulate=None, data_sequence_column=None, time_data_sequence_column=None, data_partition_column=None, time_data_partition_column=None, data_order_column=None, time_data_order_column=None):
    DESCRIPTION:
        The Burst function bursts (splits) a time interval into a series of
        shorter "burst" intervals and allocates values from the time intervals
        into the new, shorter subintervals. The Burst function is useful for
        allocating values from overlapping time intervals into user-defined
        time intervals (for example, when a cable company has customer data
        from overlapping time intervals, which it wants to analyze by dividing
        into uniform time intervals). The Burst function supports several
        allocation methods.

    PARAMETERS:
        data:
            Required Argument.
            Specifies the teradataml DataFrame that contains the time series.

        data_partition_column:
            Required Argument.
            Specifies Partition By columns for data.
            Values to this argument can be provided as a list, if multiple
            columns are used for partition.
            Types: str OR list of Strings (str)

        data_order_column:
            Optional Argument.
            Specifies Order By columns for data.
            Values to this argument can be provided as a list, if multiple
            columns are used for ordering.
            Types: str OR list of Strings (str)

        time_data:
            Optional Argument.
            Specifies the teradataml DataFrame that contains the time
            intervals.

        time_data_partition_column:
            Optional Argument. Required if time_data is specified.
            Specifies Partition By columns for time_data.
            Values to this argument can be provided as a list, if multiple
            columns are used for partition.
            Types: str OR list of Strings (str)

        time_data_order_column:
            Optional Argument.
            Specifies Order By columns for time_data.
            Values to this argument can be provided as a list, if multiple
            columns are used for ordering.
            Types: str OR list of Strings (str)

        time_column:
            Required Argument.
            Specifies the names of the columns in "data" that contain the
            start and end times of the time interval to be burst
            (start_time_column and end_time_column).
            Types: str OR list of Strings (str)

        value_columns:
            Required Argument.
            Specifies the names of the columns in "data" to copy to the
            output teradataml DataFrame.
            Types: str OR list of Strings (str)

        time_interval:
            Optional Argument.
            Specifies the length of each burst time interval.
            Note: Specify exactly one of time_data, time_interval, or
                  num_points.
            Types: float

        time_datatype:
            Optional Argument.
            Specifies the data type of the output columns that correspond to
            the input teradataml DataFrame columns that time_column specifies
            (start_time_column and end_time_column).
            If you omit this argument, then the function infers the data type
            of start_time_column and end_time_column from the input
            teradataml DataFrame and uses the inferred data type for the
            corresponding output teradataml DataFrame columns.
            If you specify this argument, then the function can transform the
            input data to the specified output data type only if both the
            input column data type and the specified output column data type
            are in this list: int, float.
            Types: str

        value_datatype:
            Optional Argument.
            Specifies the data types of the output columns that correspond to
            the input teradataml DataFrame columns that value_columns
            specifies.
            If you omit this argument, then the function infers the data type
            of each value_column from the input teradataml DataFrame and uses
            the inferred data type for the corresponding output teradataml
            DataFrame column.
            If you specify value_datatype, then it must be the same size as
            value_columns. That is, if value_columns specifies n columns,
            then value_datatype must specify n data types. For i in [1, n],
            value_column_i has value_type_i. However, value_type_i can be
            empty; for example:
                value_columns (c1, c2, c3), value_datatype (int, ,str).
            If you specify this argument, then the function can transform the
            input data to the specified output data type only if both the
            input column data type and the specified output column data type
            are in this list: int, float.
            Types: str

        start_time:
            Optional Argument.
            Specifies the start time for the time interval to be burst.
            The default is the value in start_time_column.
            Types: str

        end_time:
            Optional Argument.
            Specifies the end time for the time interval to be burst.
            The default is the value in end_time_column.
            Types: str

        num_points:
            Optional Argument.
            Specifies the number of data points in each burst time interval.
            Note: Specify exactly one of time_data, time_interval, or
                  num_points.
            Types: int

        values_before_first:
            Optional Argument.
            Specifies the values to use if start_time is before
            start_time_column. Each of these values must have the same data
            type as its corresponding value_column. Values of data type str
            are case-insensitive.
            If you specify values_before_first, then it must be the same size
            as value_columns. That is, if value_columns specifies n columns,
            then values_before_first must specify n values. For i in [1, n],
            value_column_i has the value before_first_value_i. However,
            before_first_value_i can be empty; for example:
                value_columns (c1, c2, c3), values_before_first (1, ,"abc").
            If before_first_value_i is empty, then value_column_i has the
            value NULL.
            If you do not specify values_before_first, then value_column_i
            has the value NULL for i in [1, n].
            Types: str

        values_after_last:
            Optional Argument.
            Specifies the values to use if end_time is after end_time_column.
            Each of these values must have the same data type as its
            corresponding value_column. Values of data type str are
            case-insensitive.
            If you specify values_after_last, then it must be the same size
            as value_columns. That is, if value_columns specifies n columns,
            then values_after_last must specify n values. For i in [1, n],
            value_column_i has the value after_last_value_i. However,
            after_last_value_i can be empty; for example:
                value_columns (c1, c2, c3), values_after_last (1, ,"abc").
            If after_last_value_i is empty, then value_column_i has the value
            NULL.
            If you do not specify values_after_last, then value_column_i has
            the value NULL for i in [1, n].
            Types: str

        split_criteria:
            Optional Argument.
            Specifies the split criteria of the value_columns.
            Default Value: "nosplit"
            Permitted Values: nosplit, proportional, random, gaussian, poisson
            Types: str

        seed:
            Optional Argument.
            Specifies the seed for the random number generator.
            Types: int

        accumulate:
            Optional Argument.
            Specifies the names of columns in "data" (other than those
            specified by time_column and value_columns) to copy to the output
            teradataml DataFrame. By default, the function copies to the
            output teradataml DataFrame only the columns specified by
            time_column and value_columns.
            Types: str OR list of Strings (str)

        data_sequence_column:
            Optional Argument.
            Specifies the list of column(s) that uniquely identifies each row
            of the input argument "data". The argument is used to ensure
            deterministic results for functions which produce results that
            vary from run to run.
            Types: str OR list of Strings (str)

        time_data_sequence_column:
            Optional Argument.
            Specifies the list of column(s) that uniquely identifies each row
            of the input argument "time_data". The argument is used to ensure
            deterministic results for functions which produce results that
            vary from run to run.
            Types: str OR list of Strings (str)

    RETURNS:
        Instance of Burst.
        Output teradataml DataFrames can be accessed using attribute
        references, such as BurstObj.<attribute_name>.
        Output teradataml DataFrame attribute name is:
            result

    RAISES:
        TeradataMlException

    EXAMPLES:
        # Import the objects used in the examples.
        # An active connection to Vantage is assumed.
        from teradataml import DataFrame, load_example_data
        from teradataml.analytics.mle import Burst

        # Load example data.
        load_example_data("burst", ["burst_data", "finance_data", "time_table2"])

        # Create teradataml DataFrame objects.
        burst_data = DataFrame.from_table("burst_data")
        finance_data = DataFrame.from_table("finance_data")
        time_table2 = DataFrame.from_table("time_table2")

        # Example 1 - Use "time_interval" argument to burst the data for
        # a duration of 1 day (86400 seconds).
        Burst_out1 = Burst(data = burst_data,
                           data_partition_column = ["id"],
                           time_column = ["start_time_column", "end_time_column"],
                           value_columns = ["num_custs"],
                           time_interval = 86400.0,
                           start_time = "08/01/2010",
                           end_time = "08/10/2010",
                           split_criteria = "nosplit",
                           accumulate = ["id"]
                           )

        # Print the result DataFrame
        print(Burst_out1)

        # Example 2 - The "split_criteria" for the "value_columns" used in
        # this example is proportional.
        Burst_out2 = Burst(data = burst_data,
                           data_partition_column = ["id"],
                           time_column = ["start_time_column", "end_time_column"],
                           value_columns = ["num_custs"],
                           time_interval = 86400.0,
                           start_time = "08/01/2010",
                           end_time = "08/10/2010",
                           split_criteria = "proportional",
                           accumulate = ["id"]
                           )

        # Print the result DataFrame
        print(Burst_out2.result)

        # Example 3 - The "split_criteria" for the "value_columns" used in
        # this example is gaussian.
        Burst_out3 = Burst(data = burst_data,
                           data_partition_column = ["id"],
                           time_column = ["start_time_column", "end_time_column"],
                           value_columns = ["num_custs"],
                           time_interval = 86400.0,
                           start_time = "08/01/2010",
                           end_time = "08/10/2010",
                           split_criteria = "gaussian",
                           accumulate = ["id"]
                           )

        # Print the result DataFrame
        print(Burst_out3)

        # Example 4 - Uses the "time_data", "values_before_first" and
        # "values_after_last" arguments. The "time_data" option allows the
        # use of different time intervals and partitions the data accordingly.
        Burst_out4 = Burst(data = finance_data,
                           data_partition_column = ["id"],
                           time_data = time_table2,
                           time_data_partition_column = ["id"],
                           time_column = ["start_time_column", "end_time_column"],
                           value_columns = ["expenditure", "income", "investment"],
                           start_time = "06/30/1967",
                           end_time = "07/10/1967",
                           values_before_first = ["NULL","NULL","NULL"],
                           values_after_last = ["NULL","NULL","NULL"],
                           accumulate = ["id"]
                           )

        # Print the result DataFrame
        print(Burst_out4)
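
The effect of "time_interval" together with the "nosplit" and "proportional" values of "split_criteria" can be illustrated without a Vantage connection. The following is a minimal pure-Python sketch and not the function's actual implementation. The helper name burst_interval is hypothetical, and the sketch assumes that "nosplit" copies the full value into every burst interval, that "proportional" allocates the value according to the length of each burst interval, and that burst intervals start at the row's own start time. The function documentation above does not spell out these details.

```python
from datetime import datetime, timedelta


def burst_interval(start, end, value, interval_seconds, split_criteria="nosplit"):
    """Split [start, end) into burst intervals of at most interval_seconds.

    Hypothetical helper for illustration only; it does not call teradataml.
    Returns a list of (burst_start, burst_end, allocated_value) tuples.
    """
    if end <= start:
        raise ValueError("end must be after start")
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    if split_criteria not in ("nosplit", "proportional"):
        raise ValueError("this sketch handles only 'nosplit' and 'proportional'")

    total_seconds = (end - start).total_seconds()
    step = timedelta(seconds=interval_seconds)
    rows = []
    burst_start = start
    while burst_start < end:
        # The last burst interval may be shorter than the others.
        burst_end = min(burst_start + step, end)
        if split_criteria == "nosplit":
            # Assumption: the value is copied unchanged to each burst interval.
            allocated = value
        else:
            # Assumption: the value is divided by share of the total duration.
            share = (burst_end - burst_start).total_seconds() / total_seconds
            allocated = value * share
        rows.append((burst_start, burst_end, allocated))
        burst_start = burst_end
    return rows


# A 2.5-day interval burst into 1-day (86400-second) intervals.
rows = burst_interval(datetime(2010, 8, 1), datetime(2010, 8, 3, 12),
                      50.0, 86400.0, "proportional")
for burst_start, burst_end, allocated in rows:
    print(burst_start, burst_end, allocated)
```

Under these assumptions the 2.5-day interval yields three burst intervals, and with "proportional" the allocated values add up to the original value, whereas "nosplit" repeats the original value in each row.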

__repr__(self): Returns the string representation for a Burst class instance.

get_build_time(self): Returns the build time of the algorithm, in seconds. When the model object is created using retrieve_model(), the value returned is the one saved in the Model Catalog.

get_prediction_type(self): Returns the prediction type of the algorithm. When the model object is created using retrieve_model(), the value returned is the one saved in the Model Catalog.

get_target_column(self): Returns the target column of the algorithm. When the model object is created using retrieve_model(), the value returned is the one saved in the Model Catalog.

show_query(self): Returns the underlying SQL query. When the model object is created using retrieve_model(), None is returned.