| |
Methods defined here:
- __init__(self, data=None, time_data=None, time_column=None, value_columns=None, time_interval=None, time_datatype=None, value_datatype=None, start_time=None, end_time=None, num_points=None, values_before_first=None, values_after_last=None, split_criteria='nosplit', seed=None, accumulate=None, data_sequence_column=None, time_data_sequence_column=None, data_partition_column=None, time_data_partition_column=None, data_order_column=None, time_data_order_column=None)
- DESCRIPTION:
The Burst function bursts (splits) a time interval into a series of
shorter "burst" intervals and allocates values from the time
intervals into the new, shorter subintervals. The Burst function is
useful for allocating values from overlapping time intervals into
user-defined time intervals (for example, when a cable company has
customer data from overlapping time intervals, which it wants to
analyze by dividing into uniform time intervals). The Burst function
supports several allocation methods.
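The idea of bursting can be sketched in plain Python. The helper below is a hypothetical illustration, not the teradataml implementation: the name burst_interval, the integer time units, and the choice to start subintervals at the interval's own start are all assumptions. It contrasts copying the value unchanged into every subinterval (in the spirit of "nosplit") with dividing it by subinterval length (in the spirit of "proportional").

```python
from typing import List, Tuple


def burst_interval(start: int, end: int, value: float, step: int,
                   proportional: bool = False) -> List[Tuple[int, int, float]]:
    """Split [start, end) into pieces of at most `step` time units.

    Hypothetical helper for illustration only; not part of teradataml.
    """
    if step <= 0 or end <= start:
        raise ValueError("step must be positive and end must be after start")
    total = end - start
    pieces = []
    lo = start
    while lo < end:
        hi = min(lo + step, end)  # the last piece may be shorter than step
        # Proportional: share of the value matches the share of the time span.
        # Otherwise: every piece carries the original value unchanged.
        share = value * (hi - lo) / total if proportional else value
        pieces.append((lo, hi, share))
        lo = hi
    return pieces


nosplit_pieces = burst_interval(0, 250, 100.0, 100)
proportional_pieces = burst_interval(0, 250, 100.0, 100, proportional=True)
```

With proportional allocation the shares add up to the original value, while the unsplit variant repeats the value in each subinterval.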
PARAMETERS:
data:
Required Argument.
Specifies the teradataml DataFrame that contains the time series
data.
data_partition_column:
Required Argument.
Specifies Partition By columns for data.
Values to this argument can be provided as list, if multiple columns
are used for partition.
Types: str OR list of Strings (str)
data_order_column:
Optional Argument.
Specifies Order By columns for data.
Values to this argument can be provided as list, if multiple columns
are used for ordering.
Types: str OR list of Strings (str)
time_data:
Optional Argument.
Specifies the teradataml DataFrame that contains the time data.
time_data_partition_column:
Optional Argument. Required if time_data is specified.
Specifies Partition By columns for time_data.
Values to this argument can be provided as list, if multiple columns
are used for partition.
Types: str OR list of Strings (str)
time_data_order_column:
Optional Argument.
Specifies Order By columns for time_data.
Values to this argument can be provided as list, if multiple columns
are used for ordering.
Types: str OR list of Strings (str)
time_column:
Required Argument.
Specifies the names of the columns in the data teradataml DataFrame
that contain the start and end times of the time interval to be
burst.
Types: str OR list of Strings (str)
value_columns:
Required Argument.
Specifies the names of the columns in the data teradataml DataFrame
to copy to the output teradataml DataFrame.
Types: str OR list of Strings (str)
time_interval:
Optional Argument.
Specifies the length of each burst time interval.
Note: Specify exactly one of time_data, time_interval, or
num_points.
Types: float
time_datatype:
Optional Argument.
Specifies the data type of the output columns that correspond to the
input teradataml DataFrame columns that time_column specifies
(start_time_column and end_time_column). If you omit this argument,
then the function infers the data type of start_time_column and
end_time_column from the input teradataml DataFrame and uses the
inferred data type for the corresponding output teradataml DataFrame
columns. If you specify this argument, then the function can
transform the input data to the specified output data type only if
both the input column data type and the specified output column data
type are in this list: int, float.
Types: str
value_datatype:
Optional Argument.
Specifies the data types of the output columns that correspond
to the input teradataml DataFrame columns that value_columns
specifies. If you omit this argument, then the function infers
the data type of each value_column from the input teradataml
DataFrame and uses the inferred data type for the corresponding
output teradataml DataFrame column. If you specify value_datatype,
then it must be the same size as value_columns. That is, if
value_columns specifies n columns, then value_datatype must
specify n data types. For i in [1, n], value_column_i has
value_type_i. However, value_type_i can be empty; for example:
value_columns (c1, c2, c3), value_datatype (int, ,str).
If you specify this argument, then the function can transform
the input data to the specified output data type only if both
the input column data type and the specified output column data
type are in this list: int, float.
Types: str
start_time:
Optional Argument.
Specifies the start time for the time interval to be burst. The
default is the value in start_time_column.
Types: str
end_time:
Optional Argument.
Specifies the end time for the time interval to be burst. The default
is the value in end_time_column.
Types: str
num_points:
Optional Argument.
Specifies the number of data points in each burst time interval.
Note: Specify exactly one of time_data, time_interval, or num_points.
Types: int
values_before_first:
Optional Argument.
Specifies the values to use if start_time is before start_time_column.
Each of these values must have the same data type as its corresponding
value_column. Values of data type str are case-insensitive.
If you specify values_before_first, then it must be the same size as
value_columns. That is, if value_columns specifies n columns,
then values_before_first must specify n values. For i in [1,
n], value_column_i has the value before_first_value_i. However,
before_first_value_i can be empty; for example: value_columns (c1,
c2, c3), values_before_first (1, ,"abc"). If before_first_value_i
is empty, then value_column_i has the value NULL. If you do not
specify values_before_first, then value_column_i has the value
NULL for i in [1, n].
Types: str
values_after_last:
Optional Argument.
Specifies the values to use if end_time is after end_time_column.
Each of these values must have the same data type as its
corresponding value_column. Values of data type str are
case-insensitive. If you specify values_after_last, then it
must be the same size as value_columns. That is, if value_columns
specifies n columns, then values_after_last must specify n values.
For i in [1, n], value_column_i has the value after_last_value_i.
However, after_last_value_i can be empty; for example:
value_columns (c1, c2, c3), values_after_last (1, ,"abc").
If after_last_value_i is empty, then value_column_i has the
value NULL. If you do not specify values_after_last, then
value_column_i has the value NULL for i in [1, n].
Types: str
split_criteria:
Optional Argument.
Specifies the split criteria of the value_columns.
Default Value: "nosplit"
Permitted Values: nosplit, proportional, random, gaussian, poisson
Types: str
seed:
Optional Argument.
Specifies the seed for the random number generator.
Types: int
accumulate:
Optional Argument.
Specifies the names of data teradataml DataFrame columns (other than those
specified by time_column and value_columns) to copy to the output
teradataml DataFrame. By default, the function copies to the
output teradataml DataFrame only the columns specified by
time_column and value_columns.
Types: str OR list of Strings (str)
data_sequence_column:
Optional Argument.
Specifies the list of column(s) that uniquely identifies each row of
the input argument "data". The argument is used to ensure
deterministic results for functions which produce results that
vary from run to run.
Types: str OR list of Strings (str)
time_data_sequence_column:
Optional Argument.
Specifies the list of column(s) that uniquely identifies each row of
the input argument "time_data". The argument is used to ensure
deterministic results for functions which produce results that
vary from run to run.
Types: str OR list of Strings (str)
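Two of the constraints above lend themselves to a check before calling Burst: exactly one of time_data, time_interval, or num_points may be given, and value_datatype, values_before_first, and values_after_last must each match value_columns in length. The function below is a hypothetical pre-flight check, not part of teradataml, and it assumes the per-column arguments are passed as lists (as in Example 4) with an empty string standing in for an empty entry.

```python
from typing import Optional, Sequence


def check_burst_arguments(
    value_columns: Sequence[str],
    time_data: Optional[object] = None,
    time_interval: Optional[float] = None,
    num_points: Optional[int] = None,
    value_datatype: Optional[Sequence[str]] = None,
    values_before_first: Optional[Sequence[str]] = None,
    values_after_last: Optional[Sequence[str]] = None,
) -> None:
    """Raise ValueError if the documented constraints are violated.

    Hypothetical helper for illustration only; not part of teradataml.
    """
    # Exactly one source of burst intervals may be supplied.
    interval_sources = {
        "time_data": time_data,
        "time_interval": time_interval,
        "num_points": num_points,
    }
    chosen = [name for name, arg in interval_sources.items() if arg is not None]
    if len(chosen) != 1:
        raise ValueError(
            "specify exactly one of time_data, time_interval, num_points; "
            f"got {chosen}"
        )

    # Per-column arguments must line up one-to-one with value_columns.
    per_column = {
        "value_datatype": value_datatype,
        "values_before_first": values_before_first,
        "values_after_last": values_after_last,
    }
    for name, arg in per_column.items():
        if arg is not None and len(arg) != len(value_columns):
            raise ValueError(
                f"{name} has {len(arg)} entries, expected {len(value_columns)}"
            )
```

A call such as check_burst_arguments(["c1", "c2", "c3"], time_interval=86400.0, values_before_first=["1", "", "abc"]) passes silently, while supplying both time_interval and num_points raises ValueError.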
RETURNS:
Instance of Burst.
Output teradataml DataFrames can be accessed using attribute
references, such as BurstObj.<attribute_name>.
Output teradataml DataFrame attribute name is:
result
RAISES:
TeradataMlException
EXAMPLES:
# Load example data.
load_example_data("burst", ["burst_data", "finance_data", "time_table2"])
# Create teradataml DataFrame objects.
burst_data = DataFrame.from_table("burst_data")
finance_data = DataFrame.from_table("finance_data")
time_table2 = DataFrame.from_table("time_table2")
# Example 1 - Use "time_interval" argument to burst the data for
# a duration of 1 day (86400 seconds).
Burst_out1 = Burst(data = burst_data,
data_partition_column = ["id"],
time_column = ["start_time_column", "end_time_column"],
value_columns = ["num_custs"],
time_interval = 86400.0,
start_time = "08/01/2010",
end_time = "08/10/2010",
split_criteria = "nosplit",
accumulate = ["id"]
)
# Print the result DataFrame
print(Burst_out1.result)
# Example 2 - The "split_criteria" for the "value_columns" used in
# this example is proportional.
Burst_out2 = Burst(data = burst_data,
data_partition_column = ["id"],
time_column = ["start_time_column", "end_time_column"],
value_columns = ["num_custs"],
time_interval = 86400.0,
start_time = "08/01/2010",
end_time = "08/10/2010",
split_criteria = "proportional",
accumulate = ["id"]
)
# Print the result DataFrame
print(Burst_out2.result)
# Example 3 - The "split_criteria" for the "value_columns" used in
# this example is gaussian.
Burst_out3 = Burst(data = burst_data,
data_partition_column = ["id"],
time_column = ["start_time_column", "end_time_column"],
value_columns = ["num_custs"],
time_interval = 86400.0,
start_time = "08/01/2010",
end_time = "08/10/2010",
split_criteria = "gaussian",
accumulate = ["id"]
)
# Print the result DataFrame
print(Burst_out3.result)
# Example 4 - Uses the "time_data", "values_before_first", and
# "values_after_last" arguments. The "time_data" option allows the use of
# different time intervals and partitions the data accordingly.
Burst_out4 = Burst(data = finance_data,
data_partition_column = ["id"],
time_data = time_table2,
time_data_partition_column = ["id"],
time_column = ["start_time_column", "end_time_column"],
value_columns = ["expenditure", "income", "investment"],
start_time = "06/30/1967",
end_time = "07/10/1967",
values_before_first = ["NULL","NULL","NULL"],
values_after_last = ["NULL","NULL","NULL"],
accumulate = ["id"]
)
# Print the result DataFrame
print(Burst_out4.result)
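As a quick sanity check on Examples 1 to 3, the arithmetic behind time_interval = 86400.0 can be worked out with the standard library. This sketch assumes the date strings are month/day/year (consistent with "06/30/1967" in Example 4) and that the range runs from midnight to midnight; how Burst itself treats the end boundary is not stated here.

```python
from datetime import datetime

# Date strings taken from Examples 1 to 3; the MM/DD/YYYY format is an assumption.
start = datetime.strptime("08/01/2010", "%m/%d/%Y")
end = datetime.strptime("08/10/2010", "%m/%d/%Y")

time_interval = 86400.0  # one day, in seconds

span_seconds = (end - start).total_seconds()
num_intervals = span_seconds / time_interval  # whole days in the range
```

Under these assumptions the range spans nine one-day intervals.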
- __repr__(self)
- Returns the string representation for a Burst class instance.
- get_build_time(self)
- Function to return the build time of the algorithm in seconds.
When model object is created using retrieve_model(), then the value returned is
as saved in the Model Catalog.
- get_prediction_type(self)
- Function to return the Prediction type of the algorithm.
When model object is created using retrieve_model(), then the value returned is
as saved in the Model Catalog.
- get_target_column(self)
- Function to return the Target Column of the algorithm.
When model object is created using retrieve_model(), then the value returned is
as saved in the Model Catalog.
- show_query(self)
- Function to return the underlying SQL query.
When model object is created using retrieve_model(), then None is returned.
|