Methods defined here:
- __init__(self, data=None, value_column=None, accumulate=None, segmentation_method='normal_distribution', search_method='binary', max_change_num=10, penalty='BIC', output_option='changepoint', data_sequence_column=None, data_partition_column=None, data_order_column=None, granularity=1)
- DESCRIPTION:
The ChangePointDetection function detects change points in a
stochastic process or time series, using retrospective change-point
detection, implemented with these algorithms:
* Search algorithm: binary search
* Segmentation algorithm: normal distribution and linear regression
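The search and segmentation algorithms listed above can be illustrated with a small pure-Python sketch. This is not the Vantage implementation. It assumes that each normal-distribution segment has its own maximum-likelihood mean and variance, that each linear-regression segment is an ordinary least-squares line with Gaussian residuals, and that binary segmentation repeatedly applies the best split whose gain ln(L1) - ln(L0) exceeds a fixed penalty. The names normal_loglik, linreg_loglik, best_split, binary_segmentation and the min_size parameter are hypothetical.

```python
import math


def normal_loglik(seg):
    """Maximized Gaussian log-likelihood of one segment, using the
    segment's own MLE mean and variance (assumption)."""
    n = len(seg)
    mean = sum(seg) / n
    var = sum((x - mean) ** 2 for x in seg) / n
    var = max(var, 1e-12)  # guard against zero variance in constant segments
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


def linreg_loglik(seg):
    """Maximized Gaussian log-likelihood of an OLS line fitted to one
    segment against its position index (assumption)."""
    n = len(seg)
    xs = range(n)
    x_mean = (n - 1) / 2
    y_mean = sum(seg) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, seg))
    slope = sxy / sxx if sxx else 0.0
    intercept = y_mean - slope * x_mean
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, seg))
    var = max(rss / n, 1e-12)
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


def best_split(seg, loglik, min_size):
    """Return (gain, k) maximizing ln(L1) - ln(L0) over split points k."""
    base = loglik(seg)  # ln(L0): the whole segment, no change point
    best = (float("-inf"), None)
    for k in range(min_size, len(seg) - min_size + 1):
        gain = loglik(seg[:k]) + loglik(seg[k:]) - base  # ln(L1) - ln(L0)
        if gain > best[0]:
            best = (gain, k)
    return best


def binary_segmentation(values, loglik, penalty, max_change_num=10, min_size=5):
    """Greedy binary segmentation: in each round, split the segment whose
    best split has the largest gain, provided that gain exceeds penalty."""
    changepoints = []
    segments = [(0, len(values))]
    while len(changepoints) < max_change_num:
        candidates = []
        for start, end in segments:
            if end - start >= 2 * min_size:
                gain, k = best_split(values[start:end], loglik, min_size)
                if k is not None and gain > penalty:
                    candidates.append((gain, start, end, start + k))
        if not candidates:
            break
        _, start, end, cp = max(candidates)
        segments.remove((start, end))
        segments += [(start, cp), (cp, end)]
        changepoints.append(cp)
    return sorted(changepoints)


values = [0.0, 1.0] * 10 + [10.0, 11.0] * 10
print(binary_segmentation(values, normal_loglik, penalty=math.log(len(values))))
```

For the normal method the 2π and +1 terms cancel, so the gain reduces to (n/2)ln(σ0²) - (k/2)ln(σ1²) - ((n-k)/2)ln(σ2²). Passing math.log(n) as the penalty mimics the BIC rule with p1 - p0 = 2.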
PARAMETERS:
data:
Required Argument.
Specifies the teradataml DataFrame containing the input time
series data.
data_partition_column:
Required Argument.
Specifies Partition By columns for "data".
Values to this argument can be provided as a list, if multiple
columns are used for partition.
Types: str OR list of Strings (str)
data_order_column:
Required Argument.
Specifies Order By columns for "data".
Values to this argument can be provided as a list, if multiple
columns are used for ordering.
Types: str OR list of Strings (str)
value_column:
Required Argument.
Specifies the name of the input teradataml DataFrame column
that contains the time series data.
Types: str
accumulate:
Optional Argument. Required when teradataml is connected to
Vantage 1.0 Maintenance Update 2.
Specifies the names of the input teradataml DataFrame columns
to copy to the output table.
Tip: To identify change points in the output table, specify
the columns that appear in data_partition_column and
data_order_column.
Types: str OR list of Strings (str)
segmentation_method:
Optional Argument.
Specifies one of these segmentation methods:
* normal_distribution: In each segment, the data follows a
normal distribution.
* linear_regression: In each segment, the data follows a
linear regression model.
Default Value: normal_distribution
Permitted Values: normal_distribution, linear_regression
Types: str
search_method:
Optional Argument.
Specifies the search method. The only supported method is
binary segmentation.
Default Value: binary
Permitted Values: binary
Types: str
max_change_num:
Optional Argument.
Specifies the maximum number of change points to detect.
Default Value: 10
Types: int
penalty:
Optional Argument.
Specifies the penalty function, which is used to avoid
over-fitting.
Permitted values are: BIC, AIC, or a threshold (a float value).
* For BIC, the condition for the existence of a change point
is: ln(L1) - ln(L0) > (p1 - p0) * ln(n)/2.
For normal distribution and linear regression, p1 - p0 = 2,
so the condition becomes: ln(L1) - ln(L0) > ln(n).
* For AIC, the condition for the existence of a change point
is: ln(L1) - ln(L0) > p1 - p0.
For normal distribution and linear regression, p1 - p0 = 2,
so the condition becomes: ln(L1) - ln(L0) > 2.
* For threshold, the specified value is compared to
ln(L1) - ln(L0).
L1 and L0 are the maximum likelihoods under hypotheses H1 and
H0, and n is the number of data points. For normal
distribution, the definitions of ln(L1) and ln(L0) are given
in "Background". p is the number of model parameters used in
the information criterion BIC or AIC; p1 and p0 are its values
under hypotheses H1 and H0, respectively, so p1 - p0 is the
number of additional parameters introduced by adding a change
point.
Default Value: BIC
Types: str
output_option:
Optional Argument.
Specifies the output teradataml DataFrame columns.
Default Value: changepoint
Permitted Values: changepoint, segment, verbose
Types: str
data_sequence_column:
Optional Argument.
Specifies the list of column(s) that uniquely identifies each
row of the input argument "data". The argument is used to
ensure deterministic results for functions which produce
results that vary from run to run.
Types: str OR list of Strings (str)
granularity:
Optional Argument.
Specifies the difference between the indices of consecutive
candidate change points.
Note:
"granularity" argument support is only available when teradataml
is connected to Vantage 1.3.
Default Value: 1
Types: int
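The "penalty" rules above reduce to a single number that ln(L1) - ln(L0) must exceed. The helpers below are a hypothetical sketch (penalty_threshold and candidate_indices are not teradataml functions). They take p1 - p0 = 2, as stated for both segmentation methods, assume that a numeric threshold is likewise a bound the gain must exceed, and assume that "granularity" spaces candidate change points starting at index granularity.

```python
import math


def penalty_threshold(penalty, n, extra_params=2):
    """Hypothetical helper: value that ln(L1) - ln(L0) must exceed.

    extra_params is p1 - p0 (2 for normal_distribution and
    linear_regression); n is the number of data points.
    """
    p = penalty.upper()
    if p == "BIC":
        return extra_params * math.log(n) / 2  # equals ln(n) when p1 - p0 = 2
    if p == "AIC":
        return float(extra_params)  # equals 2 when p1 - p0 = 2
    return float(penalty)  # threshold passed as a numeric string, e.g. '20.0'


def candidate_indices(n, granularity=1):
    """Hypothetical helper: candidate change-point indices spaced by
    granularity (the starting offset is an assumption)."""
    return list(range(granularity, n, granularity))


print(penalty_threshold("BIC", 100), penalty_threshold("AIC", 100),
      penalty_threshold("20.0", 100), candidate_indices(10, 3))
```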
RETURNS:
Instance of ChangePointDetection.
Output teradataml DataFrames can be accessed using attribute
references, such as ChangePointDetectionObj.<attribute_name>.
Output teradataml DataFrame attribute name is:
result
RAISES:
TeradataMlException, TypeError, ValueError
EXAMPLES:
# Load the data to run the example.
load_example_data('changepointdetection', ['cpt', 'finance_data2'])
# Provided example tables are 'cpt' and 'finance_data2'.
# These input tables contain time series data, such as expenditure
# or income over time periods, power consumption at certain
# periods or sequence positions, and pulse rate.
# Create teradataml DataFrame objects.
cpt_table = DataFrame.from_table('cpt')
print(cpt_table) # Only 10 rows are displayed by default
# Example 1: (Using default parameters)
cpt_out = ChangePointDetection(data=cpt_table,
                               data_partition_column='sid',
                               data_order_column='id',
                               value_column='val',
                               accumulate=['sid', 'id'])
# Print the results
print(cpt_out.result)
# Example 2: (Using 'VERBOSE' output_option)
cpt_out = ChangePointDetection(data=cpt_table,
                               data_partition_column='sid',
                               data_order_column='id',
                               value_column='val',
                               accumulate=['sid', 'id'],
                               output_option='verbose')
# Print the results
print(cpt_out.result)
# Example 3: (Using 'AIC' penalty)
cpt_out = ChangePointDetection(data=cpt_table,
                               data_partition_column='sid',
                               data_order_column='id',
                               value_column='val',
                               accumulate=['sid', 'id'],
                               penalty='AIC')
# Print the results
print(cpt_out.result)
# Example 4: (Using 'threshold' penalty of 20)
cpt_out = ChangePointDetection(data=cpt_table,
                               data_partition_column='sid',
                               data_order_column='id',
                               value_column='val',
                               accumulate=['sid', 'id'],
                               penalty='20.0')
# Print the results
print(cpt_out.result)
# Example 5: (Using 'linear_regression' segmentation_method)
cpt_out = ChangePointDetection(data=cpt_table,
                               data_partition_column='sid',
                               data_order_column='id',
                               value_column='val',
                               accumulate=['sid', 'id'],
                               segmentation_method='linear_regression')
# Print the results
print(cpt_out.result)
# Example 6: (Using 'linear_regression' segmentation_method and 'SEGMENT'
# output_option)
cpt_out = ChangePointDetection(data=cpt_table,
                               data_partition_column='sid',
                               data_order_column='id',
                               value_column='val',
                               accumulate=['sid', 'id'],
                               segmentation_method='linear_regression',
                               output_option='segment')
# Print the results
print(cpt_out.result)
- __repr__(self)
- Returns the string representation for a ChangePointDetection class instance.
- get_build_time(self)
- Function to return the build time of the algorithm in seconds.
When the model object is created using retrieve_model(), the value
returned is the one saved in the Model Catalog.
- get_prediction_type(self)
- Function to return the prediction type of the algorithm.
When the model object is created using retrieve_model(), the value
returned is the one saved in the Model Catalog.
- get_target_column(self)
- Function to return the target column of the algorithm.
When the model object is created using retrieve_model(), the value
returned is the one saved in the Model Catalog.
- show_query(self)
- Function to return the underlying SQL query.
When the model object is created using retrieve_model(), None is
returned.