Teradata Python Package Function Reference - ChangePointDetection

teradataml.analytics.mle.ChangePointDetection = class ChangePointDetection(builtins.object)

Methods defined here:

__init__(self, data=None, value_column=None, accumulate=None, segmentation_method='normal_distribution', search_method='binary', max_change_num=10, penalty='BIC', output_option='changepoint', data_sequence_column=None, data_partition_column=None, data_order_column=None):
    DESCRIPTION:
        The ChangePointDetection function detects change points in a
        stochastic process or time series, using retrospective
        change-point detection, implemented with these algorithms:
            • Search algorithm: binary segmentation
            • Segmentation algorithm: normal distribution and linear regression

    PARAMETERS:
        data:
            Required Argument.
            Specifies the teradataml DataFrame containing the input time
            series data.

        data_partition_column:
            Required Argument.
            Specifies Partition By columns for data.
            Values to this argument can be provided as a list, if multiple
            columns are used for partitioning.
            Types: str OR list of Strings (str)

        data_order_column:
            Required Argument.
            Specifies Order By columns for data.
            Values to this argument can be provided as a list, if multiple
            columns are used for ordering.
            Types: str OR list of Strings (str)

        value_column:
            Required Argument.
            Specifies the name of the input teradataml DataFrame column that
            contains the time series data.
            Types: str

        accumulate:
            Required Argument.
            Specifies the names of the input teradataml DataFrame columns to
            copy to the output table.
            Tip: To identify change points in the output table, specify the
            columns that appear in data_partition_column and
            data_order_column.
            Types: str OR list of Strings (str)

        segmentation_method:
            Optional Argument.
            Specifies one of these segmentation methods:
                • normal_distribution: In each segment, the data follows a
                  normal distribution.
                • linear_regression: In each segment, the data follows a
                  linear regression model.
            Default Value: normal_distribution
            Permitted Values: normal_distribution, linear_regression
            Types: str

        search_method:
            Optional Argument.
            Specifies the search method. The only permitted value is binary
            (binary segmentation).
            Default Value: binary
            Permitted Values: binary
            Types: str

        max_change_num:
            Optional Argument.
            Specifies the maximum number of change points to detect.
            Default Value: 10
            Types: int

        penalty:
            Optional Argument.
            Specifies the penalty function, which is used to avoid
            over-fitting. Possible values are: BIC, AIC and threshold (a
            float value, passed as a string, for example '20.0').
                • For BIC, the condition for the existence of a change
                  point is:
                      ln(L1) - ln(L0) > (p1 - p0) * ln(n)/2
                  For normal distribution and linear regression,
                  p1 - p0 = 2, so the right-hand side is:
                      (p1 - p0) * ln(n)/2 = ln(n)
                • For AIC, the condition for the existence of a change
                  point is:
                      ln(L1) - ln(L0) > p1 - p0
                  For normal distribution and linear regression,
                  the right-hand side is:
                      p1 - p0 = 2
                • For threshold, the specified value is compared to:
                      ln(L1) - ln(L0)
            L1 and L0 are the maximum likelihoods under hypotheses H1 and
            H0. For normal distribution, the definitions of Log(L1) and
            Log(L0) are in "Background".
            'p' is the number of additional parameters introduced by adding
            a change point. 'p' is used in the information criterion BIC or
            AIC. p1 and p0 represent this parameter in hypotheses H1 and H0,
            respectively.
            Default Value: BIC
            Types: str

        output_option:
            Optional Argument.
            Specifies the output teradataml DataFrame columns.
            Default Value: changepoint
            Permitted Values: changepoint, segment, verbose
            Types: str

        data_sequence_column:
            Optional Argument.
            Specifies the list of column(s) that uniquely identifies each
            row of the input argument "data". The argument is used to ensure
            deterministic results for functions which produce results that
            vary from run to run.
            Types: str OR list of Strings (str)

    RETURNS:
        Instance of ChangePointDetection.
        Output teradataml DataFrames can be accessed using attribute
        references, such as ChangePointDetectionObj.<attribute_name>.
        Output teradataml DataFrame attribute name is:
            result

    RAISES:
        TeradataMlException

    EXAMPLES:
        # Import the names used below. An active connection to the
        # database is assumed.
        from teradataml import DataFrame, load_example_data
        from teradataml.analytics.mle import ChangePointDetection

        # Load the data to run the example.
        load_example_data('changepointdetection', ['cpt', 'finance_data2'])

        # Provided example tables are 'cpt' and 'finance_data2'.
        # These input tables contain time series data such as expenditure,
        # income between time periods, power consumption at certain
        # periods or sequences, or pulse rate.

        # Create teradataml DataFrame objects.
        cpt_table = DataFrame.from_table('cpt')
        print(cpt_table)  # Only 10 rows are displayed by default

        # Example 1: (Using default parameters)
        cpt_out = ChangePointDetection(data = cpt_table,
                                       data_partition_column = 'sid',
                                       data_order_column = 'id',
                                       value_column = 'val',
                                       accumulate = ['sid', 'id']
                                       )
        # Print the results
        print(cpt_out.result)

        # Example 2: (Using 'VERBOSE' output_option)
        cpt_out = ChangePointDetection(data = cpt_table,
                                       data_partition_column = 'sid',
                                       data_order_column = 'id',
                                       value_column = 'val',
                                       accumulate = ['sid', 'id'],
                                       output_option = 'verbose'
                                       )
        # Print the results
        print(cpt_out.result)

        # Example 3: (Using 'AIC' penalty)
        cpt_out = ChangePointDetection(data = cpt_table,
                                       data_partition_column = 'sid',
                                       data_order_column = 'id',
                                       value_column = 'val',
                                       accumulate = ['sid', 'id'],
                                       penalty = 'AIC'
                                       )
        # Print the results
        print(cpt_out.result)

        # Example 4: (Using 'threshold' penalty of 20)
        cpt_out = ChangePointDetection(data = cpt_table,
                                       data_partition_column = 'sid',
                                       data_order_column = 'id',
                                       value_column = 'val',
                                       accumulate = ['sid', 'id'],
                                       penalty = '20.0'
                                       )
        # Print the results
        print(cpt_out.result)

        # Example 5: (Using 'linear_regression' segmentation_method)
        cpt_out = ChangePointDetection(data = cpt_table,
                                       data_partition_column = 'sid',
                                       data_order_column = 'id',
                                       value_column = 'val',
                                       accumulate = ['sid', 'id'],
                                       segmentation_method = 'linear_regression'
                                       )
        # Print the results
        print(cpt_out.result)

        # Example 6: (Using 'linear_regression' segmentation_method and
        # 'SEGMENT' output_option)
        cpt_out = ChangePointDetection(data = cpt_table,
                                       data_partition_column = 'sid',
                                       data_order_column = 'id',
                                       value_column = 'val',
                                       accumulate = ['sid', 'id'],
                                       segmentation_method = 'linear_regression',
                                       output_option = 'segment'
                                       )
        # Print the results
        print(cpt_out.result)
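
The penalty conditions described for BIC, AIC and threshold can be illustrated with a small local sketch in plain Python. This is not the teradataml or database implementation: the helper names (gaussian_max_loglik, penalty_value, best_split, binary_segmentation) are hypothetical, the Gaussian log-likelihood formula is an assumption standing in for the definitions in "Background", and applying the penalty with n equal to the length of the segment being split is likewise an assumption. It covers only the normal_distribution segmentation method.

```python
import math


def gaussian_max_loglik(values):
    """Maximized log-likelihood of a normal model fitted to `values`.

    Assumed form: -n/2 * (ln(2*pi*var) + 1), with var the MLE variance.
    """
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    var = max(var, 1e-12)  # guard against zero variance in constant segments
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


def penalty_value(penalty, n):
    """Right-hand side of the change-point condition for a segment of size n."""
    if penalty == 'BIC':
        return math.log(n)   # (p1 - p0) * ln(n) / 2 with p1 - p0 = 2
    if penalty == 'AIC':
        return 2.0           # p1 - p0 = 2
    return float(penalty)    # threshold given as a string such as '20.0'


def best_split(values, min_size=2):
    """Return (gain, k) maximizing ln(L1) - ln(L0) for values[:k] / values[k:]."""
    ll0 = gaussian_max_loglik(values)  # H0: one normal segment
    best_gain, best_k = float('-inf'), None
    for k in range(min_size, len(values) - min_size + 1):
        # H1: two normal segments split at index k
        ll1 = gaussian_max_loglik(values[:k]) + gaussian_max_loglik(values[k:])
        if ll1 - ll0 > best_gain:
            best_gain, best_k = ll1 - ll0, k
    return best_gain, best_k


def binary_segmentation(values, penalty='BIC', max_change_num=10):
    """Split segments repeatedly while the gain exceeds the penalty."""
    change_points = []
    pending = [(0, len(values))]  # half-open index ranges still to examine
    while pending and len(change_points) < max_change_num:
        start, end = pending.pop()
        segment = values[start:end]
        if len(segment) < 4:  # too short for two segments of size >= 2
            continue
        gain, k = best_split(segment)
        if k is not None and gain > penalty_value(penalty, len(segment)):
            change_points.append(start + k)
            pending.append((start, start + k))
            pending.append((start + k, end))
    return sorted(change_points)


# A series whose level jumps from about 1 to about 10 at index 6.
series = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 10.0, 10.3, 9.9, 10.1, 10.2, 9.8]
print(binary_segmentation(series, penalty='BIC'))  # prints [6]
```

In this sketch a larger penalty value makes a split harder to accept, which is how the penalty argument limits over-fitting; max_change_num caps the number of accepted splits regardless of the penalty.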

__repr__(self): Returns the string representation for a ChangePointDetection class instance.