Teradata Package for Python Function Reference | 17.10 - SAX

teradataml.analytics.mle.SAX = class SAX(builtins.object)

Methods defined here:

__init__(self, data=None, meanstats_data=None, stdevstats_data=None, value_columns=None, time_column=None, window_type='global', output='string', mean=None, st_dev=None, window_size=None, output_frequency=1, points_persymbol=1, symbols_perwindow=None, alphabet_size=4, bitmap_level=2, print_stats=False, accumulate=None, data_sequence_column=None, meanstats_data_sequence_column=None, stdevstats_data_sequence_column=None, data_partition_column=None, meanstats_data_partition_column=None, stdevstats_data_partition_column=None, data_order_column=None, meanstats_data_order_column=None, stdevstats_data_order_column=None)
    DESCRIPTION:
        The SAX (Symbolic Aggregate approXimation) function transforms a
        time series data item into a smaller sequence of symbols. The
        symbols are better suited to further manipulation because they are
        smaller and because patterns in them are easier to identify and
        compare. The input and output formats allow the result to be
        analyzed using the NPath or Shapelet functions, or by other hashing
        or regular-expression pattern matching algorithms.

    PARAMETERS:
        data:
            Required Argument.
            Specifies the teradataml DataFrame containing time series data.

        data_partition_column:
            Required Argument.
            Specifies Partition By columns for data.
            Values to this argument can be provided as a list, if multiple
            columns are used for partition.
            Types: str OR list of Strings (str)

        data_order_column:
            Required Argument.
            Specifies Order By columns for data.
            Values to this argument can be provided as a list, if multiple
            columns are used for ordering.
            Types: str OR list of Strings (str)

        meanstats_data:
            Optional Argument.
            Specifies the teradataml DataFrame that contains the global
            means of each value_column of the input teradataml DataFrame.

        meanstats_data_partition_column:
            Optional Argument. Required if 'meanstats_data' is used.
            Specifies Partition By columns for meanstats_data.
            Values to this argument can be provided as a list, if multiple
            columns are used for partition.
            Types: str OR list of Strings (str)

        meanstats_data_order_column:
            Optional Argument.
            Specifies Order By columns for meanstats_data.
            Values to this argument can be provided as a list, if multiple
            columns are used for ordering.
            Types: str OR list of Strings (str)

        stdevstats_data:
            Optional Argument.
            Specifies the teradataml DataFrame that contains the global
            standard deviations of each value_column of the input
            teradataml DataFrame.

        stdevstats_data_partition_column:
            Optional Argument. Required if 'stdevstats_data' is used.
            Specifies Partition By columns for stdevstats_data.
            Values to this argument can be provided as a list, if multiple
            columns are used for partition.
            Types: str OR list of Strings (str)

        stdevstats_data_order_column:
            Optional Argument.
            Specifies Order By columns for stdevstats_data.
            Values to this argument can be provided as a list, if multiple
            columns are used for ordering.
            Types: str OR list of Strings (str)

        value_columns:
            Required Argument.
            Specifies the names of the input teradataml DataFrame columns
            that contain the time series data to be transformed.
            Types: str OR list of Strings (str)

        time_column:
            Optional Argument.
            Specifies the name of the input teradataml DataFrame column
            that contains the time axis of the data.
            Types: str

        window_type:
            Optional Argument.
            Determines how much data the function processes at one time:
                "global": The function computes the SAX code using a single
                          mean and standard deviation for the entire data
                          set.
                "sliding": The function recomputes the mean and standard
                           deviation for a sliding window of the data set.
            Default Value: "global"
            Permitted Values: sliding, global
            Types: str

        output:
            Optional Argument.
            Determines how the function outputs the results:
                "string": The function outputs a list of SAX codes for each
                          window.
                "bytes": The function outputs the list of SAX codes as
                         compact byte arrays (which are not
                         "human-readable").
                "bitmap": The function outputs a JSON representation of a
                          SAX bitmap.
                "characters": The function outputs one character for each
                              line.
            Default Value: "string"
            Permitted Values: STRING, BITMAP, BYTES, CHARACTERS
            Types: str

        mean:
            Optional Argument.
            Specifies the global mean values that the function uses to
            calculate the SAX code for every partition. A mean value has
            the data type float. If mean specifies only one value and
            value_columns specifies multiple columns, then the specified
            value applies to every value_column. If mean specifies multiple
            values, then it must specify a value for each value_column. The
            nth mean value corresponds to the nth value_column.
            Tip: To specify a different global mean value for each
                 partition, use the multiple-input syntax and put the
                 values in the meanstats teradataml DataFrame.
            Types: float OR list of floats

        st_dev:
            Optional Argument.
            Specifies the global standard deviation values that the
            function uses to calculate the SAX code for every partition. An
            st_dev value has the data type float and must be greater than
            0. If st_dev specifies only one value and value_columns
            specifies multiple columns, then the specified value applies to
            every value_column. If st_dev specifies multiple values, then
            it must specify a value for each value_column. The nth st_dev
            value corresponds to the nth value_column.
            Tip: To specify a different global standard deviation value for
                 each partition, use the multiple-input syntax and put the
                 values in the stdevstats teradataml DataFrame.
            Types: float OR list of floats

        window_size:
            Required if window_type is 'sliding', disallowed otherwise.
            Specifies the size of the sliding window. The value must be an
            integer greater than 0.
            Types: int

        output_frequency:
            Optional Argument.
            Specifies the number of data points that the window slides
            between successive outputs. The value must be an integer
            greater than 0.
            Note: The window_type value must be "sliding" and the output
                  value cannot be "characters". If window_type is "sliding"
                  and output is "characters", then output_frequency is
                  automatically set to the value of window_size, to ensure
                  that a single character is assigned to each time point.
                  If the number of data points in the time series is not an
                  integer multiple of the window size, then the function
                  ignores the leftover parts.
            Default Value: 1
            Types: int

        points_persymbol:
            Optional Argument.
            Specifies the number of data points to be converted into one
            SAX symbol. Each value must be an integer greater than 0.
            Note: The window_type value must be "global".
            Default Value: 1
            Types: int

        symbols_perwindow:
            Optional Argument.
            Specifies the number of SAX symbols to be generated for each
            window. Each value must be an integer greater than 0. The
            default value is the value of window_size.
            Note: The window_type value must be "sliding".
            Types: int

        alphabet_size:
            Optional Argument.
            Specifies the number of symbols in the SAX alphabet. The value
            must be an integer in the range [2, 20].
            Default Value: 4
            Types: int

        bitmap_level:
            Optional Argument.
            Specifies the number of consecutive symbols to be converted to
            one symbol on a bitmap. For bitmap level 1, the bitmap contains
            the symbols "a", "b", "c", and so on; for bitmap level 2, the
            bitmap contains the symbols "aa", "ab", "ac", and so on. The
            input value must be an integer in the range [1, 4].
            Note: The output value must be "bitmap".
            Default Value: 2
            Types: int

        print_stats:
            Optional Argument.
            Specifies whether the function prints the mean and standard
            deviation.
            Note: The output value must be "string".
            Default Value: False
            Types: bool

        accumulate:
            Optional Argument.
            Specifies the names of the input teradataml DataFrame columns
            that are to appear in the output teradataml DataFrame. For each
            sequence in the input teradataml DataFrame, SAX chooses the
            value corresponding to the first time point in the sequence to
            output as the accumulate value.
            Types: str OR list of Strings (str)

        data_sequence_column:
            Optional Argument.
            Specifies the list of column(s) that uniquely identifies each
            row of the input argument "data". The argument is used to
            ensure deterministic results for functions which produce
            results that vary from run to run.
            Types: str OR list of Strings (str)

        meanstats_data_sequence_column:
            Optional Argument.
            Specifies the list of column(s) that uniquely identifies each
            row of the input argument "meanstats_data". The argument is
            used to ensure deterministic results for functions which
            produce results that vary from run to run.
            Types: str OR list of Strings (str)

        stdevstats_data_sequence_column:
            Optional Argument.
            Specifies the list of column(s) that uniquely identifies each
            row of the input argument "stdevstats_data". The argument is
            used to ensure deterministic results for functions which
            produce results that vary from run to run.
            Types: str OR list of Strings (str)

    RETURNS:
        Instance of SAX.
        Output teradataml DataFrames can be accessed using attribute
        references, such as SAXObj.<attribute_name>.
        Output teradataml DataFrame attribute name is:
            result

    RAISES:
        TeradataMlException

    EXAMPLES:
        # Imports used by the examples. An active connection to Vantage
        # (for example, one created with create_context()) is assumed.
        from teradataml import DataFrame, load_example_data
        from teradataml.analytics.mle import SAX

        # Load example data.
        load_example_data("sax", "finance_data3")

        # Create teradataml DataFrame objects.
        finance_data3 = DataFrame.from_table("finance_data3")

        # Example 1 - This example uses window_type as global and the
        # default output value.
        SAX_Out = SAX(data = finance_data3,
                      data_partition_column = ["id"],
                      data_order_column = ["period"],
                      value_columns = ["expenditure","income","investment"],
                      time_column = "period",
                      window_type = "global",
                      print_stats = True,
                      accumulate = ["id"]
                      )

        # Print the results.
        print(SAX_Out)

        # Example 2 - This example uses window_type as sliding and the
        # default output value.
        # window_size must also be specified when window_type is sliding.
        SAX_Out2 = SAX(data = finance_data3,
                       data_partition_column = ["id"],
                       data_order_column = ["period"],
                       value_columns = ["expenditure"],
                       time_column = "period",
                       window_type = "sliding",
                       window_size = 20,
                       print_stats = True,
                       accumulate = ["id"]
                       )

        # Print the results.
        print(SAX_Out2)

        # Example 3 - This example uses the multiple-input version, where
        # the mean and standard deviation statistics are applied globally
        # with the meanstats and stdevstats tables.
        meanstats = DataFrame.from_table("finance_data3").groupby("id").mean()
        meanstats = meanstats.assign(drop_columns=True,
                                     id=meanstats.id,
                                     expenditure=meanstats.mean_expenditure,
                                     income=meanstats.mean_income,
                                     investment=meanstats.mean_investment)

        stdevstats = DataFrame.from_table("finance_data3").groupby("id").std()
        stdevstats = stdevstats.assign(drop_columns=True,
                                       id=stdevstats.id,
                                       expenditure=stdevstats.std_expenditure,
                                       income=stdevstats.std_income,
                                       investment=stdevstats.std_investment)

        SAX_Out3 = SAX(data = finance_data3,
                       data_partition_column = ["id"],
                       data_order_column = ["id"],
                       meanstats_data = meanstats,
                       meanstats_data_partition_column = ["id"],
                       stdevstats_data = stdevstats,
                       stdevstats_data_partition_column = ["id"],
                       value_columns = ["expenditure","income","investment"],
                       time_column = "period",
                       window_type = "global",
                       accumulate = ["id"]
                       )

        # Print the results.
        print(SAX_Out3)
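
To make the roles of mean, st_dev, points_persymbol, alphabet_size and bitmap_level more concrete, the following is a minimal pure-Python sketch of the general SAX technique for the "global" window type. It is not the teradataml or Vantage implementation. The helper names sax_global and sax_bitmap are hypothetical, and several details are assumptions that the reference above does not state: the breakpoints are taken from equiprobable regions of the standard normal distribution (as in the classic SAX formulation), the population standard deviation is used when st_dev is not supplied, leftover points that do not fill a whole symbol are dropped, and the bitmap is modeled as counts of overlapping symbol groups.

```python
from bisect import bisect_right
from collections import Counter
from statistics import NormalDist, fmean, pstdev
import string


def sax_global(values, alphabet_size=4, points_persymbol=1,
               mean=None, st_dev=None):
    """Encode a numeric sequence as a SAX string with one global mean/st_dev.

    Hypothetical helper for illustration only; not part of teradataml.
    """
    if not 2 <= alphabet_size <= 20:
        raise ValueError("alphabet_size must be in the range [2, 20]")
    if points_persymbol < 1:
        raise ValueError("points_persymbol must be greater than 0")

    # Use the supplied global statistics, or compute them from the data.
    # Assumption: population standard deviation when st_dev is not given.
    mu = fmean(values) if mean is None else mean
    sigma = pstdev(values) if st_dev is None else st_dev
    if sigma <= 0:
        raise ValueError("st_dev must be greater than 0")

    # Assumption: breakpoints split the standard normal distribution into
    # alphabet_size regions of equal probability (classic SAX).
    normal = NormalDist()
    breakpoints = [normal.inv_cdf(k / alphabet_size)
                   for k in range(1, alphabet_size)]

    symbols = []
    # Assumption: trailing points that do not fill a whole symbol are dropped.
    last_start = len(values) - points_persymbol
    for start in range(0, last_start + 1, points_persymbol):
        segment = values[start:start + points_persymbol]
        z = (fmean(segment) - mu) / sigma  # normalized segment average
        symbols.append(string.ascii_lowercase[bisect_right(breakpoints, z)])
    return "".join(symbols)


def sax_bitmap(sax_string, bitmap_level=2):
    """Count overlapping groups of bitmap_level consecutive symbols.

    Assumption: overlapping counts; the real output format may differ.
    """
    if not 1 <= bitmap_level <= 4:
        raise ValueError("bitmap_level must be in the range [1, 4]")
    last_start = len(sax_string) - bitmap_level
    return Counter(sax_string[i:i + bitmap_level]
                   for i in range(last_start + 1))
```

Under these assumptions, sax_global([1, 2, 3, 4, 5, 6, 7, 8]) gives "aabbccdd", and passing points_persymbol=2 gives "abcd", since each pair of points is averaged before it is mapped to a symbol.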

__repr__(self): Returns the string representation for a SAX class instance.

get_build_time(self): Function to return the build time of the algorithm in seconds. When the model object is created using retrieve_model(), the value returned is the one saved in the Model Catalog.

get_prediction_type(self): Function to return the Prediction type of the algorithm. When the model object is created using retrieve_model(), the value returned is the one saved in the Model Catalog.

get_target_column(self): Function to return the Target Column of the algorithm. When the model object is created using retrieve_model(), the value returned is the one saved in the Model Catalog.

show_query(self): Function to return the underlying SQL query. When the model object is created using retrieve_model(), None is returned.