| |
Methods defined here:
- __init__(self, data=None, meanstats_data=None, stdevstats_data=None, value_columns=None, time_column=None, window_type='global', output='string', mean=None, st_dev=None, window_size=None, output_frequency=1, points_persymbol=1, symbols_perwindow=None, alphabet_size=4, bitmap_level=2, print_stats=False, accumulate=None, data_sequence_column=None, meanstats_data_sequence_column=None, stdevstats_data_sequence_column=None, data_partition_column=None, meanstats_data_partition_column=None, stdevstats_data_partition_column=None, data_order_column=None, meanstats_data_order_column=None, stdevstats_data_order_column=None)
- DESCRIPTION:
The SAX (Symbolic Aggregate approXimation) function transforms a time
series into a shorter sequence of symbols. The symbol sequence is better
suited to further manipulation than the raw data, because it is smaller
and its patterns are easier to identify and compare. The input and
output formats allow the result to be analyzed using NPath or Shapelet
functions, or by other hashing or regular-expression pattern-matching
algorithms.
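The transformation described above can be sketched in plain Python. This is
only an illustration of the textbook SAX procedure (z-normalize, average
groups of points, map each average to a letter using equiprobable Gaussian
breakpoints); it is not the code the database function runs. The name
sax_global is hypothetical, and the use of the population standard deviation
and the dropping of leftover points are assumptions made for this sketch.

```python
from bisect import bisect_right
from statistics import NormalDist, fmean, pstdev
from string import ascii_lowercase


def sax_global(values, points_per_symbol=1, alphabet_size=4):
    """Hypothetical helper: encode a series as a SAX string using one
    mean and one standard deviation for the whole series."""
    if not 2 <= alphabet_size <= 20:
        raise ValueError("alphabet_size must be in the range [2, 20]")
    if points_per_symbol < 1:
        raise ValueError("points_per_symbol must be greater than 0")

    mu = fmean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma <= 0:
        raise ValueError("standard deviation must be greater than 0")

    # Step 1: z-normalize every point.
    z = [(v - mu) / sigma for v in values]

    # Step 2: average each consecutive group of points_per_symbol points.
    # Assumption: points that do not fill a whole group are dropped.
    n_symbols = len(z) // points_per_symbol
    averages = [
        fmean(z[i * points_per_symbol:(i + 1) * points_per_symbol])
        for i in range(n_symbols)
    ]

    # Step 3: breakpoints that cut the standard normal distribution
    # into alphabet_size regions of equal probability.
    normal = NormalDist()
    breakpoints = [
        normal.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)
    ]

    # Step 4: map each average to the letter of the region it falls in.
    letters = ascii_lowercase[:alphabet_size]
    return "".join(letters[bisect_right(breakpoints, a)] for a in averages)


print(sax_global([1, 2, 3, 4, 5, 6, 7, 8], points_per_symbol=2))
```

With points_per_symbol=2 and the default four-letter alphabet, the steadily
rising series above is reduced to four symbols, one per pair of points.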
PARAMETERS:
data:
Required Argument.
Specifies the teradataml DataFrame containing time series data.
data_partition_column:
Required Argument.
Specifies Partition By columns for data.
Values for this argument can be provided as a list if multiple columns
are used for partitioning.
Types: str OR list of Strings (str)
data_order_column:
Required Argument.
Specifies Order By columns for data.
Values for this argument can be provided as a list if multiple columns
are used for ordering.
Types: str OR list of Strings (str)
meanstats_data:
Optional Argument.
Specifies teradataml DataFrame that contains the global means of each
value_column of the input teradataml DataFrame.
meanstats_data_partition_column:
Optional Argument. Required if 'meanstats_data' is used.
Specifies Partition By columns for meanstats_data.
Values for this argument can be provided as a list if multiple columns
are used for partitioning.
Types: str OR list of Strings (str)
meanstats_data_order_column:
Optional Argument.
Specifies Order By columns for meanstats_data.
Values for this argument can be provided as a list if multiple columns
are used for ordering.
Types: str OR list of Strings (str)
stdevstats_data:
Optional Argument.
Specifies teradataml DataFrame that contains the global standard deviations
of each value_column of the input teradataml DataFrame.
stdevstats_data_partition_column:
Optional Argument. Required if 'stdevstats_data' is used.
Specifies Partition By columns for stdevstats_data.
Values for this argument can be provided as a list if multiple columns
are used for partitioning.
Types: str OR list of Strings (str)
stdevstats_data_order_column:
Optional Argument.
Specifies Order By columns for stdevstats_data.
Values for this argument can be provided as a list if multiple columns
are used for ordering.
Types: str OR list of Strings (str)
value_columns:
Required Argument.
Specifies the names of the input teradataml DataFrame columns that
contain the time series data to be transformed.
Types: str OR list of Strings (str)
time_column:
Optional Argument.
Specifies the name of the input teradataml DataFrame column that
contains the time axis of the data.
Types: str
window_type:
Optional Argument.
Determines how much data the function processes at one time:
"global": The function computes the SAX code using a single
mean and standard deviation for the entire data set.
"sliding": The function recomputes the mean and standard
deviation for a sliding window of the data set.
Default Value: "global"
Permitted Values: sliding, global
Types: str
output:
Optional Argument.
Determines how the function outputs the results:
"string": The function outputs a list of SAX codes for each window.
"bytes": The function outputs the list of SAX codes as compact
byte arrays (which are not "human-readable").
"bitmap": The function outputs a JSON representation of a SAX bitmap.
"characters": The function outputs one character for each line.
Default Value: "string"
Permitted Values: STRING, BITMAP, BYTES, CHARACTERS
Types: str
mean:
Optional Argument.
Specifies the global mean values that the function uses to calculate
the SAX code for every partition. A mean value has the data type
float. If mean specifies only one value and value_columns specifies
multiple columns, then the specified value applies to every
value_column. If mean specifies multiple values, then it must specify
a value for each value_column. The nth mean value corresponds to the
nth value_column.
Tip: To specify a different global mean value for each partition,
use the multiple-input syntax and put the values in the
"meanstats_data" teradataml DataFrame.
Types: float OR list of floats
st_dev:
Optional Argument.
Specifies the global standard deviation values that the function uses
to calculate the SAX code for every partition. An st_dev value has the
data type float and must be greater than 0. If st_dev specifies only
one value and value_columns specifies multiple columns, then the
specified value applies to every value_column. If st_dev specifies
multiple values, then it must specify a value for each value_column.
The nth st_dev value corresponds to the nth value_column.
Tip: To specify a different global standard deviation value for each
partition, use the multiple-input syntax and put the values in the
"stdevstats_data" teradataml DataFrame.
Types: float OR list of floats
window_size:
Required if window_type is 'sliding', disallowed otherwise.
Specifies the size of the sliding window. The value must be an
integer greater than 0.
Types: int
output_frequency:
Optional Argument.
Specifies the number of data points that the window slides between
successive outputs. The value must be an integer greater than 0.
Note: window_type value must be "sliding" and output value cannot be
"characters". If window_type is "sliding" and output value is
"characters", then output_frequency is automatically set to the value
of window_size, to ensure that a single character is assigned to each
time point. If the number of data points in the time series is not an
integer multiple of the window size, then the function ignores the
leftover data points.
Default Value: 1
Types: int
points_persymbol:
Optional Argument.
Specifies the number of data points to be converted into one SAX
symbol. The value must be an integer greater than 0.
Note: window_type value must be "global".
Default Value: 1
Types: int
symbols_perwindow:
Optional Argument.
Specifies the number of SAX symbols to be generated for each window.
The value must be an integer greater than 0. The default value is
the value of window_size.
Note: window_type value must be "sliding".
Types: int
alphabet_size:
Optional Argument.
Specifies the number of symbols in the SAX alphabet. The value must
be an integer in the range [2, 20].
Default Value: 4
Types: int
bitmap_level:
Optional Argument.
Specifies the number of consecutive symbols to be converted to one
symbol on a bitmap. For bitmap level 1, the bitmap contains the
symbols "a", "b", "c", and so on; for bitmap level 2, the bitmap
contains the symbols "aa", "ab", "ac", and so on. The input value
must be an integer in the range [1, 4].
Note: output value must be "bitmap".
Default Value: 2
Types: int
print_stats:
Optional Argument.
Specifies whether the function prints the mean and standard
deviation.
Note: output value must be "string".
Default Value: False
Types: bool
accumulate:
Optional Argument.
Specifies the names of the input teradataml DataFrame columns that are
to appear in the output teradataml DataFrame. For each sequence in the
input teradataml DataFrame, SAX chooses the value corresponding to
the first time point in the sequence to output as the accumulate value.
Types: str OR list of Strings (str)
data_sequence_column:
Optional Argument.
Specifies the list of column(s) that uniquely identifies each row of
the input argument "data". The argument is used to ensure
deterministic results for functions which produce results that vary
from run to run.
Types: str OR list of Strings (str)
meanstats_data_sequence_column:
Optional Argument.
Specifies the list of column(s) that uniquely identifies each row of
the input argument "meanstats_data". The argument is used to ensure
deterministic results for functions which produce results that vary
from run to run.
Types: str OR list of Strings (str)
stdevstats_data_sequence_column:
Optional Argument.
Specifies the list of column(s) that uniquely identifies each row of
the input argument "stdevstats_data". The argument is used to ensure
deterministic results for functions which produce results that vary
from run to run.
Types: str OR list of Strings (str)
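As an illustration of the bitmap_level parameter described above, the
following sketch counts every run of bitmap_level consecutive symbols in a
SAX string and serializes the counts as JSON. The helper name sax_bitmap is
hypothetical. Counting overlapping runs, and the shape of the JSON, are
assumptions for this sketch; the exact layout produced by output="bitmap" is
not specified here.

```python
import json
from collections import Counter


def sax_bitmap(sax_string, bitmap_level=2):
    """Hypothetical helper: count each run of bitmap_level consecutive
    symbols in a SAX string (assumption: runs overlap)."""
    if not 1 <= bitmap_level <= 4:
        raise ValueError("bitmap_level must be in the range [1, 4]")
    runs = (
        sax_string[i:i + bitmap_level]
        for i in range(len(sax_string) - bitmap_level + 1)
    )
    return dict(Counter(runs))


# Level 1 counts single symbols; level 2 counts pairs such as "ab".
print(json.dumps(sax_bitmap("abcab", bitmap_level=1), sort_keys=True))
print(json.dumps(sax_bitmap("abcab", bitmap_level=2), sort_keys=True))
```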
RETURNS:
Instance of SAX.
Output teradataml DataFrames can be accessed using attribute
references, such as SAXObj.<attribute_name>.
Output teradataml DataFrame attribute name is:
result
RAISES:
TeradataMlException
EXAMPLES:
# Load example data.
load_example_data("sax", "finance_data3")
# Create teradataml DataFrame objects
finance_data3 = DataFrame.from_table("finance_data3")
# Example 1 - This example uses a global window_type and the default output value.
SAX_Out = SAX(data = finance_data3,
data_partition_column = ["id"],
data_order_column = ["period"],
value_columns = ["expenditure","income","investment"],
time_column = "period",
window_type = "global",
print_stats = True,
accumulate = ["id"]
)
# Print the results
print(SAX_Out)
# Example 2 - This example uses a sliding window_type and the default output value.
# window_size must also be specified when window_type is "sliding".
SAX_Out2 = SAX(data = finance_data3,
data_partition_column = ["id"],
data_order_column = ["period"],
value_columns = ["expenditure"],
time_column = "period",
window_type = "sliding",
window_size = 20,
print_stats = True,
accumulate = ["id"]
)
# Print the results
print(SAX_Out2)
# Example 3 - This example uses the multiple-input version, where the
# global mean and standard deviation of each partition are supplied
# through the meanstats and stdevstats DataFrames.
meanstats = DataFrame.from_table("finance_data3").groupby("id").mean()
meanstats = meanstats.assign(drop_columns=True, id=meanstats.id, expenditure=meanstats.mean_expenditure,
income=meanstats.mean_income, investment=meanstats.mean_investment)
stdevstats = DataFrame.from_table("finance_data3").groupby("id").std()
stdevstats = stdevstats.assign(drop_columns=True, id=stdevstats.id, expenditure=stdevstats.std_expenditure,
income=stdevstats.std_income, investment=stdevstats.std_investment)
SAX_Out3 = SAX(data = finance_data3,
data_partition_column = ["id"],
data_order_column = ["period"],
meanstats_data = meanstats,
meanstats_data_partition_column = ["id"],
stdevstats_data = stdevstats,
stdevstats_data_partition_column = ["id"],
value_columns = ["expenditure","income","investment"],
time_column = "period",
window_type = "global",
accumulate = ["id"]
)
# Print the results
print(SAX_Out3)
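# The sliding-window behavior used in Example 2 can be pictured with the
# following local sketch. It is not part of teradataml: the helper names
# window_starts and sliding_stats are hypothetical, and the use of the
# population standard deviation is an assumption. The sketch only shows how
# window_size and output_frequency determine which windows are evaluated,
# and that points which do not fill a whole window are ignored.

```python
from statistics import fmean, pstdev


def window_starts(n_points, window_size, output_frequency=1):
    """Hypothetical helper: start index of every complete window."""
    if window_size < 1 or output_frequency < 1:
        raise ValueError("window_size and output_frequency must be > 0")
    return list(range(0, n_points - window_size + 1, output_frequency))


def sliding_stats(values, window_size, output_frequency=1):
    """Hypothetical helper: (start, mean, stdev) recomputed per window."""
    stats = []
    for start in window_starts(len(values), window_size, output_frequency):
        window = values[start:start + window_size]
        stats.append((start, fmean(window), pstdev(window)))
    return stats


# Ten points, window of 4, sliding one point at a time: 7 windows.
print(window_starts(10, 4, 1))
# Sliding by the window size, as happens for "characters" output:
# two windows, and the last two points are left over.
print(window_starts(10, 4, 4))
print(sliding_stats([1, 2, 3, 4, 5], window_size=2, output_frequency=2))
```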
- __repr__(self)
- Returns the string representation for a SAX class instance.
- get_build_time(self)
- Function to return the build time of the algorithm in seconds.
When model object is created using retrieve_model(), then the value returned is
as saved in the Model Catalog.
- get_prediction_type(self)
- Function to return the Prediction type of the algorithm.
When model object is created using retrieve_model(), then the value returned is
as saved in the Model Catalog.
- get_target_column(self)
- Function to return the Target Column of the algorithm.
When model object is created using retrieve_model(), then the value returned is
as saved in the Model Catalog.
- show_query(self)
- Function to return the underlying SQL query.
When model object is created using retrieve_model(), then None is returned.
|