Teradata Package for Python Function Reference | 17.10 - Interpolator

teradataml.analytics.mle.Interpolator = class Interpolator(builtins.object)

Methods defined here:

__init__(self, data=None, time_data=None, count_rownumber=None, time_column=None, value_columns=None, time_interval=None, interpolation_type=None, aggregation_type=None, time_datatype=None, value_datatype=None, start_time=None, end_time=None, values_before_first=None, values_after_last=None, duplicate_rows_count=None, accumulate=None, data_sequence_column=None, time_data_sequence_column=None, count_rownumber_sequence_column=None, data_partition_column=None, count_rownumber_partition_column=None, data_order_column=None, time_data_order_column=None, count_rownumber_order_column=None)
    DESCRIPTION:
        The Interpolator function calculates missing values in a time series,
        using either interpolation or aggregation. Interpolation estimates
        missing values between known values. Aggregation combines known values
        to produce an aggregate value.

    PARAMETERS:
        data:
            Required Argument.
            Specifies the teradataml DataFrame that contains the input data.

        data_partition_column:
            Required Argument.
            Specifies Partition By columns for data.
            Values for this argument can be provided as a list if multiple
            columns are used for partitioning.
            Types: str OR list of Strings (str)

        data_order_column:
            Required Argument.
            Specifies Order By columns for data.
            Values for this argument can be provided as a list if multiple
            columns are used for ordering.
            Types: str OR list of Strings (str)

        time_data:
            Optional Argument.
            Specifies the teradataml DataFrame that contains the time points.
            If you specify time_data, then the function calculates an
            interpolated value for each time point.
            Note: If you omit time_data, you must specify the time_interval
                  argument.

        time_data_order_column:
            Optional Argument.
            Specifies Order By columns for time_data.
            Values for this argument can be provided as a list if multiple
            columns are used for ordering.
            Types: str OR list of Strings (str)

        count_rownumber:
            Optional Argument.
            Specifies the teradataml DataFrame that contains the number of
            data rows in each partition, which the function uses when the
            loess span is specified as a proportion of time points.
            Note: It is used only with interpolation_type
                  "loess[(weights({constant | tricube}), degree({0 | 1 | 2}), span(m))]"
                  when m is a proportion between (x+1)/n and 1, where x is the
                  specified degree and n is the number of rows in the partition.

        count_rownumber_partition_column:
            Optional Argument.
            Specifies Partition By columns for count_rownumber.
            Values for this argument can be provided as a list if multiple
            columns are used for partitioning.
            Types: str OR list of Strings (str)

        count_rownumber_order_column:
            Optional Argument.
            Specifies Order By columns for count_rownumber.
            Values for this argument can be provided as a list if multiple
            columns are used for ordering.
            Types: str OR list of Strings (str)

        time_column:
            Required Argument.
            Specifies the name of the column of the input teradataml DataFrame
            data that contains the time points of the time series whose missing
            values are to be calculated.
            Types: str

        value_columns:
            Required Argument.
            Specifies the names of the columns of the input teradataml
            DataFrame data whose values the function interpolates into the
            output teradataml DataFrame.
            Types: str OR list of Strings (str)

        time_interval:
            Optional Argument. Required when time_data is not provided.
            Specifies the length of time, in seconds, between calculated
            values. If you specify time_interval, then the function calculates
            an interpolated value for a time point only if the value is missing
            in the original time series; otherwise, the function copies the
            original value.
            Note:
                1. If you specify aggregation_type, the function ignores
                   time_data and time_interval and calculates the aggregated
                   value for each point in the time series.
                2. Specify exactly one of time_data or time_interval.
            Types: int OR float

        interpolation_type:
            Optional Argument.
            Specifies interpolation types for the columns that value_columns
            specifies. If you specify interpolation_type, then it must be the
            same size as value_columns. That is, if value_columns specifies n
            columns, then interpolation_type must specify n interpolation
            types. For i in [1, n], value_column_i has interpolation_type_i.
            However, interpolation_type_i can be empty; for example:
                value_columns (c1, c2, c3)
                interpolation_type ("linear", ,"constant")
            An empty interpolation_type_i takes the default value.
            The possible values of interpolation_type are:
                * "linear" (default): The value for each missing time point is
                  determined using linear interpolation between the two nearest
                  points.
                * "constant": The value for each missing time point is set to
                  the nearest value. You must use this option if value_column
                  has SQL data type CHARACTER, CHARACTER(n), or VARCHAR.
                * "spline[(type(cubic))]": The value for each missing time
                  point is determined by fitting a cubic spline to the nearest
                  three points.
                * "median[(window(n))]": The value for each missing time point
                  is set to the median value of the nearest n time points.
                  n must be greater than or equal to 2. The default value of n
                  is 5.
                * "loess[(weights({constant | tricube}), degree({0 | 1 | 2}), span(m))]":
                  The value for each missing time point is calculated using a
                  low-degree polynomial based on a set of nearest neighbors.
                    * weights:
                        * constant: All time points are equally weighted.
                        * tricube: Time points closer to the missing data point
                          are more heavily weighted than those farther away.
                      The default value is constant.
                    * degree: Degree of the polynomial. The default value is 1.
                    * m: One of the following:
                        * An integer greater than 1, which specifies the number
                          of neighboring points.
                        * A proportion of time points to use in each fit. You
                          must provide count_rownumber, and m must be between
                          (x+1)/n and 1, where x is the specified degree and n
                          is the number of rows in the partition.
                      The default value of m is 5.
            Note:
                1. Specify only one of interpolation_type or aggregation_type.
                2. If you omit both arguments, the function uses
                   interpolation_type with its default value, 'linear'.
                3. For SQL data types CHARACTER, CHARACTER(n), and VARCHAR, you
                   cannot use aggregation_type. You must use interpolation_type,
                   and interpolation_type must be 'constant'.
                4. In interpolation_type syntax, brackets do not indicate
                   optional elements; you must include them.
            Types: str OR list of strs

        aggregation_type:
            Optional Argument.
            Specifies the aggregation types of the columns that value_columns
            specifies. If you specify aggregation_type, then it must be the
            same size as value_columns. That is, if value_columns specifies n
            columns, then aggregation_type must specify n aggregation types.
            For i in [1, n], value_column_i has aggregation_type_i. However,
            aggregation_type_i can be empty; for example:
                value_columns (c1, c2, c3)
                aggregation_type (min, ,max)
            An empty aggregation_type_i takes the default value.
            The syntax of aggregation_type is:
                { min | max | mean | mode | sum }[(window(n))]
            The function calculates the aggregate value as the minimum,
            maximum, mean, mode, or sum within a sliding window of length n.
            n must be greater than or equal to 2. The default value of n is 5.
            The default aggregation method is min.
            The Interpolator function can calculate the aggregates of values of
            these SQL data types:
                * int
                * BIGINT
                * SMALLINT
                * float
                * DECIMAL(n,n)
                * DECIMAL
                * NUMERIC
                * NUMERIC(n,n)
            Note:
                1. Specify only one of aggregation_type or interpolation_type.
                2. If you omit both arguments, the function uses
                   interpolation_type with its default value, 'linear'.
                3. Aggregation calculations ignore the values in time_interval
                   or in time_data. The function calculates the aggregated
                   value for each value in the time series.
                4. In aggregation_type syntax, brackets do not indicate
                   optional elements; you must include them.
            Types: str OR list of strs

        time_datatype:
            Optional Argument.
            Specifies the data type of the output column that corresponds to
            the input teradataml DataFrame data column specified by
            time_column. If you omit this argument, then the function infers
            the data type of time_column from the input teradataml DataFrame
            data and uses the inferred data type for the corresponding output
            teradataml DataFrame column.
            If you specify this argument, then the function can transform the
            input data to the specified output data type only if both the input
            column data type and the specified output column data type are in
            this list:
                * int
                * BIGINT
                * SMALLINT
                * float
                * DECIMAL(n,n)
                * DECIMAL
                * NUMERIC
                * NUMERIC(n,n)
            Types: str

        value_datatype:
            Optional Argument.
            Specifies the data types of the output columns that correspond to
            the input teradataml DataFrame data columns specified by
            value_columns. If you omit this argument, then the function infers
            the data type of each column in value_columns from the input
            teradataml DataFrame data and uses the inferred data type for the
            corresponding output teradataml DataFrame column.
            If you specify value_datatype, then it must be the same size as
            value_columns. That is, if value_columns specifies n columns, then
            value_datatype must specify n data types. For i in [1, n],
            value_column_i has value_datatype_i. However, value_datatype_i can
            be empty; for example:
                value_columns (c1, c2, c3)
                value_datatype (int, ,VARCHAR)
            If you specify this argument, then the function can transform the
            input data to the specified output data type only if both the input
            column data type and the specified output column data type are in
            this list:
                * int
                * BIGINT
                * SMALLINT
                * float
                * DECIMAL(n,n)
                * DECIMAL
                * NUMERIC
                * NUMERIC(n,n)
            Types: str OR list of strs

        start_time:
            Optional Argument.
            Specifies the start time for the time series. The default value is
            the start time of the time series in the input teradataml
            DataFrame.
            Types: str

        end_time:
            Optional Argument.
            Specifies the end time for the time series. The default value is
            the end time of the time series in the input teradataml DataFrame.
            Types: str

        values_before_first:
            Optional Argument.
            Specifies the values to use if start_time is before the start time
            of the time series in the input teradataml DataFrame. Each of these
            values must have the same data type as its corresponding
            value_column. Values of data type VARCHAR are case-insensitive.
            If value_columns specifies n columns, then values_before_first must
            specify n values. For i in [1, n], value_column_i has the value
            before_first_value_i. However, before_first_value_i can be empty;
            for example:
                value_columns (c1, c2, c3)
                values_before_first (1, ,"abc")
            If before_first_value_i is empty, then value_column_i has the value
            NULL. If you do not specify values_before_first, then
            value_column_i has the value NULL for every i in [1, n].
            Types: str OR list of strs

        values_after_last:
            Optional Argument.
            Specifies the values to use if end_time is after the end time of
            the time series in the input teradataml DataFrame. Each of these
            values must have the same data type as its corresponding
            value_column. Values of data type VARCHAR are case-insensitive.
            If value_columns specifies n columns, then values_after_last must
            specify n values. For i in [1, n], value_column_i has the value
            after_last_value_i. However, after_last_value_i can be empty; for
            example:
                value_columns (c1, c2, c3)
                values_after_last (1, ,"abc")
            If after_last_value_i is empty, then value_column_i has the value
            NULL. If you do not specify values_after_last, then value_column_i
            has the value NULL for every i in [1, n].
            Types: str OR list of strs

        duplicate_rows_count:
            Optional Argument.
            Specifies the number of rows to duplicate across split boundaries
            if you use the SeriesSplitter function. The argument takes one or
            two values, value1 and value2:
                * If you specify only value1, then the function duplicates
                  value1 rows from the previous partition and value1 rows from
                  the next partition.
                * If you specify both value1 and value2, then the function
                  duplicates value1 rows from the previous partition and value2
                  rows from the next partition.
            Each value must be a non-negative int. Both value1 and value2 must
            exceed the number of time points that the function needs for every
            specified interpolation or aggregation method. For aggregation, the
            number of time points required is determined by the value of n in
            window(n) specified by aggregation_type.
            The interpolation methods and the number of time points that the
            function needs for them are:
                * "linear": 1
                * "constant": 1
                * "spline": 2
                * "median[(window(n))]": n/2
                * "loess[(weights({constant | tricube}), degree({0 | 1 | 2}), span(m))]":
                    * m > 1: m-1
                    * m < 1: (m * n)-1, where n is the total number of data
                      rows, found in column n of the count_rownumber DataFrame.
            Types: int OR list of ints

        accumulate:
            Optional Argument.
            Specifies the names of input teradataml DataFrame columns (other
            than those specified by time_column and value_columns) to copy to
            the output teradataml DataFrame. By default, the function copies to
            the output teradataml DataFrame only the columns specified by
            time_column and value_columns.
            Types: str OR list of Strings (str)

        data_sequence_column:
            Optional Argument.
            Specifies the list of column(s) that uniquely identify each row of
            the input argument "data". The argument is used to ensure
            deterministic results for functions which produce results that vary
            from run to run.
            Types: str OR list of Strings (str)

        time_data_sequence_column:
            Optional Argument.
            Specifies the list of column(s) that uniquely identify each row of
            the input argument "time_data". The argument is used to ensure
            deterministic results for functions which produce results that vary
            from run to run.
            Types: str OR list of Strings (str)

        count_rownumber_sequence_column:
            Optional Argument.
            Specifies the list of column(s) that uniquely identify each row of
            the input argument "count_rownumber". The argument is used to ensure
            deterministic results for functions which produce results that vary
            from run to run.
            Types: str OR list of Strings (str)

    RETURNS:
        Instance of Interpolator.
        Output teradataml DataFrames can be accessed using attribute
        references, such as InterpolatorObj.<attribute_name>.
        Output teradataml DataFrame attribute name is:
            result

    RAISES:
        TeradataMlException

    EXAMPLES:
        # Load the data to run the example.
        load_example_data("Interpolator", ["ibm_stock1", "time_table1"])

        # Create teradataml DataFrame.
        ibm_stock1 = DataFrame.from_table("ibm_stock1")
        time_table1 = DataFrame.from_table("time_table1")

        # Example 1: Running Interpolator function with aggregation_type min.
        interpolator_out1 = Interpolator(
            data=ibm_stock1,
            data_partition_column='id',
            data_order_column='period',
            time_data=time_table1,
            time_data_order_column='period',
            time_column='period',
            value_columns='stockprice',
            accumulate='id',
            aggregation_type='min[(window(2))]',
            values_before_first='0',
            values_after_last='0',
            data_sequence_column='period')

        # Print the result DataFrame.
        print(interpolator_out1.result)

        # Example 2: Running Interpolator function with constant interpolation.
        interpolator_out2 = Interpolator(
            data=ibm_stock1,
            data_partition_column='id',
            data_order_column='period',
            time_column='period',
            value_columns='stockprice',
            accumulate='id',
            time_interval=86400.0,
            interpolation_type='constant',
            values_before_first='0',
            values_after_last='0')

        # Print the result DataFrame.
        print(interpolator_out2.result)

        # Example 3: Running Interpolator function with linear interpolation.
        interpolator_out3 = Interpolator(
            data=ibm_stock1,
            data_partition_column='id',
            data_order_column='period',
            time_column='period',
            value_columns='stockprice',
            accumulate='id',
            time_interval=86400.0,
            interpolation_type='linear',
            values_before_first='0',
            values_after_last='0')

        # Print the result DataFrame.
        print(interpolator_out3.result)

        # Example 4: Running Interpolator function with median interpolation.
        interpolator_out4 = Interpolator(
            data=ibm_stock1,
            data_partition_column='id',
            data_order_column='period',
            time_column='period',
            value_columns='stockprice',
            accumulate='id',
            time_interval=86400.0,
            interpolation_type='median[(window(4))]',
            values_before_first='0',
            values_after_last='0')

        # Print the result DataFrame.
        print(interpolator_out4.result)

        # Example 5: Running Interpolator function with spline interpolation.
        interpolator_out5 = Interpolator(
            data=ibm_stock1,
            data_partition_column='id',
            data_order_column='period',
            time_column='period',
            value_columns='stockprice',
            accumulate='id',
            time_interval=86400.0,
            interpolation_type='spline[(type(cubic))]',
            values_before_first='0',
            values_after_last='0')

        # Print the result DataFrame.
        print(interpolator_out5.result)

        # Example 6: Running Interpolator function with loess interpolation.
        interpolator_out6 = Interpolator(
            data=ibm_stock1,
            data_partition_column='id',
            data_order_column='period',
            time_column='period',
            value_columns='stockprice',
            accumulate='id',
            time_interval=86400.0,
            interpolation_type='loess[(weights(constant),degree(2),span(4))]',
            values_before_first='0',
            values_after_last='0')

        # Print the result DataFrame.
        print(interpolator_out6.result)
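The time_interval behavior described above (copy values that exist, fill only the missing grid points, pad outside the series with values_before_first and values_after_last) can be sketched locally in plain Python. The helper interpolate_series is hypothetical: it only illustrates the "linear" and "constant" rules and is neither part of teradataml nor the in-database implementation. Assumptions: times are numbers of seconds, None stands in for SQL NULL, and a "constant" tie between two equally near points goes to the earlier one, which this reference does not specify.

```python
from bisect import bisect_left
from typing import Dict, List, Optional, Tuple


def interpolate_series(
    series: Dict[float, float],
    start: float,
    end: float,
    interval: float,
    method: str = "linear",
    before_first: Optional[float] = None,
    after_last: Optional[float] = None,
) -> List[Tuple[float, Optional[float]]]:
    """Hypothetical local sketch: fill a regular grid from start to end.

    Known values are copied unchanged; missing ones come from "linear"
    interpolation or from the nearest known value ("constant"). Grid points
    outside the known range take before_first / after_last (None ~ NULL).
    """
    if not series:
        raise ValueError("series must contain at least one known point")
    times = sorted(series)
    steps = int(round((end - start) / interval))
    out: List[Tuple[float, Optional[float]]] = []
    for k in range(steps + 1):
        t = start + k * interval
        if t in series:
            out.append((t, series[t]))          # value present: copy it
        elif t < times[0]:
            out.append((t, before_first))       # before the series starts
        elif t > times[-1]:
            out.append((t, after_last))         # after the series ends
        else:
            i = bisect_left(times, t)           # times[i - 1] < t < times[i]
            t0, t1 = times[i - 1], times[i]
            v0, v1 = series[t0], series[t1]
            if method == "linear":
                v = v0 + (v1 - v0) * (t - t0) / (t1 - t0)
            elif method == "constant":
                # Nearest known point; ties go to the earlier one (assumption).
                v = v0 if t - t0 <= t1 - t else v1
            else:
                raise ValueError(f"unsupported method in this sketch: {method}")
            out.append((t, v))
    return out
```

For instance, with known values 10.0 at time 0 and 14.0 at 172800 (two days later), a one-day interval of 86400.0, after_last=0.0, and a grid ending at 259200, the values come out as 10.0, 12.0, 14.0, 0.0: the missing day is interpolated and the point past the end of the series is padded.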

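The aggregation_type syntax and its sliding window can be illustrated the same way. The sketch below parses strings such as 'min[(window(2))]' and applies the aggregate over a window of up to n points. The names parse_aggregation_type and window_aggregate are hypothetical, and two details are assumptions: the window is taken as a trailing window ending at each point (shorter at the start of the series), and tied modes resolve to the smallest value. The reference only states that the window has length n.

```python
import re
from collections import Counter
from statistics import mean
from typing import List, Sequence, Tuple

# { min | max | mean | mode | sum }[(window(n))], brackets included literally.
_AGG_SPEC = re.compile(r"(min|max|mean|mode|sum)(?:\[\(window\((\d+)\)\)\])?")


def parse_aggregation_type(spec: str) -> Tuple[str, int]:
    """Split e.g. 'min[(window(2))]' into ('min', 2); empty -> defaults."""
    spec = spec.strip()
    if spec == "":
        return "min", 5  # documented defaults: method min, n = 5
    match = _AGG_SPEC.fullmatch(spec)
    if match is None:
        raise ValueError(f"unrecognized aggregation_type: {spec!r}")
    n = int(match.group(2)) if match.group(2) else 5
    if n < 2:
        raise ValueError("n in window(n) must be >= 2")
    return match.group(1), n


def _mode(window: Sequence[float]) -> float:
    counts = Counter(window)
    top = max(counts.values())
    # Tie between equally frequent values -> smallest value (assumption).
    return min(v for v, c in counts.items() if c == top)


_FUNCS = {"min": min, "max": max, "mean": mean, "mode": _mode, "sum": sum}


def window_aggregate(values: Sequence[float], spec: str) -> List[float]:
    """Aggregate over a trailing window of up to n points ending at each point."""
    method, n = parse_aggregation_type(spec)
    func = _FUNCS[method]
    return [func(values[max(0, i - n + 1): i + 1]) for i in range(len(values))]
```

Parsing the string separately from the arithmetic mirrors the reference's note that the brackets in 'min[(window(2))]' are literal syntax rather than markers of an optional element.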
__repr__(self): Returns the string representation for an Interpolator class instance.

get_build_time(self): Returns the build time of the algorithm, in seconds. When the model object is created using retrieve_model(), the value returned is the one saved in the Model Catalog.

get_prediction_type(self): Returns the prediction type of the algorithm. When the model object is created using retrieve_model(), the value returned is the one saved in the Model Catalog.

get_target_column(self): Returns the target column of the algorithm. When the model object is created using retrieve_model(), the value returned is the one saved in the Model Catalog.

show_query(self): Returns the underlying SQL query. When the model object is created using retrieve_model(), None is returned.
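Returning to the duplicate_rows_count argument of __init__: its table of time points per method can be turned into a pre-flight check. The helpers below (required_points, min_duplicate_rows, loess_span_is_valid) are hypothetical and simply encode the rules as written in this reference. Assumptions: "must exceed" is read as strictly greater, an aggregation window(n) is taken to need n points, omitted windows and spans use the documented default of 5, and the loess proportion bounds (x+1)/n and 1 are treated as inclusive.

```python
import math
from typing import Optional


def required_points(method: str, window: Optional[int] = None,
                    span: Optional[float] = None,
                    partition_rows: Optional[int] = None) -> float:
    """Time points a method needs, per the duplicate_rows_count table."""
    if method in ("linear", "constant"):
        return 1
    if method == "spline":
        return 2
    if method == "median":
        return (window if window is not None else 5) / 2      # n/2
    if method == "loess":
        m = span if span is not None else 5
        if m > 1:
            return m - 1                                       # m > 1: m-1
        if partition_rows is None:
            raise ValueError("a proportional span needs the row count n")
        return m * partition_rows - 1                          # m < 1: (m*n)-1
    if method in ("min", "max", "mean", "mode", "sum"):
        return window if window is not None else 5             # assumption: n
    raise ValueError(f"unknown method: {method}")


def min_duplicate_rows(*needs: float) -> int:
    """Smallest int strictly greater than every requirement."""
    return math.floor(max(needs)) + 1


def loess_span_is_valid(m: float, degree: int, partition_rows: int) -> bool:
    """Integer span > 1, or a proportion in [(degree+1)/n, 1] (assumed inclusive)."""
    if m > 1:
        return float(m).is_integer()
    return (degree + 1) / partition_rows <= m <= 1
```

Applied to Example 6 (span(4)), required_points("loess", span=4) is 3, so under this reading each duplicate_rows_count value would need to be at least 4.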