Interpolator Arguments - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
8.00
1.0
Published
May 2019
Language
English (United States)
Last Update
2019-11-22
dita:mapPath
blj1506016597986.ditamap
dita:ditavalPath
blj1506016597986.ditaval
dita:id
B700-4003
lifecycle
previous
Product Category
Teradata Vantage™
TimeColumn
Specify the name of the input_table column that contains the time points of the time series whose missing values are to be calculated.
TimeInterval
[Optional] Specify exactly one of time_table or TimeInterval.

Specify the length of time, in seconds, between calculated values. This value must be either INTEGER or DOUBLE PRECISION.

ValueColumns
Specify the names of input_table columns to interpolate to the output table.
TimeDataType
[Optional] Specify the data type of the output column that corresponds to the input table column that TimeColumn specifies (time_column).

If you omit this argument, the function infers the data type of time_column from the input table and uses the inferred data type for the corresponding output table column.

If you specify this argument, the function can transform the input data to the specified output data type only if both the input column data type and the specified output column data type are in this list:
  • INTEGER
  • BIGINT
  • SMALLINT
  • DOUBLE PRECISION
  • NUMERIC(n[,m])
ValueDataType
[Optional] Specify the data types of the output columns that correspond to the input table columns that ValueColumns specifies.

If you omit this argument, the function infers the data type of each time_column from the input table and uses the inferred data type for the corresponding output table column.

If you specify ValueDataType, it must be the same size as ValueColumns. That is, if ValueColumns specifies n columns, ValueDataType must specify n data types. For i in [1, n], value_column_i has value_type_i. However, value_type_i can be empty; for example:

ValueColumns ('c1', 'c2', 'c3')
ValueDataType (INTEGER, ,VARCHAR)
If you specify this argument, the function can transform the input data to the specified output data type only if both the input column data type and the specified output column data type are in this list:
  • INTEGER
  • BIGINT
  • SMALLINT
  • DOUBLE PRECISION
  • NUMERIC(n[,m])
InterpolationType
[Optional] Specify interpolation types for the columns that ValueColumns specifies.
Must be the same size as ValueColumns. That is, if ValueColumns specifies n columns, InterpolationType must specify n interpolation types. For i in [1, n], value_column_i has interpolation_type_i. However, interpolation_type_i can be empty; for example:
ValueColumns ('c1', 'c2', 'c3')
InterpolationType ('linear', ,'constant')
An empty interpolation_type has the default value.
The possible values of interpolation_type are as follows:
In interpolation_type syntax, brackets do not indicate optional elements—you must include them.
interpolation_type Description
'linear' (Default) Function determines value for each missing data point using linear interpolation between two nearest points.
'constant' Function sets value for each missing data point to nearest value.
You must use this option if value_column has data type CHARACTER, CHARACTER(n), or VARCHAR.
'spline[(type(cubic))]]' Function determines value for each missing data point by fitting a cubic spline to nearest three points.
'median[(window( n ))]' Function sets value for each missing data point to median value of nearest four points.

n must be greater than or equal to 2. Default: 5.

'loess[(weights({constant|tricube}), degree ({0|1|2}), span(m))]'
Option Description
weights Default: constant
degree Default: 1
m Either an integer greater than 1 (which specifies number of neighboring points) or a real number between (λ+1)/n and 1 (λ is degree of local polynomial and n is number of data points).

Default: 5

The value for each missing data point is a low-degree polynomial based on a set of nearest neighbors. The fitting can be weighted so that points closer to the missing data have more influence than points farther away.
Your choice of TimeInterval or time_table affects interpolation:
TimeInterval Function calculates value for time point only if value is missing; otherwise, function copies original value.
time_table Function always calculates value of time point.
Specify only one of InterpolationType or AggregationType. If you omit both arguments, the function uses InterpolationType with its default value, 'linear'.
AggregationType
[Optional] Specify the aggregation types of the columns that ValueColumns specifies.
Must be the same size as ValueColumns. That is, if ValueColumns specifies n columns, AggregationType must specify n aggregation types. For i in [1, n], value_column_i has aggregation_type_i. However, aggregation_type_i can be empty; for example:
ValueColumns ('c1', 'c2', 'c3')
AggregationType (min, ,max)
An empty aggregation_type has the default value.
This is the syntax of aggregation_type:
{ min | max | mean | mode | sum } [(window(n))]
In aggregation_type syntax, brackets do not indicate optional elements—you must include them.
The function calculates the aggregate value as the minimum, maximum, mean, mode, or sum within a sliding window of length n. n must be greater than or equal to 2.
Default: n is 5 and the default aggregation method is min.
The Interpolator function can calculate the aggregates of values of these data types:
  • INTEGER
  • BIGINT
  • SMALLINT
  • DOUBLE PRECISION
  • NUMERIC
  • NUMERIC(n[,m])
Your choice of TimeInterval or time_table does not affect aggregation. The function always calculates the aggregated value.
Specify only one of AggregationType or InterpolationType. If you omit both arguments, the function uses InterpolationType with its default value, 'linear'.
StartTime
[Optional] Specify the start time for the time series.
Default: Start time of the time series in input_table
EndTime
[Optional] Specify the end time for the time series.
Default: End time of the time series in input_table
ValuesBeforeFirst
[Optional] Specify the values to use if start_time is before the start time of the time series in input_table. Each of these values must have the same data type as its corresponding value_column. Values of data type VARCHAR are case-insensitive.
If ValueColumns specifies n columns, ValuesBeforeFirst must specify n values. For i in [1, n], value_column_i has the value before_first_value_i. However, before_first_value_i can be empty; for example:
ValueColumns ('c1', 'c2', 'c3')
ValuesBeforeFirst (1, ,'abc')
If before_first_value_i is empty, value_column_i has the value NULL.
Default: value_column_i has the value NULL for i in [1, n].
ValuesAfterLast
[Optional] Specify the values to use if end_time is after the end time of the time series in input_table. Each of these values must have the same data type as its corresponding value_column. Values of data type VARCHAR are case-insensitive.
If ValueColumns specifies n columns, ValuesAfterLast must specify n values. For i in [1, n], value_column_i has the value after_last_value_i. However, after_last_value_i can be empty; for example:
ValueColumns ('c1', 'c2', 'c3')
ValuesAfterLast (1, ,'abc')
If after_last_value_i is empty, value_column_i has the value NULL.
Default: value_column_i has the value NULL for i in [1, n].
DuplicateRowsCount
[Optional] Specify the number of rows to duplicate across split boundaries if you use the function SeriesSplitter (SeriesSplitter Example 2 : Using SeriesSplitter with Interpolator shows how to use Interpolator with SeriesSplitter).

If you specify this argument but do not use SeriesSplitter, or do not conform to the conditions that apply for the value for each interpolation or aggregation type, the function either issues an error message or produces incorrect results.

If you specify only value1, the function duplicates value1 rows from the previous partition and value1 rows from the next partition. If you specify both value1 and value2, the function duplicates value1 rows from the previous partition and value2 rows from the next partition. Each argument value must be nonnegative INTEGER.

Both value1 and value2 must exceed the number of data points that the function needs for every specified interpolation or aggregation method. The following table shows how many data points the function needs for each interpolation method.
Interpolation Method Number of Data Points Needed
'linear' 1
'constant' 1
'spline' 2
'median [(window(n))]' n/2
'loess [(weights ({constant|tricube}), degree ({0|1|2}), span(m))]' If m > 1: m-1

If m < 1: (m * n)-1, where n is total number of data rows, found in column n of the count_row_number table

Accumulate
[Optional] Specify the names of input_table columns (other than those specified by TimeColumn and ValueColumns) to copy to the output table.
Default: Only the columns specified by TimeColumn and ValueColumns