Interpolator Syntax Elements - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
8.10
1.1
Published
October 2019
Language
English (United States)
Last Update
2019-12-31
dita:mapPath
ima1540829771750.ditamap
dita:ditavalPath
jsj1481748799576.ditaval
dita:id
B700-4003
lifecycle
previous
Product Category
Teradata Vantage™
TimeColumn
Specify the name of the InputTable column that contains the time points of the time series whose missing values are to be calculated.
TimeInterval
[Optional] Specify exactly one of TimeIntervalTable or TimeInterval.

Specify the length of time, in seconds, between calculated values. This value must be either INTEGER or DOUBLE PRECISION.

The choice of TimeIntervalTable or TimeInterval determines which values in the input time series are interpolated:

Choice Values Interpolated
TimeIntervalTable Function calculates an interpolated value for each time point.
TimeInterval Function calculates an interpolated value for a time point only if the value is missing in the original time series; otherwise, the function copies the original value.
If you specify AggregationType, the function ignores TimeIntervalTable or TimeInterval and calculates the aggregated value for each point in the time series.
TargetColumns
Specify the names of InputTable columns to interpolate to the output table.
TimeDataType
[Optional] Specify the data type of the output column that corresponds to the input table column that TimeColumn specifies (time_column).

If you omit this syntax element, the function infers the data type of time_column from the input table and uses the inferred data type for the corresponding output table column.

If you specify this syntax element, the function can transform the input data to the specified output data type only if both the input column data type and the specified output column data type are in this list:
  • INTEGER
  • BIGINT
  • SMALLINT
  • DOUBLE PRECISION
  • NUMERIC(n[,m])
ValueDataType
[Optional] Specify the data types of the output columns that correspond to the input table columns that TargetColumns specifies.

If you omit this syntax element, the function infers the data type of each time_column from the input table and uses the inferred data type for the corresponding output table column.

If you specify ValueDataType, it must be the same size as TargetColumns. That is, if TargetColumns specifies n columns, ValueDataType must specify n data types. For i in [1, n], target_column_i has value_type_i. However, value_type_i can be empty; for example:

TargetColumns ('c1', 'c2', 'c3')
ValueDataType (INTEGER, ,VARCHAR)
If you specify this syntax element, the function can transform the input data to the specified output data type only if both the input column data type and the specified output column data type are in this list:
  • INTEGER
  • BIGINT
  • SMALLINT
  • DOUBLE PRECISION
  • NUMERIC(n[,m])
InterpolationType
[Optional] Specify interpolation types for the columns that TargetColumns specifies.
Must be the same size as TargetColumns. That is, if TargetColumns specifies n columns, InterpolationType must specify n interpolation types. For i in [1, n], target_column_i has interpolation_type_i. However, interpolation_type_i can be empty; for example:
TargetColumns ('c1', 'c2', 'c3')
InterpolationType ('linear', ,'constant')
An empty interpolation_type has the default value.
The function calculates the value for each missing time point using a low-degree polynomial based on a set of nearest neighbors. The following table shows the options for the polynomial.
The possible values of interpolation_type are as follows:
In interpolation_type syntax, brackets do not indicate optional elements—you must include them.
interpolation_type Description
'linear' (Default) Function determines value for each missing time point using linear interpolation between two nearest time points.
'constant' Function sets value for each missing time point to nearest value.

You must use this option if target_column has data type CHARACTER, CHARACTER(n), or VARCHAR.

'spline[(type(cubic))]' Function determines value for each missing time point by fitting a cubic spline to nearest three time points.
'median[(window( n ))]' Function sets value for each missing time point to median value of nearest n time points.

n must be greater than or equal to 2. Default: 5.

'loess[(weights({constant|tricube}), degree ({0|1|2}), span(m))]'
Option Description
weights
  • constant:

    All time points are equally weighted.

  • tricube:

    Time points closer to missing data point are more heavily weighted than those farther away.

Default: constant
degree Degree of polynomial.

Default: 1

m Two choices:
  • Specify number of neighboring time points to use in calculation—m > 1.
  • Specify proportion of time points to use in each fit.

    You must provide RowCounts table, and m must be between (λ+1)/n and 1, where λ is specified degree and n is number of rows in partition).

Default: 5

Specify only one of InterpolationType or AggregationType. If you omit both syntax elements, the function uses InterpolationType with its default value, 'linear'. For data types CHARACTER, CHARACTER(n), and VARCHAR, you cannot use AggregationType. You must use InterpolationType, and interpolation_type must be 'constant'.
AggregationType
[Optional] Specify the aggregation types of the columns that TargetColumns specifies.
Must be the same size as TargetColumns. That is, if TargetColumns specifies n columns, AggregationType must specify n aggregation types. For i in [1, n], target_column_i has aggregation_type_i. However, aggregation_type_i can be empty; for example:
TargetColumns ('c1', 'c2', 'c3')
AggregationType (min, ,max)
An empty aggregation_type has the default value.
This is the syntax of aggregation_type:
{ min | max | mean | mode | sum } [(window(n))]
In aggregation_type syntax, brackets do not indicate optional elements—you must include them.
The function calculates the aggregate value as the minimum, maximum, mean, mode, or sum within a sliding window of length n. n must be greater than or equal to 2.
Default: n is 5 and the default aggregation method is min.
The Interpolator function can calculate the aggregates of values of these data types:
  • INTEGER
  • BIGINT
  • SMALLINT
  • DOUBLE PRECISION
  • NUMERIC
  • NUMERIC(n[,m])
Specify only one of AggregationType or InterpolationType. If you omit both syntax elements, the function uses InterpolationType with its default value, 'linear'.

Aggregation calculations ignore the values in TimeInterval or in the TimeIntervalTable. The function calculates the aggregated value for each value in the time series.

StartTime
[Optional] Specify the start time for the time series.
Default: Start time of the time series in InputTable
EndTime
[Optional] Specify the end time for the time series.
Default: End time of the time series in InputTable
ValuesBeforeFirst
[Optional] Specify the values to use if start_time is before the start time of the time series in InputTable. Each of these values must have the same data type as its corresponding target_column. Values of data type VARCHAR are case-insensitive.
If TargetColumns specifies n columns, ValuesBeforeFirst must specify n values. For i in [1, n], target_column_i has the value before_first_value_i. However, before_first_value_i can be empty; for example:
TargetColumns ('c1', 'c2', 'c3')
ValuesBeforeFirst (1, ,'abc')
If before_first_value_i is empty, target_column_i has the value NULL.
Default: target_column_i has the value NULL for i in [1, n].
ValuesAfterLast
[Optional] Specify the values to use if end_time is after the end time of the time series in InputTable. Each of these values must have the same data type as its corresponding target_column. Values of data type VARCHAR are case-insensitive.
If TargetColumns specifies n columns, ValuesAfterLast must specify n values. For i in [1, n], target_column_i has the value after_last_value_i. However, after_last_value_i can be empty; for example:
TargetColumns ('c1', 'c2', 'c3')
ValuesAfterLast (1, ,'abc')
If after_last_value_i is empty, target_column_i has the value NULL.
Default: target_column_i has the value NULL for i in [1, n].
DuplicateRowsCount
[Optional] Specify the number of rows to duplicate across split boundaries.
If you specify only value1, the function duplicates value1 rows from the previous partition and value1 rows from the next partition.
If you specify both value1 and value2, the function duplicates value1 rows from the previous partition and value2 rows from the next partition.
Each value must be nonnegative INTEGER. Both value1 and value2 must exceed the number of time points that the function needs for every specified interpolation or aggregation method. The following table shows how many time points the function needs for each interpolation method. For aggregation, the number of time points required is determined by the value of n in window( n ) specified by aggregation_type.
Interpolation Method Number of Time Points Needed
'linear' 1
'constant' 1
'spline' 2
'median [(window(n))]' n/2
'loess [(weights ({constant|tricube}), degree ({0|1|2}), span(m))]' If m > 1: m-1

If m < 1: (m * n)-1, where n is total number of data rows, found in column n of the RowCounts table

Accumulate
[Optional] Specify the names of InputTable columns (other than those specified by TimeColumn and TargetColumns) to copy to the output table.
Default: Only the columns specified by TimeColumn and TargetColumns