7.00.02 - OutlierFilter Arguments - Aster Analytics

Teradata Aster® Analytics Foundation User GuideUpdate 2

Product
Aster Analytics
Release Number
7.00.02
Release Date
September 2017
Content Type
Programming Reference
User Guide
Publication ID
B700-1022-700K
Language
English (United States)
InputTable
Specifies the name of the table that contains the numeric data to be filtered and (optionally) the columns by which to group the data.
OutputTable
Specifies the name of the table where the function stores the copy of the input table (including the PARTITION BY column) with the outliers either deleted (by default) or replaced (as specified by the ReplacementValue argument).
TargetColumn
Specifies the names of the input table columns to be filtered.
OutlierTable
[Optional] Specifies the name of the table where the function outputs copies of the rows of the input table that contain outliers. Default behavior: Function does not output an outlier table.
GroupByColumns
[Optional] Specifies the names of the input table columns by which to group the data. If the data schema format is name:value, this list must include name. Default behavior: Function does not group data.
Method
[Optional] Specifies the method or methods of filtering outliers:
  • 'percentile' (Default)
  • 'tukey'

    Tukey's test: An outlier is defined as any observation smaller than V1 - k*(V3-V1) or larger than V3 + k*(V3-V1), where V1 and V3 are the 25th and 75th percentiles of the data and k is specified by the argument IQRMultiplier.

  • 'carling'

    Carling's modification to Tukey's test: An outlier is defined as an observation outside the range V2 ± c*(V3 - V1), where V2 is the median of the data, V1 and V3 are the 25th and 75th percentiles of the data, and c is a constant that you cannot change.

    For more information about Carling's modification, see:

    Carling, Kenneth. "Resistant outlier rules and the non-Gaussian case." Computational Statistics & Data Analysis 33, no. 3 (2000): 249-258. Available at https://core.ac.uk/download/pdf/6559387.pdf.

  • 'MAD-median'

    Median absolute deviation (MAD), the median of the absolute values of the residuals. For example, if there are i datapoints and the median value of the data is M, MAD=mediani(|xi-M|).

Specify either one method, which the function uses for all columns specified by TargetColumn, or specify a method for each column specified by TargetColumn.

ApproxPercentile
[Optional] Specifies whether the function calculates the percentiles used as filter limits exactly. Default: 'false'.

Approximate percentiles are typically faster, but can fail when the number of groups exceeds one million.

PercentileThreshold
[Optional] Specifies the range of percentile values for 'percentile' filtering, [perc_lower, 100 -perc_lower]. Default: [5, 95].
PercentileAccuracy
[Optional] Specifies the accuracy of percentiles used for filtering. Default: 0.5%.
IQRMultiplier
[Optional] Specifies the multiplier of interquartile range for 'tukey' filtering. Default: 1.5.
RemoveTail
[Optional] Specifies the side of the distribution to filter. Default: 'both'.
ReplacementValue
[Optional] Specifies how the function handles outliers:
  • 'delete' (Default)

    The function does not copy the row to the output table.

  • 'null'

    The function copies the row to the output table, replacing each outlier with the value NULL.

  • 'median'

    The function copies the row to the output table, replacing each outlier with the median value for its group.

  • newval

    The function copies the row to the output table, replacing each outlier with newval, which must be a numeric value.

MadScaleConstant
[Optional] Specifies the scale constant used with 'MAD-median' filtering; a DOUBLE PRECISION value. Default: 1.4826, which means MAD = 1.4826 * median(|x - median(x)|).
MadThreshold
[Optional] Specifies the threshold used with 'MAD-median' filtering; a DOUBLE PRECISION value. Default: 3, which means that |x-median(x)|/MAD > 3 is flagged as an outlier.