Arguments - Aster Analytics

Teradata Aster Analytics Foundation User Guide

Product: Aster Analytics
Release Number: 6.21
Published: November 2016
Language: English (United States)
Last Update: 2018-04-14
dita:id: B700-1021
lifecycle: previous
Product Category: Software
Argument Category Description
InputTable Required Specifies the name of the table that contains the numeric data to be filtered and (optionally) the columns by which to group the data.
OutputTable Required Specifies the name of the table where the function stores the copy of the input table (including the PARTITION BY column) with the outliers either deleted (by default) or replaced (as specified by the ReplacementValue argument).
TargetColumn Required Specifies the names of the input table columns to be filtered.
OutlierTable Optional Specifies the name of the table where the function outputs copies of the rows of the input table that contain outliers.
GroupByColumns Optional Specifies the names of the input table columns by which to group the data. If the data schema format is name:value, then this list must include name.
Method Optional Specifies the method or methods of filtering outliers:
  • 'percentile' (default)
  • 'tukey' (Tukey’s method)
  • 'carling' (Carling’s modification)
  • 'MAD-median' (Median absolute deviation (MAD))

    MAD is defined as the median of the absolute values of the residuals. For example, if the datapoints are x_i and the median value of the data is M, then the unscaled MAD is median_i(|x_i - M|). The function multiplies this value by MadScaleConstant.

Specify either one method, which the function uses for all columns specified by TargetColumn, or specify a method for each column specified by TargetColumn.

ApproxPercentile Optional Specifies whether the function calculates the percentiles used as filter limits approximately ('true') or exactly ('false'). The default value is 'false'.

Approximate percentiles are typically faster, but might fail when the number of groups exceeds one million.

PercentileThreshold Optional Specifies the range of percentile values for 'percentile' filtering, [perc_lower, 100 - perc_lower]. The default filter range is [5, 95] (perc_lower = 5).
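
The symmetric percentile range described for PercentileThreshold can be illustrated with a minimal Python sketch. This is not the Aster implementation: the helper names (percentile, percentile_limits, filter_by_percentile), the linear-interpolation rule, and the choice to keep values that equal a limit are all assumptions made for illustration.

```python
def percentile(values, p):
    """Return the p-th percentile (0-100) of values.

    Uses linear interpolation between neighbouring order statistics;
    this interpolation rule is an assumption, not taken from the guide.
    """
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    rank = (len(ordered) - 1) * p / 100.0
    low = int(rank)
    high = min(low + 1, len(ordered) - 1)
    frac = rank - low
    return ordered[low] + (ordered[high] - ordered[low]) * frac


def percentile_limits(values, perc_lower=5.0):
    """Return the limits for the range [perc_lower, 100 - perc_lower]."""
    return percentile(values, perc_lower), percentile(values, 100.0 - perc_lower)


def filter_by_percentile(values, perc_lower=5.0):
    """Keep only the values that lie inside the percentile limits.

    Values equal to a limit are kept (an assumption of this sketch).
    """
    lower, upper = percentile_limits(values, perc_lower)
    return [v for v in values if lower <= v <= upper]
```

For example, with the integers 0 through 100 and the default perc_lower of 5, the limits are 5 and 95, so the sketch keeps the 91 values from 5 to 95.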
PercentileAccuracy Optional Specifies the accuracy of percentiles used for filtering. The default value is 0.5%.
IQRMultiplier Optional Specifies the multiplier of interquartile range for 'tukey' filtering. The default value is 1.5.
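
As a rough illustration of how the interquartile range multiplier enters Tukey's method, the following Python sketch computes the standard Tukey fences, Q1 - k * IQR and Q3 + k * IQR, with k defaulting to 1.5. The function names are hypothetical, and the quartile rule (the "inclusive" method of Python's statistics.quantiles) is an assumption; the guide does not say how the function computes quartiles.

```python
import statistics


def tukey_fences(values, iqr_multiplier=1.5):
    """Return (lower_fence, upper_fence) for Tukey's method.

    Quartiles use statistics.quantiles with method="inclusive";
    that choice is an assumption of this sketch.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - iqr_multiplier * iqr, q3 + iqr_multiplier * iqr


def tukey_outliers(values, iqr_multiplier=1.5):
    """Return the values that fall outside the Tukey fences."""
    lower, upper = tukey_fences(values, iqr_multiplier)
    return [v for v in values if v < lower or v > upper]
```

A larger multiplier widens the fences and flags fewer values; a smaller one narrows them.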
RemoveTail Optional Specifies the side of the distribution to filter. The default value is 'both'.
ReplacementValue Optional Specifies how the function handles outliers:
  • 'delete' (default)

    The function does not copy the row to the output table.

  • 'null'

    The function copies the row to the output table, replacing each outlier with the value NULL.

  • 'median'

    The function copies the row to the output table, replacing each outlier with the median value for its group.

  • newval

    The function copies the row to the output table, replacing each outlier with newval, which must be a numeric value.
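
The four ReplacementValue behaviours listed above can be sketched in Python for a single column of one group. This is only an illustration under stated assumptions: apply_replacement and the is_outlier predicate are hypothetical names, the sketch works on a list of values rather than on table rows, and it computes the group median from all values in the group (including the outliers), which the guide does not specify.

```python
import statistics


def apply_replacement(values, is_outlier, replacement="delete"):
    """Return a filtered copy of values for one group.

    values      -- list of numbers for one group
    is_outlier  -- predicate returning True for values to filter
    replacement -- 'delete', 'null', 'median', or a numeric value
    """
    if replacement == "delete":
        # Outlier entries are not copied to the output at all.
        return [v for v in values if not is_outlier(v)]
    if replacement == "null":
        fill = None  # None stands in for SQL NULL
    elif replacement == "median":
        # Assumption: median of the whole group, outliers included.
        fill = statistics.median(values)
    else:
        fill = float(replacement)  # newval must be numeric
    return [fill if is_outlier(v) else v for v in values]
```

With 'delete' the output is shorter than the input; with the other three options every input entry has a counterpart in the output.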

MadScaleConstant Optional Specifies the scale constant used with 'MAD-median' filtering; a DOUBLE PRECISION value. The default value is 1.4826, which means MAD = 1.4826 * median(|x - median(x)|).
MadThreshold Optional Specifies the threshold used with 'MAD-median' filtering; a DOUBLE PRECISION value. The default value is 3, which means that the function flags a value x as an outlier if |x - median(x)| / MAD > 3.
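
The MAD-median rule described by MadScaleConstant and MadThreshold can be written out as a short Python sketch. It follows the two formulas given above, MAD = 1.4826 * median(|x - median(x)|) and |x - median(x)| / MAD > 3; the function names and the handling of a zero MAD (nothing is flagged) are assumptions of this sketch, not documented behaviour.

```python
import statistics


def scaled_mad(values, scale_constant=1.4826):
    """Return scale_constant * median(|x - median(x)|)."""
    med = statistics.median(values)
    return scale_constant * statistics.median([abs(v - med) for v in values])


def mad_outliers(values, scale_constant=1.4826, threshold=3.0):
    """Return the values x with |x - median(x)| / MAD > threshold."""
    med = statistics.median(values)
    mad = scaled_mad(values, scale_constant)
    if mad == 0:
        # Assumption: with zero spread, this sketch flags nothing.
        return []
    return [v for v in values if abs(v - med) / mad > threshold]
```

For the values 1 through 9 plus 100, the median is 5.5 and the unscaled median absolute deviation is 2.5, so the scaled MAD is about 3.71 and only 100 exceeds the default threshold of 3.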