OutlierFilter Arguments - Aster Analytics

Teradata Aster® Analytics Foundation User GuideUpdate 2

Product

Aster Analytics

Release Number

7.00.02

Published

September 2017

Language

English (United States)

Last Update

2018-04-17

dita:mapPath

uce1497542673292.ditamap

dita:ditavalPath

AA-notempfilter_pdf_output.ditaval

dita:id

B700-1022

lifecycle

Product Category

Software

InputTable

Specifies the name of the table that contains the numeric data to be filtered and (optionally) the columns by which to group the data.

OutputTable

Specifies the name of the table where the function stores the copy of the input table (including the PARTITION BY column) with the outliers either deleted (by default) or replaced (as specified by the ReplacementValue argument).

TargetColumn

Specifies the names of the input table columns to be filtered.

OutlierTable

[Optional] Specifies the name of the table where the function outputs copies of the rows of the input table that contain outliers. Default behavior: Function does not output an outlier table.

GroupByColumns

[Optional] Specifies the names of the input table columns by which to group the data. If the data schema format is name:value, this list must include name. Default behavior: Function does not group data.

Method

[Optional] Specifies the method or methods of filtering outliers:

'percentile' (Default)
'tukey'
Tukey's test: An outlier is defined as any observation smaller than V1 - k*(V3-V1) or larger than V3 + k*(V3-V1), where V1 and V3 are the 25th and 75th percentiles of the data and k is specified by the argument IQRMultiplier.
'carling'
Carling's modification to Tukey's test: An outlier is defined as an observation outside the range V2 ± c*(V3 - V1), where V2 is the median of the data, V1 and V3 are the 25th and 75th percentiles of the data, and c is a constant that you cannot change.

For more information about Carling's modification, see:

Carling, Kenneth. "Resistant outlier rules and the non-Gaussian case." Computational Statistics & Data Analysis 33, no. 3 (2000): 249-258. Available at https://core.ac.uk/download/pdf/6559387.pdf.
'MAD-median'
Median absolute deviation (MAD), the median of the absolute values of the residuals. For example, if there are i datapoints and the median value of the data is M, MAD=mediani(|xi-M|).

Specify either one method, which the function uses for all columns specified by TargetColumn, or specify a method for each column specified by TargetColumn.

ApproxPercentile

[Optional] Specifies whether the function calculates the percentiles used as filter limits exactly. Default: 'false'.

Approximate percentiles are typically faster, but can fail when the number of groups exceeds one million.

PercentileThreshold

[Optional] Specifies the range of percentile values for 'percentile' filtering, [perc_lower, 100 -perc_lower]. Default: [5, 95].

PercentileAccuracy

[Optional] Specifies the accuracy of percentiles used for filtering. Default: 0.5%.

IQRMultiplier

[Optional] Specifies the multiplier of interquartile range for 'tukey' filtering. Default: 1.5.

RemoveTail

[Optional] Specifies the side of the distribution to filter. Default: 'both'.

ReplacementValue

[Optional] Specifies how the function handles outliers:

'delete' (Default)
The function does not copy the row to the output table.
'null'
The function copies the row to the output table, replacing each outlier with the value NULL.
'median'
The function copies the row to the output table, replacing each outlier with the median value for its group.
newval
The function copies the row to the output table, replacing each outlier with newval, which must be a numeric value.

MadScaleConstant

[Optional] Specifies the scale constant used with 'MAD-median' filtering; a DOUBLE PRECISION value. Default: 1.4826, which means MAD = 1.4826 * median(|x - median(x)|).

MadThreshold

[Optional] Specifies the threshold used with 'MAD-median' filtering; a DOUBLE PRECISION value. Default: 3, which means that |x-median(x)|/MAD > 3 is flagged as an outlier.