Argument | Category | Description |
---|---|---|
InputTable | Required | Specifies the name of the table that contains the numeric data to be filtered and (optionally) the columns by which to group the data. |
OutputTable | Required | Specifies the name of the table where the function stores the copy of the input table (including the PARTITION BY column) with the outliers either deleted (by default) or replaced (as specified by the ReplacementValue argument). |
TargetColumn | Required | Specifies the names of the input table columns to be filtered. |
OutlierTable | Optional | Specifies the name of the table where the function outputs copies of the rows of the input table that contain outliers. |
GroupByColumns | Optional | Specifies the names of the input table columns by which to group the data. If the data schema format is name:value, then this list must include name. |
Method | Optional | Specifies the method or methods of filtering outliers; the methods referenced elsewhere in this table are 'percentile', 'tukey', and 'MAD-median'. Specify either one method, which the function uses for all columns specified by TargetColumn, or one method for each column specified by TargetColumn. |
ApproxPercentile | Optional | Specifies whether the function calculates the percentiles used as filter limits approximately ('true') or exactly ('false'). The default value is 'false'. Approximate percentiles are typically faster, but the calculation might fail when the number of groups exceeds one million. |
PercentileThreshold | Optional | Specifies the range of percentile values for 'percentile' filtering, [perc_lower, 100 - perc_lower]. The default filter range is [5, 95]. |
PercentileAccuracy | Optional | Specifies the accuracy of percentiles used for filtering. The default value is 0.5%. |
IQRMultiplier | Optional | Specifies the multiplier of interquartile range for 'tukey' filtering. The default value is 1.5. |
RemoveTail | Optional | Specifies the side of the distribution to filter. The default value is 'both'. |
ReplacementValue | Optional | Specifies how the function handles outliers: it either deletes them from the output table (the default) or replaces them. |
MadScaleConstant | Optional | Specifies the scale constant used with 'MAD-median' filtering; a DOUBLE PRECISION value. The default value is 1.4826, which means MAD = 1.4826 * median(\|x - median(x)\|). |
MadThreshold | Optional | Specifies the threshold used with 'MAD-median' filtering; a DOUBLE PRECISION value. The default value is 3, which means that a value x with \|x - median(x)\| / MAD > 3 is flagged as an outlier. |
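
The filter limits described by PercentileThreshold, IQRMultiplier, MadScaleConstant, and MadThreshold can be illustrated outside the database. The following Python sketch is not the function's implementation; it only mirrors the formulas and default values stated in the table. The helper names (`percentile`, `percentile_limits`, `tukey_limits`, `mad_median_outliers`, `drop_outside`) are hypothetical, and the linear-interpolation percentile rule and the Tukey fences at Q1 and Q3 are assumptions, because the table does not say how the function computes them.

```python
from statistics import median


def percentile(values, p):
    """Linearly interpolated percentile (p in [0, 100]) of a non-empty list.

    Assumption: the interpolation rule is chosen for illustration only.
    """
    xs = sorted(values)
    rank = (len(xs) - 1) * p / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (rank - lo)


def percentile_limits(values, perc_lower=5.0):
    """Limits [perc_lower, 100 - perc_lower]; the default mirrors [5, 95]."""
    return percentile(values, perc_lower), percentile(values, 100.0 - perc_lower)


def tukey_limits(values, iqr_multiplier=1.5):
    """Fences at Q1 - k * IQR and Q3 + k * IQR, with k defaulting to 1.5."""
    q1, q3 = percentile(values, 25), percentile(values, 75)
    iqr = q3 - q1
    return q1 - iqr_multiplier * iqr, q3 + iqr_multiplier * iqr


def mad_median_outliers(values, scale_constant=1.4826, threshold=3.0):
    """Values x with |x - median(x)| / MAD > threshold, where
    MAD = scale_constant * median(|x - median(x)|)."""
    med = median(values)
    mad = scale_constant * median(abs(x - med) for x in values)
    if mad == 0:
        # Every deviation from the median is zero or the ratio is undefined;
        # this sketch flags nothing in that case.
        return []
    return [x for x in values if abs(x - med) / mad > threshold]


def drop_outside(values, lower, upper):
    """Keep values inside [lower, upper], like the default delete behavior."""
    return [x for x in values if lower <= x <= upper]


data = [10, 12, 11, 13, 12, 11, 10, 12, 100]
print(mad_median_outliers(data))
print(drop_outside(data, *tukey_limits(data)))
print(percentile_limits(data))
```

With this sample, the MAD-median rule and the Tukey fences both single out the value 100, while the default [5, 95] percentile limits depend on the interpolation rule assumed above.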