OutlierFilter Example 1: OutlierMethod ('percentile'), ReplacementValue ('null') - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
8.00
1.0
Published
May 2019
Language
English (United States)
Last Update
2019-11-22
dita:mapPath
blj1506016597986.ditamap
dita:ditavalPath
blj1506016597986.ditaval
dita:id
B700-4003
lifecycle
previous
Product Category
Teradata Vantageā„¢

Input

The InputTable has a time series of atmospheric pressure readings (in mbar) for five cities.

InputTable: ville_pressuredata
sn city period pressure_mbar
1 Asheville 2010-01-01 00:00:00 1020.5
2 Asheville 2010-01-01 01:00:00 9000
3 Asheville 2010-01-01 02:00:00 1020
4 Asheville 2010-01-01 03:00:00 10000
5 Asheville 2010-01-01 04:00:00 1020.2
6 Asheville 2010-01-01 05:00:00 1020
7 Asheville 2010-01-01 06:00:00 1020.3
8 Asheville 2010-01-01 07:00:00 1020.8
9 Asheville 2010-01-01 08:00:00 1020.3
10 Asheville 2010-01-01 09:00:00 1020.7
... ... ... ...
25 Greenville 2010-01-01 00:00:00 1020.6
26 Greenville 2010-01-01 01:00:00 9000
27 Greenville 2010-01-01 02:00:00 1020.1
28 Greenville 2010-01-01 03:00:00 10000
29 Greenville 2010-01-01 04:00:00 1020.2
30 Greenville 2010-01-01 05:00:00 1020
... ... ... ...
49 Brownsville 2010-01-01 00:00:00 1020.5
50 Brownsville 2010-01-01 01:00:00 9000
51 Brownsville 2010-01-01 02:00:00 1020
52 Brownsville 2010-01-01 03:00:00 10000
53 Brownsville 2010-01-01 04:00:00 1020.2
54 Brownsville 2010-01-01 05:00:00 1020
... ... ... ...
73 Nashville 2010-01-01 00:00:00 1020.4
74 Nashville 2010-01-01 01:00:00 9000
75 Nashville 2010-01-01 02:00:00 1019.9
76 Nashville 2010-01-01 03:00:00 10000
77 Nashville 2010-01-01 04:00:00 1020.1
78 Nashville 2010-01-01 05:00:00 1019.9
... ... ... ...
97 Knoxville 2010-01-01 00:00:00 1020.4
98 Knoxville 2010-01-01 01:00:00 9000
99 Knoxville 2010-01-01 02:00:00 1019.9
100 Knoxville 2010-01-01 03:00:00 10000
101 Knoxville 2010-01-01 04:00:00 1020
102 Knoxville 2010-01-01 05:00:00 1019.9
... ... ... ...

SQL Call

SELECT * FROM OutlierFilter (
  ON ville_pressuredata AS InputTable
  OUT TABLE OutputTable (of_output1)
  USING
  TargetColumns ('pressure_mbar')
  OutlierMethod ('percentile')
  PercentileThreshold (1,90)
  RemoveTail ('both')
  ReplacementValue ('null')
  GroupByColumns ('city')
) AS dt;

Output

message
Output tables created successfully

This query returns the following table:

SELECT * FROM of_output1 ORDER BY 1;

The outlying values have been replaced by NULL.

of_output1
sn city period pressure_mbar
1 Asheville 2010-01-01 00:00:00 1020.5
2 Asheville 2010-01-01 01:00:00 NULL
3 Asheville 2010-01-01 02:00:00 1020
4 Asheville 2010-01-01 03:00:00 NULL
5 Asheville 2010-01-01 04:00:00 1020.2
6 Asheville 2010-01-01 05:00:00 1020
7 Asheville 2010-01-01 06:00:00 1020.3
8 Asheville 2010-01-01 07:00:00 1020.8
9 Asheville 2010-01-01 08:00:00 1020.3
10 Asheville 2010-01-01 09:00:00 1020.7
11 Asheville 2010-01-01 10:00:00 NULL
12 Asheville 2010-01-01 11:00:00 1022
13 Asheville 2010-01-01 12:00:00 1021.1
14 Asheville 2010-01-01 13:00:00 1020
15 Asheville 2010-01-01 14:00:00 1019.3
... ... ... ...