mad() - Teradata Python Package

Teradata® Python Package User Guide

Product
Teradata Python Package
Release Number
16.20
Published
February 2020
Language
English (United States)
Last Update
2020-02-29
dita:mapPath
rkb1531260709148.ditamap
dita:ditavalPath
Generic_no_ie_no_tempfilter.ditaval
dita:id
B700-4006
lifecycle
previous
Product Category
Teradata Vantage

Median Absolute Deviation (MAD) returns the median of the set of values defined as the absolute value of the difference between each value and the median of all values in each group. It is a single-threaded function.

  • This function is valid only on columns of numeric types.
  • Nulls are not included in the result computation.

Example Prerequisites

See Example Setup to set up the environment for the following examples.

Example: Run mad() for all columns, using default constant multiplier

Calculate Median Absolute Deviation for all columns over one calendar day of timebucket duration.

Use default constant multiplier. No need to pass any arguments.

>>> ocean_buoys_grpby1 = ocean_buoys.groupby_time(timebucket_duration="1cd",value_expression="buoyid", fill="NULLS")
>>> ocean_buoys_grpby1.mad()
 TIMECODE_RANGE GROUP BY TIME(CAL_DAYS(1)) buoyid mad_salinity mad_temperature
0 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 737 44 0.0 0.0000
1 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 737 0 0.0 65.9757
2 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 737 2 0.0 1.4826
3 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 737 1 0.0 5.1891

Example: Run mad() on sequenced PTI table, using 2 as constant multiplier

>>> ocean_buoys_seq_grpby1 = ocean_buoys_seq.groupby_time(timebucket_duration="CAL_DAYS(2)", value_expression="buoyid", fill="NULLS")
>>> constant_multiplier_columns = {2: "*"}
>>> ocean_buoys_seq_grpby1.mad(constant_multiplier_columns).sort(["TIMECODE_RANGE", "buoyid"])
                                      TIMECODE_RANGE  GROUP BY TIME(CAL_DAYS(2))  buoyid  mad2TD_SEQNO  mad2salinity  mad2temperature
0  ('2014-01-06 00:00:00.000000-00:00', '2014-01-...                         369       0           4.0           0.0             89.0
1  ('2014-01-06 00:00:00.000000-00:00', '2014-01-...                         369       1          12.0           0.0              7.0
2  ('2014-01-06 00:00:00.000000-00:00', '2014-01-...                         369       2           2.0           0.0              2.0
3  ('2014-01-06 00:00:00.000000-00:00', '2014-01-...                         369      22           0.0           0.0              0.0
4  ('2014-01-06 00:00:00.000000-00:00', '2014-01-...                         369      44           6.0           0.0              0.0

Example: Run mad() on non-sequenced PTI table, using default constant multiplier

Calculate MAD values for all the columns in teradataml DataFrame created on NON-PTI table.

Use default constant multiplier while calculating MAD value for all columns except column 'temperature', where 2.485 is used as constant multiplier.

>>> ocean_buoys_nonpti_grpby1 = ocean_buoys_nonpti.groupby_time(timebucket_duration="2cdays", value_expression="buoyid", timecode_column="timecode", fill="NULLS")
>>> constant_multiplier_columns = {2.485: "temperature"}
>>> ocean_buoys_nonpti_grpby1.mad(constant_multiplier_columns).sort(["TIMECODE_RANGE", "buoyid"])
                                      TIMECODE_RANGE  GROUP BY TIME(CAL_DAYS(2))  buoyid  mad2.485temperature  mad_salinity
0  ('2014-01-06 00:00:00.000000-00:00', '2014-01-...                        8039       0             110.5825           0.0
1  ('2014-01-06 00:00:00.000000-00:00', '2014-01-...                        8039       1               8.6975           0.0
2  ('2014-01-06 00:00:00.000000-00:00', '2014-01-...                        8039       2               2.4850           0.0
3  ('2014-01-06 00:00:00.000000-00:00', '2014-01-...                        8039      44               0.0000           0.0

Example: Run the mad() on non-sequenced PTI table, using 3 as constant multiplier

Calculate MAD values for all the column in teradataml DataFrame created on NON-PTI table.

Use 3 as constant multiplier while calculating MAD value for all columns (buoyid and salinity), except column 'temperature', where 2 is used as constant multiplier.

>>> ocean_buoys_grpby3 = ocean_buoys.groupby_time(timebucket_duration="2cday", fill="NULLS")
>>> constant_multiplier_columns = {2: "temperature", 3:"*"}
>>> ocean_buoys_grpby3.mad(constant_multiplier_columns).sort(["TIMECODE_RANGE"])
                                      TIMECODE_RANGE  GROUP BY TIME(CAL_DAYS(2))  mad3buoyid  mad3salinity  mad2temperature
0  ('2014-01-06 00:00:00.000000-00:00', '2014-01-...                         369         6.0           0.0             27.0