Median Absolute Deviation (MAD) returns the median of the set of values defined as the absolute value of the difference between each value and the median of all values in each group. It is a single-threaded function.
- This function is valid only on columns of numeric types.
- Nulls are not included in the result computation.
Example Prerequisites
See Example Setup to set up the environment for the following examples.
Example: Run mad() for all columns, using default constant multiplier
Calculate Median Absolute Deviation for all columns over one calendar day of timebucket duration.
Use default constant multiplier. No need to pass any arguments.
>>> ocean_buoys_grpby1 = ocean_buoys.groupby_time(timebucket_duration="1cd",value_expression="buoyid", fill="NULLS")
>>> ocean_buoys_grpby1.mad() TIMECODE_RANGE GROUP BY TIME(CAL_DAYS(1)) buoyid mad_salinity mad_temperature 0 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 737 44 0.0 0.0000 1 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 737 0 0.0 65.9757 2 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 737 2 0.0 1.4826 3 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 737 1 0.0 5.1891
Example: Run mad() on sequenced PTI table, using 2 as constant multiplier
>>> ocean_buoys_seq_grpby1 = ocean_buoys_seq.groupby_time(timebucket_duration="CAL_DAYS(2)", value_expression="buoyid", fill="NULLS")
>>> constant_multiplier_columns = {2: "*"}
>>> ocean_buoys_seq_grpby1.mad(constant_multiplier_columns).sort(["TIMECODE_RANGE", "buoyid"]) TIMECODE_RANGE GROUP BY TIME(CAL_DAYS(2)) buoyid mad2TD_SEQNO mad2salinity mad2temperature 0 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 369 0 4.0 0.0 89.0 1 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 369 1 12.0 0.0 7.0 2 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 369 2 2.0 0.0 2.0 3 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 369 22 0.0 0.0 0.0 4 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 369 44 6.0 0.0 0.0
Example: Run mad() on non-sequenced PTI table, using default constant multiplier
Calculate MAD values for all the columns in teradataml DataFrame created on NON-PTI table.
Use default constant multiplier while calculating MAD value for all columns except column 'temperature', where 2.485 is used as constant multiplier.
>>> ocean_buoys_nonpti_grpby1 = ocean_buoys_nonpti.groupby_time(timebucket_duration="2cdays", value_expression="buoyid", timecode_column="timecode", fill="NULLS")
>>> constant_multiplier_columns = {2.485: "temperature"}
>>> ocean_buoys_nonpti_grpby1.mad(constant_multiplier_columns).sort(["TIMECODE_RANGE", "buoyid"]) TIMECODE_RANGE GROUP BY TIME(CAL_DAYS(2)) buoyid mad2.485temperature mad_salinity 0 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 8039 0 110.5825 0.0 1 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 8039 1 8.6975 0.0 2 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 8039 2 2.4850 0.0 3 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 8039 44 0.0000 0.0
Example: Run the mad() on non-sequenced PTI table, using 3 as constant multiplier
Calculate MAD values for all the column in teradataml DataFrame created on NON-PTI table.
Use 3 as constant multiplier while calculating MAD value for all columns (buoyid and salinity), except column 'temperature', where 2 is used as constant multiplier.
>>> ocean_buoys_grpby3 = ocean_buoys.groupby_time(timebucket_duration="2cday", fill="NULLS")
>>> constant_multiplier_columns = {2: "temperature", 3:"*"}
>>> ocean_buoys_grpby3.mad(constant_multiplier_columns).sort(["TIMECODE_RANGE"]) TIMECODE_RANGE GROUP BY TIME(CAL_DAYS(2)) mad3buoyid mad3salinity mad2temperature 0 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 369 6.0 0.0 27.0