Teradata Package for Python Function Reference | 20.00 - mad - Teradata Package for Python - Look here for syntax, methods and examples for the functions included in the Teradata Package for Python.
Teradata® Package for Python Function Reference - 20.00
- Deployment
- VantageCloud
- VantageCore
- Edition
- Enterprise
- IntelliFlex
- VMware
- Product
- Teradata Package for Python
- Release Number
- 20.00.00.03
- Published
- December 2024
- ft:locale
- en-US
- ft:lastEdition
- 2024-12-19
- dita:id
- TeradataPython_FxRef_Enterprise_2000
- Product Category
- Teradata Vantage
- teradataml.dataframe.dataframe.DataFrameGroupByTime.mad = mad(self, constant_multiplier_columns=None)
- DESCRIPTION:
Median Absolute Deviation (MAD) returns the median of the set of values defined as
the absolute value of the difference between each value and the median of all values
in each group.
This is a single-threaded function.
Formula for computing MAD is as follows:
MAD = b * Mi(|Xi - Mj(Xj)|)
Where,
b = Some numeric constant. Default value is 1.4826.
Mj(Xj) = Median of the original set of values.
Xi = The original set of values.
Mi = Median of absolute value of the difference between
each value in Xi and the Median calculated in Mj(Xj).
Note:
1. This function is valid only on columns with numeric types.
2. Null values are not included in the result computation.
PARAMETERS:
constant_multiplier_columns:
Optional Argument.
Specifies a dictionary that accepts numeric values to be used as constant multiplier
(b in the above formula) as key in the dictionary. Key should be any numeric value
greater than or equal to 0. Whereas value in the dictionary can be a column name or
list of column names. Sometimes, value can also include a special character '*',
instead of column name. This should be used only when one wants to use same constant
multiplier for all columns.
Note:
For all numeric columns in teradataml DataFrame, that are not specified in this argument,
default value of constant_multiplier is used, which is 1.4826.
Types: Dictionary
Examples:
# Let's assume, a teradataml DataFrame has following columns:
# col1, col2, col3, ..., colN
# Use 2 as constant multiplier for column "col1" and default for rest.
constant_multiplier_columns = {2: "col1"}
# Use 2.485 as constant multiplier for column "col1", 5 for "col3" and default for rest.
constant_multiplier_columns = {2.485: "col1", 5: "col3"}
# Use 2.485 as constant multiplier for column "col1", "col2" and "col3" and default for rest.
constant_multiplier_columns = {2.485: ["col1", "col2", "col3"]}
#
# Use cases for using '*' default value.
#
# Use 2.485 as constant multiplier for all columns. In this case, we do not need
# to specify all the columns, we can just use '*'.
constant_multiplier_columns = {2.485: "*"}
# Use 2.485 as constant multiplier for column "col1" and "col3"
# and 1.5 for rest of the columns:
constant_multiplier_columns = {2.485: ["col1", "col3"], 1.5: "*"}
# We can use default value column character ('*') in list as well
# Use 2.485 as constant multiplier for column "col1" and "col3"
# and 1.5 for "col4" and rest of the columns:
constant_multiplier_columns = {2.485: ["col1", "col3"], 1.5: ["col4", "*"]}
RETURNS:
teradataml DataFrame
RAISES:
TypeError - If incorrect type of values passed to input argument.
ValueError - If invalid value passed to the the argument.
TeradataMLException
1. TDMLDF_AGGREGATE_FAILED - If mad() operation fails to
generate the column-wise median absolute deviation in the columns.
EXAMPLES :
>>> # Load the example datasets.
... load_example_data("dataframe", ["ocean_buoys", "ocean_buoys_seq", "ocean_buoys_nonpti"])
>>>
>>> # Create the required DataFrames.
... # DataFrame on non-sequenced PTI table
... ocean_buoys = DataFrame("ocean_buoys")
>>> # Check DataFrame columns and let's peek at the data
... ocean_buoys.columns
['buoyid', 'TD_TIMECODE', 'temperature', 'salinity']
>>> ocean_buoys.head()
TD_TIMECODE temperature salinity
buoyid
0 2014-01-06 08:10:00.000000 100.0 55
0 2014-01-06 08:08:59.999999 NaN 55
1 2014-01-06 09:01:25.122200 77.0 55
1 2014-01-06 09:03:25.122200 79.0 55
1 2014-01-06 09:01:25.122200 70.0 55
1 2014-01-06 09:02:25.122200 71.0 55
1 2014-01-06 09:03:25.122200 72.0 55
0 2014-01-06 08:09:59.999999 99.0 55
0 2014-01-06 08:00:00.000000 10.0 55
0 2014-01-06 08:10:00.000000 10.0 55
>>> # DataFrame on sequenced PTI table
... ocean_buoys_seq = DataFrame("ocean_buoys_seq")
>>> # Check DataFrame columns and let's peek at the data
... ocean_buoys_seq.columns
['TD_TIMECODE', 'TD_SEQNO', 'buoyid', 'salinity', 'temperature', 'dates']
>>> ocean_buoys_seq.head()
TD_TIMECODE TD_SEQNO salinity temperature dates
buoyid
0 2014-01-06 08:00:00.000000 26 55 10.0 2016-02-26
0 2014-01-06 08:08:59.999999 18 55 NaN 2015-06-18
1 2014-01-06 09:02:25.122200 24 55 78.0 2015-12-24
1 2014-01-06 09:01:25.122200 23 55 77.0 2015-11-23
1 2014-01-06 09:02:25.122200 12 55 71.0 2014-12-12
1 2014-01-06 09:03:25.122200 13 55 72.0 2015-01-13
1 2014-01-06 09:01:25.122200 11 55 70.0 2014-11-11
0 2014-01-06 08:10:00.000000 19 55 10.0 2015-07-19
0 2014-01-06 08:09:59.999999 17 55 99.0 2015-05-17
0 2014-01-06 08:10:00.000000 27 55 100.0 2016-03-27
>>> # DataFrame on NON-PTI table
... ocean_buoys_nonpti = DataFrame("ocean_buoys_nonpti")
>>> # Check DataFrame columns and let's peek at the data
... ocean_buoys_nonpti.columns
['buoyid', 'timecode', 'temperature', 'salinity']
>>> ocean_buoys_nonpti.head()
buoyid temperature salinity
timecode
2014-01-06 08:09:59.999999 0 99.0 55
2014-01-06 08:10:00.000000 0 10.0 55
2014-01-06 09:01:25.122200 1 70.0 55
2014-01-06 09:01:25.122200 1 77.0 55
2014-01-06 09:02:25.122200 1 71.0 55
2014-01-06 09:03:25.122200 1 72.0 55
2014-01-06 09:02:25.122200 1 78.0 55
2014-01-06 08:10:00.000000 0 100.0 55
2014-01-06 08:08:59.999999 0 NaN 55
2014-01-06 08:00:00.000000 0 10.0 55
#
# Example 1: Calculate Median Absolute Deviation for all columns over 1 calendar day of
# timebucket duration. Use default constant multiplier.
# No need to pass any arguments.
#
>>> ocean_buoys_grpby1 = ocean_buoys.groupby_time(timebucket_duration="1cd",value_expression="buoyid", fill="NULLS")
>>> ocean_buoys_grpby1.mad()
TIMECODE_RANGE GROUP BY TIME(CAL_DAYS(1)) buoyid mad_salinity mad_temperature
0 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 737 44 0.0 0.0000
1 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 737 0 0.0 65.9757
2 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 737 2 0.0 1.4826
3 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 737 1 0.0 5.1891
>>>
#
# Example 2: Calculate MAD values using 2 as constant multiplier for all the columns
# in ocean_buoys_seq DataFrame on sequenced PTI table.
#
>>> ocean_buoys_seq_grpby1 = ocean_buoys_seq.groupby_time(timebucket_duration="CAL_DAYS(2)", value_expression="buoyid", fill="NULLS")
>>> constant_multiplier_columns = {2: "*"}
>>> ocean_buoys_seq_grpby1.mad(constant_multiplier_columns).sort(["TIMECODE_RANGE", "buoyid"])
TIMECODE_RANGE GROUP BY TIME(CAL_DAYS(2)) buoyid mad2TD_SEQNO mad2salinity mad2temperature
0 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 369 0 4.0 0.0 89.0
1 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 369 1 12.0 0.0 7.0
2 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 369 2 2.0 0.0 2.0
3 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 369 22 0.0 0.0 0.0
4 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 369 44 6.0 0.0 0.0
>>>
#
# Example 3: Calculate MAD values for all the column in teradataml DataFrame created on NON-PTI table.
# Use default constant multiplier while calculating MAD value for all columns except
# column 'temperature', where 2.485 is used as constant multiplier.
#
>>> ocean_buoys_nonpti_grpby1 = ocean_buoys_nonpti.groupby_time(timebucket_duration="2cdays",
... value_expression="buoyid",
... timecode_column="timecode", fill="NULLS")
>>> constant_multiplier_columns = {2.485: "temperature"}
>>> ocean_buoys_nonpti_grpby1.mad(constant_multiplier_columns).sort(["TIMECODE_RANGE", "buoyid"])
TIMECODE_RANGE GROUP BY TIME(CAL_DAYS(2)) buoyid mad2.485temperature mad_salinity
0 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 8039 0 110.5825 0.0
1 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 8039 1 8.6975 0.0
2 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 8039 2 2.4850 0.0
3 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 8039 44 0.0000 0.0
>>>
#
# Example 4: Calculate MAD values for all the column in teradataml DataFrame created on NON-PTI table.
# Use 3 as constant multiplier while calculating MAD value for all columns (buoyid and
# salinity), except column 'temperature', where 2 is used as constant multiplier.
#
>>> ocean_buoys_grpby3 = ocean_buoys.groupby_time(timebucket_duration="2cday", fill="NULLS")
>>> constant_multiplier_columns = {2: "temperature", 3:"*"}
>>> ocean_buoys_grpby3.mad(constant_multiplier_columns).sort(["TIMECODE_RANGE"])
TIMECODE_RANGE GROUP BY TIME(CAL_DAYS(2)) mad3buoyid mad3salinity mad2temperature
0 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 369 6.0 0.0 27.0
>>>