median() in Time Series Aggregate Mode - Teradata Python Package

Teradata® Python Package User Guide

Product
Teradata Python Package
Release Number
16.20
Published
February 2020
Language
English (United States)
Last Update
2020-02-29
Product Category
Teradata Vantage
The median() function returns the column-wise median value of the DataFrame.
  • This function is valid only on columns of numeric types.
  • Nulls are not included in the result computation.
The examples here cover median() only as a Time Series Aggregate function. For median() as a regular aggregate function, refer to median() in Regular Aggregate Mode.
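The null-handling rule above can be illustrated without a Vantage connection. The following is a minimal sketch in plain Python, not the teradataml implementation: the helper name median_ignoring_nulls and the sample readings are hypothetical, and None stands in for a database NULL.

```python
from statistics import median

def median_ignoring_nulls(values):
    """Return the median of the non-null entries, or None if every entry is null."""
    # Drop nulls first, mirroring the rule that nulls do not take part
    # in the result computation.
    present = [v for v in values if v is not None]
    return median(present) if present else None

# Hypothetical temperature readings; None stands in for NULL.
temperatures = [10, None, 43, 43, None, 99]
print(median_ignoring_nulls(temperatures))
```

Only the four non-null readings contribute to the result here; the two nulls neither count as values nor shift the midpoint.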

Examples Prerequisite

See Example Setup to set up the environment for the following examples.

To use median() as a Time Series Aggregate function, first run groupby_time(), then call median() on the result.
>>> ocean_buoys_grpby1 = ocean_buoys.groupby_time(timebucket_duration="2cy", value_expression="buoyid", fill="NULLS")

Example: Run median() on a DataFrame created on a non-sequenced PTI table, for all rows

This example considers all rows of each column when calculating the median value.

>>> ocean_buoys_grpby1.median().sort(["TIMECODE_RANGE", "buoyid"])
                                      TIMECODE_RANGE  GROUP BY TIME(CAL_YEARS(2))  buoyid  median_temperature  median_salinity
0  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       0                54.5             55.0
1  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       1                74.5             55.0
2  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       2                81.0             55.0
3  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2      44                43.0             55.0

Example: Run median() on a DataFrame created on a non-sequenced PTI table, for DISTINCT rows

This example considers only DISTINCT values of each column when calculating the median value.

>>> ocean_buoys_grpby1.median(distinct=True).sort(["TIMECODE_RANGE", "buoyid"])
                                      TIMECODE_RANGE  GROUP BY TIME(CAL_YEARS(2))  buoyid  median_temperature  median_salinity
0  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       0                99.0             55.0
1  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       1                74.5             55.0
2  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       2                81.0             55.0
3  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2      44                54.0             55.0
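Why the all-rows and DISTINCT results can differ for the same group is easier to see on a small list. The following is a minimal sketch in plain Python, not the teradataml implementation: the helper names and the sample readings are hypothetical and are not taken from the ocean_buoys table.

```python
from statistics import median

def median_all(values):
    """Median over every non-null value, duplicates included."""
    present = [v for v in values if v is not None]
    return median(present) if present else None

def median_distinct(values):
    """Median over the distinct non-null values only."""
    # A set keeps one copy of each value, so repeated readings count once.
    present = {v for v in values if v is not None}
    return median(present) if present else None

# Hypothetical readings with one value repeated several times.
readings = [10, 10, 10, 43, 99]
print(median_all(readings))       # duplicates pull the midpoint toward 10
print(median_distinct(readings))  # computed over {10, 43, 99}
```

When a group contains no repeated values, the two functions return the same result, which is consistent with rows whose medians are unchanged between the two examples above.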