Teradata® Package for Python Function Reference on VantageCloud Lake
- Deployment: VantageCloud
- Edition: Lake
- Product: Teradata Package for Python
- Release Number: 20.00.00.03
- Published: December 2024
- Last Edition: 2024-12-19
- Product Category: Teradata Vantage
teradataml.dataframe.dataframe.DataFrame.median = median(self, distinct=False)
DESCRIPTION:
Returns the column-wise median value of the DataFrame.
Notes:
1. This function is valid only on columns with numeric types.
2. Null values are not included in the result computation.
PARAMETERS:
distinct:
Optional Argument.
Specifies whether to exclude duplicate values while calculating the median.
Note:
distinct can be set to True only when median() is used as a Time Series
Aggregate function, that is, when it is called on a DataFrameGroupByTime
object. Otherwise, an exception is raised.
Default Value: False
RETURNS:
teradataml DataFrame object with median() operation
performed.
RAISES:
1. TDMLDF_AGGREGATE_FAILED - If median() operation fails to
generate the column-wise median value of the dataframe.
Possible error message:
Unable to perform 'median()' on the dataframe.
2. TDMLDF_AGGREGATE_COMBINED_ERR - If none of the selected columns
in the dataframe is supported by the median() operation.
Possible error message:
No results. Below is/are the error message(s):
All selected columns [(col2 - PERIOD_TIME), (col3 - BLOB)] is/are unsupported for 'median' operation.
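The failure modes listed above reach the caller as Python exceptions, so a call to median() can be wrapped defensively. The following is a minimal sketch, not part of teradataml: `FakeFrame` and `try_median` are hypothetical names, and `FakeFrame` only imitates a failing median() so that the sketch runs without a Vantage connection. It catches `Exception` broadly; in real code, the specific exception class raised by teradataml should be caught instead.

```python
class FakeFrame:
    """Hypothetical stand-in for a DataFrame whose median() fails."""

    def median(self, distinct=False):
        # Message text copied from the RAISES section above.
        raise RuntimeError("Unable to perform 'median()' on the dataframe.")


def try_median(df, **kwargs):
    """Return (result, None) on success or (None, error message) on failure."""
    try:
        return df.median(**kwargs), None
    except Exception as err:  # broad on purpose; narrow this in real code
        return None, str(err)


result, error = try_median(FakeFrame())
print(error)
```

Returning the message instead of re-raising is only one possible design; it keeps a batch of aggregations running when a single DataFrame has no supported columns.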
EXAMPLES:
# Load the data to run the example.
>>> from teradataml.data.load_example_data import load_example_data
>>> load_example_data("dataframe", ["employee_info"])
# Create teradataml dataframe.
>>> df1 = DataFrame("employee_info")
>>> print(df1)
            first_name marks   dob joined_date
employee_no
101              abcde  None  None    02/12/05
100               abcd  None  None        None
112               None  None  None    18/12/05
>>>
# Print the median value of each column that has a supported data type.
>>> df1.median()
   median_employee_no median_marks
0                 101         None
>>>
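The result above can be checked by hand. The following is a minimal sketch in plain Python, not teradataml's implementation (the real computation runs in the database). The helper name `column_medians` and the explicit list of numeric columns are assumptions made for illustration. It skips the non-numeric columns and ignores None values, as described in the notes.

```python
from statistics import median

# Rows copied from the employee_info output above.
rows = [
    {"employee_no": 101, "first_name": "abcde", "marks": None,
     "dob": None, "joined_date": "02/12/05"},
    {"employee_no": 100, "first_name": "abcd", "marks": None,
     "dob": None, "joined_date": None},
    {"employee_no": 112, "first_name": None, "marks": None,
     "dob": None, "joined_date": "18/12/05"},
]

# Assumption: employee_no and marks are the numeric columns, as the
# median_employee_no and median_marks output columns suggest.
NUMERIC_COLUMNS = ["employee_no", "marks"]


def column_medians(rows, numeric_columns):
    """Return {"median_<col>": value}, ignoring None; None if no values remain."""
    result = {}
    for col in numeric_columns:
        values = [row[col] for row in rows if row[col] is not None]
        result["median_" + col] = median(values) if values else None
    return result


medians = column_medians(rows, NUMERIC_COLUMNS)
print(medians)
```

Because every marks value is null, no values remain for that column and its median is None, which matches the median_marks value printed by df1.median().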
#
# Using median() as Time Series Aggregate.
#
>>> # Load the example datasets.
... load_example_data("dataframe", ["ocean_buoys"])
>>>
#
# Time Series Aggregate Example 1: Executing median() function on DataFrame created on
# non-sequenced PTI table. We will consider all rows for the
# columns while calculating the median value.
#
>>> # Create the required DataFrames.
... # DataFrame on non-sequenced PTI table
... ocean_buoys = DataFrame("ocean_buoys")
>>> # Check DataFrame columns and let's peek at the data
... ocean_buoys.columns
['buoyid', 'TD_TIMECODE', 'temperature', 'salinity']
>>> ocean_buoys.head()
                       TD_TIMECODE  temperature  salinity
buoyid
0       2014-01-06 08:10:00.000000        100.0        55
0       2014-01-06 08:08:59.999999          NaN        55
1       2014-01-06 09:01:25.122200         77.0        55
1       2014-01-06 09:03:25.122200         79.0        55
1       2014-01-06 09:01:25.122200         70.0        55
1       2014-01-06 09:02:25.122200         71.0        55
1       2014-01-06 09:03:25.122200         72.0        55
0       2014-01-06 08:09:59.999999         99.0        55
0       2014-01-06 08:00:00.000000         10.0        55
0       2014-01-06 08:10:00.000000         10.0        55
# To use median() as Time Series Aggregate we must run groupby_time() first, followed by median().
>>> ocean_buoys_grpby1 = ocean_buoys.groupby_time(timebucket_duration="2cy",
... value_expression="buoyid", fill="NULLS")
>>> ocean_buoys_grpby1.median().sort(["TIMECODE_RANGE", "buoyid"])
                                      TIMECODE_RANGE  GROUP BY TIME(CAL_YEARS(2))  buoyid  median_temperature  median_salinity
0  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       0                54.5             55.0
1  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       1                74.5             55.0
2  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       2                81.0             55.0
3  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2      44                43.0             55.0
>>>
#
# Time Series Aggregate Example 2: Executing median() function on DataFrame created on
# non-sequenced PTI table. We will consider DISTINCT rows for the
# columns while calculating the median value.
#
# To use median() as Time Series Aggregate we must run groupby_time() first, followed by median().
>>> ocean_buoys_grpby1 = ocean_buoys.groupby_time(timebucket_duration="2cy",
... value_expression="buoyid", fill="NULLS")
>>> ocean_buoys_grpby1.median(distinct=True).sort(["TIMECODE_RANGE", "buoyid"])
                                      TIMECODE_RANGE  GROUP BY TIME(CAL_YEARS(2))  buoyid  median_temperature  median_salinity
0  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       0                99.0             55.0
1  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       1                74.5             55.0
2  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       2                81.0             55.0
3  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2      44                54.0             55.0
>>>
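The two results for buoyid 0 (54.5 with all rows, 99.0 with distinct values) can be reproduced by hand. The following is a minimal sketch in plain Python, not teradataml code. It assumes that the five buoyid 0 rows visible in the head() output are all of the buoyid 0 rows in the time bucket, which the printed medians are consistent with. The helper name `median_ignoring_nulls` is hypothetical.

```python
from statistics import median

# Temperatures of the five buoyid 0 rows shown in the head() output;
# None stands for the NaN reading.
buoy0_temperatures = [100.0, None, 99.0, 10.0, 10.0]


def median_ignoring_nulls(values, distinct=False):
    """Median of the non-null values; with distinct=True, duplicates count once."""
    kept = [v for v in values if v is not None]
    if distinct:
        kept = sorted(set(kept))
    return median(kept) if kept else None


all_rows = median_ignoring_nulls(buoy0_temperatures)
distinct_rows = median_ignoring_nulls(buoy0_temperatures, distinct=True)
print(all_rows, distinct_rows)
```

With all rows, the sorted non-null values are 10, 10, 99, 100, and the median is the mean of the two middle values, (10 + 99) / 2 = 54.5. With distinct values, the duplicate 10 counts once, leaving 10, 99, 100, whose middle value is 99.0.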