Teradata Package for Python Function Reference - mean

Teradata® Package for Python Function Reference

Product

Teradata Package for Python

Release Number

17.00

Published

November 2021

Language

English (United States)

Last Update

2021-11-19

Product Category

Teradata Vantage

teradataml.dataframe.dataframe.DataFrameGroupByTime.mean = mean(self, distinct=False)
    DESCRIPTION:
        Returns column-wise mean value of the dataframe.
        Notes:
            1. This function is valid only on columns with numeric types.
            2. Null values are not included in the result computation.

    PARAMETERS:
        distinct:
            Optional Argument.
            Specifies whether to exclude duplicate values while calculating the mean.
            Default Values: False

    RETURNS:
        teradataml DataFrame object with mean() operation performed.

    RAISES:
        TeradataMLException
        1. TDMLDF_AGGREGATE_FAILED - If mean() operation fails to
            generate the column-wise mean value of the dataframe.

            Possible error message:
            Unable to perform 'mean()' on the dataframe.

        2. TDMLDF_AGGREGATE_COMBINED_ERR - If the mean() operation
            doesn't support all the columns in the dataframe.

            Possible error message:
            No results. Below is/are the error message(s):
            All selected columns [(col2 - PERIOD_TIME), (col3 - BLOB)] is/are
            unsupported for 'mean' operation.

    EXAMPLES :
        # Load the data to run the example.
        >>> from teradataml.data.load_example_data import load_example_data
        >>> load_example_data("dataframe", ["employee_info"])

        # Create teradataml dataframe.
        >>> df1 = DataFrame("employee_info")
        >>> print(df1)
                    first_name marks   dob joined_date
        employee_no
        101              abcde  None  None    02/12/05
        100               abcd  None  None        None
        112               None  None  None    18/12/05
        >>>

        # Select only subset of columns from the DataFrame.
        >>> df2 = df1.select(['employee_no', 'marks', 'first_name'])

        # Prints mean value of each column (with supported data types).
        >>> df2.mean()
           mean_employee_no mean_marks
        0        104.333333       None
        >>>

        #
        # Using mean() as Time Series Aggregate.
        #
        >>> # Load the example datasets.
        ... load_example_data("dataframe", ["ocean_buoys"])
        >>>

        >>> # Create the required DataFrames.
        ... # DataFrame on non-sequenced PTI table
        ... ocean_buoys = DataFrame("ocean_buoys")
        >>> # Check DataFrame columns and let's peek at the data
        ... ocean_buoys.columns
        ['buoyid', 'TD_TIMECODE', 'temperature', 'salinity']
        >>> ocean_buoys.head()
                               TD_TIMECODE  temperature  salinity
        buoyid
        0       2014-01-06 08:10:00.000000        100.0        55
        0       2014-01-06 08:08:59.999999          NaN        55
        1       2014-01-06 09:01:25.122200         77.0        55
        1       2014-01-06 09:03:25.122200         79.0        55
        1       2014-01-06 09:01:25.122200         70.0        55
        1       2014-01-06 09:02:25.122200         71.0        55
        1       2014-01-06 09:03:25.122200         72.0        55
        0       2014-01-06 08:09:59.999999         99.0        55
        0       2014-01-06 08:00:00.000000         10.0        55
        0       2014-01-06 08:10:00.000000         10.0        55

        #
        # Time Series Aggregate Example 1: Executing mean() function on DataFrame created on
        #                                  non-sequenced PTI table. We will consider all rows for the
        #                                  columns while calculating the mean values.
        #
        # To use mean() as Time Series Aggregate we must run groupby_time() first, followed by mean().
        >>> ocean_buoys_grpby1 = ocean_buoys.groupby_time(timebucket_duration="2cy",
        ...                                               value_expression="buoyid", fill="NULLS")
        >>> ocean_buoys_grpby1.mean().sort(["TIMECODE_RANGE", "buoyid"])
                                              TIMECODE_RANGE  GROUP BY TIME(CAL_YEARS(2))  buoyid  mean_salinity  mean_temperature
        0  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       0           55.0         54.750000
        1  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       1           55.0         74.500000
        2  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       2           55.0         81.000000
        3  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2      44           55.0         48.076923
        >>>

        #
        # Time Series Aggregate Example 2: Executing mean() function on DataFrame created on
        #                                  non-sequenced PTI table. We will consider DISTINCT values for the
        #                                  columns while calculating the mean value.
        #
        # To use mean() as Time Series Aggregate we must run groupby_time() first, followed by mean().
        >>> ocean_buoys_grpby1 = ocean_buoys.groupby_time(timebucket_duration="2cy",
        ...                                               value_expression="buoyid", fill="NULLS")
        >>> ocean_buoys_grpby1.mean(distinct = True).sort(["TIMECODE_RANGE", "buoyid"])
                                              TIMECODE_RANGE  GROUP BY TIME(CAL_YEARS(2))  buoyid  mean_salinity  mean_temperature
        0  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       0           55.0         69.666667
        1  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       1           55.0         74.500000
        2  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       2           55.0         81.000000
        3  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2      44           55.0         52.200000
        >>>
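
The null-handling note and the distinct argument can be checked by hand against the buoyid 0 results above. The following is a minimal plain-Python sketch, not teradataml code and not how the library computes the result. The helper column_mean is hypothetical, None stands in for a NULL (displayed as NaN in the head() output), and the sketch assumes the five buoyid 0 rows shown by head() are all the rows for that buoy. Under those assumptions it reproduces the mean_temperature values reported for buoyid 0 in the two time series examples.

```python
from statistics import fmean


def column_mean(values, distinct=False):
    """Hypothetical helper: mean of the non-null values in a column.

    Nulls (None) are skipped, mirroring the note that null values are not
    included in the computation. With distinct=True, duplicate values are
    removed before averaging.
    """
    present = [v for v in values if v is not None]
    if distinct:
        present = list(set(present))
    # An all-null (or empty) column has no mean; return None in that case.
    return fmean(present) if present else None


# Temperature readings for buoyid 0, copied from the head() output.
# Assumption: these five rows are the complete set for that buoy.
buoy0_temperature = [100.0, None, 99.0, 10.0, 10.0]

mean_all = column_mean(buoy0_temperature)
mean_distinct = column_mean(buoy0_temperature, distinct=True)

print(mean_all)                 # 54.75
print(round(mean_distinct, 6))  # 69.666667
```

The repeated 10.0 reading is counted twice in the first result and once in the second, which accounts for the difference between the two examples for buoyid 0.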