describe() in Time Series Aggregate Mode

Teradata® Python Package User Guide

Product

Teradata Python Package

Release Number

16.20

Published

February 2020

Language

English (United States)

Last Update

2020-02-29

dita:id

B700-4006

Product Category

Teradata Vantage

The describe() function generates statistics for numeric columns. This function can be used in two modes:

Regular Aggregate Mode
It computes the count, mean, std, min, percentiles, and max for numeric columns.

Default statistics include: "count", "mean", "std", "min", "percentile", "max".

If describe() is used on the output of any DataFrame API or groupby(), then it is used in regular aggregate mode.

Time Series Aggregate Mode
It computes max, mean, min, std, median, mode, and percentiles for numeric columns.

Default statistics include: "max", "mean", "min", "std". The median, mode, and percentiles are returned only when verbose=True is specified.

If describe() is used on the output of groupby_time(), then it is used in time series aggregate mode, where time series aggregates are used to calculate the statistics.

The examples here cover describe() in time series aggregate mode only. For regular aggregate mode, refer to describe() in Regular Aggregate Mode.

Examples Prerequisite

See Example Setup to set up the environment for the following examples.

Example: Get the basic statistics

Get the basic statistics for time series aggregation for all the numeric columns, using the default settings. This example returns the max, mean, min, and std values.

>>> ocean_buoys_grpby.describe()
                                                                                           temperature salinity
TIMECODE_RANGE                                     GROUP BY TIME(CAL_YEARS(2)) buoyid func
('2014-01-01 00:00:00.000000-00:00', '2016-01-0... 2                           0      max          100       55
                                                                                      mean       54.75       55
                                                                                      min           10       55
                                                                                      std       51.674        0
                                                                               1      max           79       55
                                                                                      mean        74.5       55
                                                                                      min           70       55
                                                                                      std        3.937        0
                                                                               2      max           82       55
                                                                                      mean          81       55
                                                                                      min           80       55
                                                                                      std            1        0
                                                                               44     max           56       55
                                                                                      mean      48.077       55
                                                                                      min           43       55
                                                                                      std        5.766        0
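
To make the four default statistics concrete, the following minimal sketch recomputes them in plain Python with the standard statistics module. The three readings are hypothetical: they were chosen only because they are consistent with the buoyid 2 rows above, and they are not read from the ocean_buoys table. The sketch also assumes that std is the sample standard deviation (n - 1 divisor), which is what the displayed value of 1 suggests for such data. It does not show how Vantage computes the aggregates.

```python
import statistics

# Hypothetical temperature readings, chosen to be consistent with the
# buoyid 2 rows in the output above (not read from the actual table).
temperature = [80, 81, 82]

basic_stats = {
    "max": max(temperature),
    "mean": statistics.mean(temperature),
    "min": min(temperature),
    # Assumption: std is the sample standard deviation (n - 1 divisor).
    "std": statistics.stdev(temperature),
}

for name, value in basic_stats.items():
    print(name, value)
```

Under this assumption, the population standard deviation (statistics.pstdev) of the same readings would be about 0.816 rather than 1.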

Example: Get the verbose statistics

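Get the verbose statistics for time series aggregation for all the numeric columns, using the default settings. This example returns the max, mean, min, std, median, and mode values, plus the 25th, 50th, and 75th percentiles. When several values are tied for the highest frequency, one mode row is returned for each of them.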

>>> ocean_buoys_grpby.describe(verbose=True)
                                                                                             temperature salinity
TIMECODE_RANGE                                     GROUP BY TIME(CAL_YEARS(2)) buoyid func
('2014-01-01 00:00:00.000000-00:00', '2016-01-0... 2                           0      25%             10       55
                                                                                      50%           54.5       55
                                                                                      75%          99.25       55
                                                                                      max            100       55
                                                                                      mean         54.75       55
                                                                                      median        54.5       55
                                                                                      min             10       55
                                                                                      mode            10       55
                                                                                      std         51.674        0
                                                                               1      25%          71.25       55
                                                                                      50%           74.5       55
                                                                                      75%          77.75       55
                                                                                      max             79       55
                                                                                      mean          74.5       55
                                                                                      median        74.5       55
                                                                                      min             70       55
                                                                                      mode            71       55
                                                                                      mode            72       55
                                                                                      mode            77       55
                                                                                      mode            78       55
                                                                                      mode            79       55
                                                                                      mode            70       55
                                                                                      std          3.937        0
                                                                               2      25%           80.5       55
                                                                                      50%             81       55
                                                                                      75%           81.5       55
                                                                                      max             82       55
                                                                                      mean            81       55
                                                                                      median          81       55
                                                                                      min             80       55
                                                                                      mode            80       55
                                                                                      mode            81       55
                                                                                      mode            82       55
                                                                                      std              1        0
                                                                               44     25%             43       55
                                                                                      50%             43       55
                                                                                      75%             53       55
                                                                                      max             56       55
                                                                                      mean        48.077       55
                                                                                      median          43       55
                                                                                      min             43       55
                                                                                      mode            43       55
                                                                                      std          5.766        0
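
The additional verbose statistics can be illustrated with a short plain-Python sketch. The six readings below are an assumption inferred from the six mode rows reported for buoyid 1 (each value occurring once); they are not read from the ocean_buoys table. The standard statistics module stands in for the database aggregates here, so this only shows what the numbers mean, not how Vantage computes them.

```python
import statistics

# Hypothetical readings, assumed from the six mode rows reported for buoyid 1.
temperature = [70, 71, 72, 77, 78, 79]

median = statistics.median(temperature)
# multimode() returns every value tied for the highest frequency.
modes = statistics.multimode(temperature)
# method="inclusive" interpolates linearly between the ordered values.
q25, q50, q75 = statistics.quantiles(temperature, n=4, method="inclusive")

print("median", median)
print("modes", modes)
print("25%/50%/75%", q25, q50, q75)
```

Because every reading occurs exactly once, all six values tie as modes, which matches the repeated mode rows in the output. Note that the output above does not list the modes in sorted order, so the order of mode rows should not be relied on.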

Example: Get the basic statistics, consider only unique values

Get the basic statistics for time series aggregation for all the numeric columns, considering only unique values (distinct=True). This example returns the max, mean, min, and std values.

>>> ocean_buoys_grpby.describe(distinct=True)
                                                                                           temperature salinity
TIMECODE_RANGE                                     GROUP BY TIME(CAL_YEARS(2)) buoyid func
('2014-01-01 00:00:00.000000-00:00', '2016-01-0... 2                           0      max          100       55
                                                                                      mean      69.667       55
                                                                                      min           10       55
                                                                                      std       51.675     None
                                                                               1      max           79       55
                                                                                      mean        74.5       55
                                                                                      min           70       55
                                                                                      std        3.937     None
                                                                               2      max           82       55
                                                                                      mean          81       55
                                                                                      min           80       55
                                                                                      std            1     None
                                                                               44     max           56       55
                                                                                      mean        52.2       55
                                                                                      min           43       55
                                                                                      std        5.263     None
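
The effect of distinct=True can be sketched in plain Python. The readings below are hypothetical values chosen to be consistent with the buoyid 0 rows, and mean_and_std is an illustrative helper, not part of the Teradata Python Package. The sketch assumes that std is a sample standard deviation, which is undefined when only one unique value remains; this is one plausible reading of the None shown for salinity, not a documented explanation.

```python
import statistics

# Hypothetical readings for one group; 10 appears twice on purpose.
temperature = [10, 10, 99, 100]
salinity = [55, 55, 55, 55]

def mean_and_std(values, distinct=False):
    """Return (mean, sample std); std is None when fewer than two values remain."""
    if distinct:
        values = sorted(set(values))  # keep each value once
    mean = statistics.mean(values)
    std = statistics.stdev(values) if len(values) > 1 else None
    return round(mean, 3), (round(std, 3) if std is not None else None)

print(mean_and_std(temperature))                 # all readings
print(mean_and_std(temperature, distinct=True))  # unique readings only
print(mean_and_std(salinity, distinct=True))     # one unique value left
```

Dropping the duplicate reading raises the mean and changes the standard deviation slightly, while max and min stay the same.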

Example: Get the verbose statistics, select non-default percentiles

Get the verbose statistics for time series aggregation for all the numeric columns. In this example, you select the non-default 33rd and 66th percentiles by passing them as fractions (0.33 and 0.66). This example returns the max, mean, min, std, median, and mode values, plus the 33rd and 66th percentiles.

>>> ocean_buoys_grpby.describe(verbose=True, percentiles=[0.33, 0.66])
                                                                                             temperature salinity
TIMECODE_RANGE                                     GROUP BY TIME(CAL_YEARS(2)) buoyid func
('2014-01-01 00:00:00.000000-00:00', '2016-01-0... 2                           0      33%             10       55
                                                                                      66%          97.22       55
                                                                                      max            100       55
                                                                                      mean         54.75       55
                                                                                      median        54.5       55
                                                                                      min             10       55
                                                                                      mode            10       55
                                                                                      std         51.674        0
                                                                               1      33%          71.65       55
                                                                                      66%           77.3       55
                                                                                      max             79       55
                                                                                      mean          74.5       55
                                                                                      median        74.5       55
                                                                                      min             70       55
                                                                                      mode            70       55
                                                                                      mode            71       55
                                                                                      mode            77       55
                                                                                      mode            78       55
                                                                                      mode            79       55
                                                                                      mode            72       55
                                                                                      std          3.937        0
                                                                               2      33%          80.66       55
                                                                                      66%          81.32       55
                                                                                      max             82       55
                                                                                      mean            81       55
                                                                                      median          81       55
                                                                                      min             80       55
                                                                                      mode            80       55
                                                                                      mode            81       55
                                                                                      mode            82       55
                                                                                      std              1        0
                                                                               44     33%             43       55
                                                                                      66%             53       55
                                                                                      max             56       55
                                                                                      mean        48.077       55
                                                                                      median          43       55
                                                                                      min             43       55
                                                                                      mode            43       55
                                                                                      std          5.766        0
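
The percentile values shown above are consistent with linear interpolation between the ordered readings. The following sketch illustrates that interpretation in plain Python. The percentile function is an illustrative helper, not a Teradata Python Package API, and the readings are hypothetical values inferred from the mode rows reported for buoyid 1. This is an observation about the displayed numbers, not a statement of how Vantage computes percentiles.

```python
def percentile(values, fraction):
    """Linearly interpolate the given fraction (0 to 1) over the sorted values."""
    ordered = sorted(values)
    position = fraction * (len(ordered) - 1)  # fractional index into ordered
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    weight = position - lower
    return ordered[lower] + weight * (ordered[upper] - ordered[lower])

# Hypothetical readings, assumed from the mode rows reported for buoyid 1.
temperature = [70, 71, 72, 77, 78, 79]

for fraction in (0.33, 0.66):
    print(f"{fraction:.0%}", round(percentile(temperature, fraction), 2))
```

For these readings, the sketch yields 71.65 and 77.3, matching the 33% and 66% rows for buoyid 1.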