describe() in Regular Aggregate Mode - Teradata Python Package

Teradata® Python Package User Guide

Product

Teradata Python Package

Release Number

16.20

Published

February 2020

Language

English (United States)

Last Update

2020-02-29

dita:id

B700-4006

Product Category

Teradata Vantage

The describe() function generates statistics for numeric columns. This function can be used in two modes:

Regular Aggregate Mode
Computes the count, mean, std, min, percentiles, and max for numeric columns.

Default statistics: "count", "mean", "std", "min", "percentile", "max".

describe() runs in regular aggregate mode when it is called on the output of any DataFrame API or groupby().

Time Series Aggregate Mode
Computes the max, mean, min, std, median, mode, and percentiles for numeric columns.

Default statistics: "max", "mean", "min", "std".

describe() runs in time series aggregate mode when it is called on the output of groupby_time(); the statistics are then calculated with time series aggregates.

The examples here use describe() in regular aggregate mode. For time series aggregate mode, refer to describe() in Time Series Aggregate Mode.

Example Prerequisite

>>> df = DataFrame('sales')

>>> df
              Feb   Jan   Mar   Apr    datetime
accounts                                      
Alpha Co    210.0   200   215   250  04/01/2017
Red Inc     200.0   150   140  None  04/01/2017
Orange Inc  210.0  None  None   250  04/01/2017
Jones LLC   200.0   150   140   180  04/01/2017
Yellow Inc   90.0  None  None  None  04/01/2017
Blue Inc     90.0    50    95   101  04/01/2017
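
The prerequisite above presumes an active Vantage connection and an existing "sales" table. A minimal sketch of that setup follows, assuming the teradataml package is installed. The helper name load_sales and its host, username, and password parameters are hypothetical placeholders, not part of the original example.

```python
def load_sales(host, username, password):
    """Connect to Vantage and return a DataFrame over the 'sales' example table."""
    # Imported inside the function so the sketch can be defined
    # even where the teradataml package is not installed.
    from teradataml import DataFrame, create_context, load_example_data

    # Open the connection that later DataFrame calls rely on.
    create_context(host=host, username=username, password=password)

    # Create and populate the 'sales' example table shipped with the package.
    load_example_data("dataframe", "sales")

    return DataFrame("sales")
```

Calling load_sales with real credentials would return the DataFrame that the examples on this page refer to as df.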

Example: Generate statistics for DataFrame "sales"

Use the default values to compute count, mean, std, min, percentiles, and max for numeric columns.

>>> df.describe()
          Apr      Feb     Mar     Jan
func
count       4        6       4       4
mean   195.25  166.667   147.5   137.5
std    70.971   59.554  49.749  62.915
min       101       90      95      50
25%    160.25    117.5  128.75     125
50%       215      200     140     150
75%       250    207.5  158.75   162.5
max       250      210     215     200
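
The figures above can be cross-checked locally. The sketch below is not the package's implementation; it uses only the Python standard library on the non-null values copied from the "sales" table, and it assumes a sample standard deviation and linearly interpolated quartiles, which is what the printed output is consistent with. The names SALES and describe_column are illustrative.

```python
import statistics

# Non-null values per numeric column, copied from the "sales" table.
SALES = {
    "Feb": [210.0, 200.0, 210.0, 200.0, 90.0, 90.0],
    "Jan": [200, 150, 150, 50],
    "Mar": [215, 140, 140, 95],
    "Apr": [250, 250, 180, 101],
}

def describe_column(values):
    """Return the default regular-aggregate statistics for one column."""
    # method="inclusive" interpolates linearly between the min and max.
    q25, q50, q75 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),             # nulls were already excluded
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "25%": q25,
        "50%": q50,
        "75%": q75,
        "max": max(values),
    }

summary = {column: describe_column(values) for column, values in SALES.items()}
for column, stats in summary.items():
    print(column, {name: round(value, 3) for name, value in stats.items()})
```

Note that count differs between columns because null values are left out of every statistic.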

Example: Use argument percentiles to compute the 30th and 60th percentiles

>>> df.describe(percentiles=[.3, .6])
          Apr      Feb     Mar     Jan
func
count       4        6       4       4
mean   195.25  166.667   147.5   137.5
std    70.971   59.554  49.749  62.915
min       101       90      95      50
30%     172.1      145   135.5     140
60%       236      200     140     150
max       250      210     215     200
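
To see where values such as 172.1 and 236 come from, the following sketch computes an arbitrary percentile by linear interpolation between the two nearest sorted values. This is an assumption about the method, chosen because it agrees with the output above; the function name percentile is illustrative and is not a package API.

```python
import math

def percentile(values, q):
    """Linearly interpolated percentile of non-null values, with q in [0, 1]."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)  # fractional index into the sorted data
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction

# Non-null values copied from the "sales" table.
apr = [250, 250, 180, 101]
feb = [210.0, 200.0, 210.0, 200.0, 90.0, 90.0]

for q in (0.3, 0.6):
    print(f"{round(q * 100)}%", round(percentile(apr, q), 3), round(percentile(feb, q), 3))
```

For Apr, the 30th percentile falls at index 0.9 of the sorted values [101, 180, 250, 250], so it lies nine tenths of the way from 101 to 180.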

Example: Use groupby to compute statistics for specific groups

>>> df1 = df.groupby(["datetime", "Feb"])

>>> df1.describe()
                         Jan   Mar   Apr
datetime   Feb   func                  
04/01/2017 90.0  25%      50    95   101
                 50%      50    95   101
                 75%      50    95   101
                 count     1     1     1
                 max      50    95   101
                 mean     50    95   101
                 min      50    95   101
                 std    None  None  None
           200.0 25%     150   140   180
                 50%     150   140   180
                 75%     150   140   180
                 count     2     2     1
                 max     150   140   180
                 mean    150   140   180
                 min     150   140   180
                 std       0     0  None
           210.0 25%     200   215   250
                 50%     200   215   250
                 75%     200   215   250
                 count     1     1     2
                 max     200   215   250
                 mean    200   215   250
                 min     200   215   250
                 std    None  None     0
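
The None entries for std above arise in groups that hold only one non-null value, where a sample standard deviation is undefined. The sketch below reproduces the grouping with the standard library only, as a local illustration rather than the package's own logic; ROWS, groups, and group_stats are illustrative names.

```python
import statistics
from collections import defaultdict

# Rows of the "sales" table: (accounts, Feb, Jan, Mar, Apr, datetime).
ROWS = [
    ("Alpha Co", 210.0, 200, 215, 250, "04/01/2017"),
    ("Red Inc", 200.0, 150, 140, None, "04/01/2017"),
    ("Orange Inc", 210.0, None, None, 250, "04/01/2017"),
    ("Jones LLC", 200.0, 150, 140, 180, "04/01/2017"),
    ("Yellow Inc", 90.0, None, None, None, "04/01/2017"),
    ("Blue Inc", 90.0, 50, 95, 101, "04/01/2017"),
]

# Group the remaining numeric columns by the (datetime, Feb) key.
groups = defaultdict(list)
for _, feb, jan, mar, apr, dt in ROWS:
    groups[(dt, feb)].append({"Jan": jan, "Mar": mar, "Apr": apr})

def group_stats(rows, column):
    """Count, mean, and sample std of one column within one group."""
    values = [row[column] for row in rows if row[column] is not None]
    return {
        "count": len(values),
        "mean": statistics.mean(values) if values else None,
        # A sample standard deviation needs at least two values.
        "std": statistics.stdev(values) if len(values) > 1 else None,
    }

for key in sorted(groups):
    for column in ("Jan", "Mar", "Apr"):
        print(key, column, group_stats(groups[key], column))
```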

Example: Use argument include value 'all' to compute statistics for all columns

With include="all", describe() computes count, mean, std, min, percentiles, and max for numeric columns, and count and unique for non-numeric columns. Statistics that do not apply to a column are shown as None.

>>> df.describe(include="all")
       accounts      Feb     Jan     Mar     Apr datetime
func                                                    
25%        None    117.5     125  128.75  160.25     None
75%        None    207.5   162.5  158.75     250     None
count         6        6       4       4       4        6
mean       None  166.667   137.5   147.5  195.25     None
max        None      210     200     215     250     None
min        None       90      50      95     101     None
50%        None      200     150     140     215     None
std        None   59.554  62.915  49.749  70.971     None
unique        6     None    None    None    None        1
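
For the non-numeric columns, count and unique in the output above can be checked with a few lines of standard-library Python. This is a local sketch on values copied from the "sales" table; describe_non_numeric is an illustrative name, not a package function.

```python
# Non-numeric columns copied from the "sales" table.
accounts = ["Alpha Co", "Red Inc", "Orange Inc", "Jones LLC", "Yellow Inc", "Blue Inc"]
datetimes = ["04/01/2017"] * 6

def describe_non_numeric(values):
    """Count of non-null values and number of distinct non-null values."""
    present = [value for value in values if value is not None]
    return {"count": len(present), "unique": len(set(present))}

print("accounts", describe_non_numeric(accounts))
print("datetime", describe_non_numeric(datetimes))
```

Every account name is distinct, while all six rows share one datetime value, which is why unique is 6 for accounts and 1 for datetime.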