describe() Method

Teradata® Python Package User Guide

Brand: Teradata Vantage
Product: Teradata Python Package
Release: 16.20
Category: User Guide
Document number: B700-4006-098K

Use the describe() function to generate summary statistics for the columns of a DataFrame. By default, the function computes the count, mean, std (standard deviation), min, percentiles, and max for numeric columns.

The function takes the following arguments:
  • percentiles: A list of values between 0 and 1 used for computing percentiles.

    The default value is [.25, .5, .75], which generates the 25th, 50th, and 75th percentiles.

  • include: Specifies whether non-numeric columns are included in the computation. The value can be either None or 'all'.
    • If the value is 'all': Both numeric and non-numeric columns are included. The function computes count, mean, std, min, percentiles, and max for numeric columns, and computes count and unique for non-numeric columns.
    • If the value is None: Only numeric columns are used for collecting statistics.
    The default value is None.

Example Prerequisite

>>> df = DataFrame('sales')
>>> df
              Feb   Jan   Mar   Apr    datetime
accounts
Blue Inc     90.0    50    95   101  2017-04-01
Alpha Co    210.0   200   215   250  2017-04-01
Jones LLC   200.0   150   140   180  2017-04-01
Yellow Inc   90.0  None  None  None  2017-04-01
Red Inc     200.0   150   140  None  2017-04-01
Orange Inc  210.0  None  None   250  2017-04-01

Example: Generate statistics for DataFrame "sales"

Use the default values to compute count, mean, std, min, percentiles, and max for numeric columns.

>>> df.describe()
          Apr      Feb     Mar     Jan
func
count       4        6       4       4
mean   195.25  166.667   147.5   137.5
std    70.971   59.554  49.749  62.915
min       101       90      95      50
25%    160.25    117.5  128.75     125
50%       215      200     140     150
75%       250    207.5  158.75   162.5
max       250      210     215     200
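
The count, mean, and std rows above can be cross-checked locally without a database connection. The following is a minimal sketch, not the package's implementation: summarize is a hypothetical helper, the values are copied from the Feb and Apr columns of the "sales" table, and it assumes that nulls are skipped and that std is the sample standard deviation (n - 1 denominator), which is consistent with the figures shown.

```python
import statistics

# Values copied from the Feb and Apr columns of the "sales" table above.
feb = [90.0, 210.0, 200.0, 90.0, 200.0, 210.0]
apr = [101, 250, 180, None, None, 250]

def summarize(values):
    """Hypothetical helper: count, mean, sample std, min, and max, ignoring nulls."""
    data = [v for v in values if v is not None]
    return {
        "count": len(data),
        "mean": round(statistics.mean(data), 3),
        "std": round(statistics.stdev(data), 3),  # assumption: n - 1 denominator
        "min": min(data),
        "max": max(data),
    }

print(summarize(feb))
print(summarize(apr))
```

If the assumptions hold, the printed values should agree with the Feb and Apr columns of the describe() output.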

Example: Use argument percentiles to compute the 30th and 60th percentiles

>>> df.describe(percentiles=[.3, .6])
          Apr      Feb     Mar     Jan
func
count       4        6       4       4
mean   195.25  166.667   147.5   137.5
std    70.971   59.554  49.749  62.915
min       101       90      95      50
30%     172.1      145   135.5     140
60%       236      200     140     150
max       250      210     215     200
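
The 30% and 60% rows are consistent with percentiles computed by linear interpolation between neighboring sorted values. This guide does not state the interpolation method, so treat the sketch below as an assumption that happens to match the output above; percentile is a hypothetical helper and is not part of the package.

```python
def percentile(values, q):
    """Hypothetical helper: quantile q (0 <= q <= 1) by linear interpolation."""
    data = sorted(v for v in values if v is not None)  # nulls are skipped
    if not data:
        return None
    pos = q * (len(data) - 1)              # fractional index into the sorted data
    lower = int(pos)
    upper = min(lower + 1, len(data) - 1)
    frac = pos - lower
    return data[lower] + frac * (data[upper] - data[lower])

# Values copied from the Apr and Feb columns of the "sales" table above.
apr = [101, 250, 180, None, None, 250]
feb = [90.0, 210.0, 200.0, 90.0, 200.0, 210.0]

for q in (0.3, 0.6):
    print(f"{q:.0%}", round(percentile(apr, q), 3), round(percentile(feb, q), 3))
```

The same helper applied with 0.25, 0.5, and 0.75 should give the default percentile rows.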

Example: Use groupby to compute statistics for specific groups

>>> df1 = df.groupby(["datetime", "Feb"])
>>> df1.describe()
                         Apr   Mar   Jan
  datetime   Feb func
2017-04-01  90.0  25%     101    95    50
                  50%     101    95    50
                  75%     101    95    50
                  count     1     1     1
                  max     101    95    50
                  mean    101    95    50
                  min     101    95    50
                  std    None  None  None
           200.0  25%     180   140   150
                  50%     180   140   150
                  75%     180   140   150
                  count     1     1     1
                  max     180   140   150
                  mean    180   140   150
                  min     180   140   150
                  std    None  None  None
           210.0  25%     250   215   200
                  50%     250   215   200
                  75%     250   215   200
                  count     2     1     1
                  max     250   215   200
                  mean    250   215   200
                  min     250   215   200
                  std       0  None  None
2018-10-15 200.0  25%    None   140   150
                  50%    None   140   150
                  75%    None   140   150
                  count     0     1     1
                  max    None   140   150
                  mean   None   140   150
                  min    None   140   150
                  std    None  None  None
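
In the grouped output, std is None wherever a group holds fewer than two non-null values, and 0 where the group's values are identical. The sketch below illustrates why, assuming std is a sample standard deviation; sample_std is a hypothetical helper written for this illustration.

```python
import statistics

def sample_std(values):
    """Hypothetical helper: sample std, or None with fewer than two non-null values."""
    data = [v for v in values if v is not None]
    if len(data) < 2:
        return None  # the n - 1 denominator would be zero, so std is undefined
    return statistics.stdev(data)

print(sample_std([101, None]))  # one non-null value, as in the Feb = 90.0 group for Apr
print(sample_std([250, 250]))   # two equal values, as in the Feb = 210.0 group for Apr
print(sample_std([None]))       # no non-null values, where count is 0
```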

Example: Use argument include value 'all' to compute statistics for all columns

With include="all", the function computes count, mean, std, min, percentiles, and max for numeric columns, and count and unique for non-numeric columns. Statistics that do not apply to a column are shown as None.

>>> df.describe(include="all")
           Mar      Feb datetime     Jan accounts     Apr
func
count        4        6        6       4        6       4
unique    None     None        2    None        6    None
mean     147.5  166.667     None   137.5     None  195.25
std     49.749   59.554     None  62.915     None  70.971
min         95       90     None      50     None     101
25%     128.75    117.5     None     125     None  160.25
50%        140      200     None     150     None     215
75%     158.75    207.5     None   162.5     None     250
max        215      210     None     200     None     250
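
For a non-numeric column, count and unique can be read as the number of non-null values and the number of distinct non-null values. A minimal sketch under that assumption follows; count_and_unique is a hypothetical helper, and the values are copied from the accounts column of the "sales" table.

```python
# Values copied from the accounts column of the "sales" table.
accounts = ["Blue Inc", "Alpha Co", "Jones LLC", "Yellow Inc", "Red Inc", "Orange Inc"]

def count_and_unique(values):
    """Hypothetical helper: non-null count and distinct non-null count."""
    data = [v for v in values if v is not None]
    return {"count": len(data), "unique": len(set(data))}

print(count_and_unique(accounts))
```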