The describe() function generates statistics for numeric columns. This function can be used in two modes:
- Regular Aggregate Mode
  It computes the count, mean, std, min, percentiles, and max for numeric columns.
  Default statistics include: "count", "mean", "std", "min", "percentile", "max".
  If describe() is used on the output of any DataFrame API or groupby(), then it is used in regular aggregate mode.
- Time Series Aggregate Mode
  It computes max, mean, min, std, median, mode, and percentiles for numeric columns.
  Default statistics include: "max", "mean", "min", "std".
  If describe() is used on the output of groupby_time(), then it is used in time series aggregate mode, where time series aggregates are used to calculate the statistics.
The examples here use describe() as a regular function or as an aggregate function (regular aggregate mode). For describe() as a time series aggregate, refer to describe() in Time Series Aggregate Mode.
Example Prerequisite
>>> df = DataFrame('sales')
>>> df
              Feb   Jan   Mar   Apr    datetime
accounts
Alpha Co    210.0   200   215   250  04/01/2017
Red Inc     200.0   150   140  None  04/01/2017
Orange Inc  210.0  None  None   250  04/01/2017
Jones LLC   200.0   150   140   180  04/01/2017
Yellow Inc   90.0  None  None  None  04/01/2017
Blue Inc     90.0    50    95   101  04/01/2017
Example: Generate statistics for DataFrame "sales"
Use the default values to compute count, mean, std, min, percentiles, and max for numeric columns.
>>> df.describe()
          Apr      Feb     Mar     Jan
func
count       4        6       4       4
mean   195.25  166.667   147.5   137.5
std    70.971   59.554  49.749  62.915
min       101       90      95      50
25%    160.25    117.5  128.75     125
50%       215      200     140     150
75%       250    207.5  158.75   162.5
max       250      210     215     200
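The count, mean, std, min, and max rows above can be cross-checked by hand. The following is a minimal sketch in plain Python, assuming that missing values (None) are skipped and that std is the sample standard deviation; both assumptions are inferred from the output shown, not from the library's implementation. The helper name summarize is hypothetical.

```python
import statistics

# Feb and Jan columns copied from the sample "sales" table; None marks a missing value.
feb = [210.0, 200.0, 210.0, 200.0, 90.0, 90.0]
jan = [200, 150, None, 150, None, 50]

def summarize(values):
    """Hypothetical helper: basic statistics over the non-missing values."""
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "mean": round(statistics.mean(present), 3),
        # Assumption: sample standard deviation (n - 1 in the denominator).
        "std": round(statistics.stdev(present), 3),
        "min": min(present),
        "max": max(present),
    }

feb_stats = summarize(feb)
jan_stats = summarize(jan)
print(feb_stats)
print(jan_stats)
```

Under these assumptions the results agree with the Feb and Jan columns of the describe() output above.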
Example: Use argument percentiles to compute the 30th and 60th percentiles
>>> df.describe(percentiles=[.3, .6])
          Apr      Feb     Mar     Jan
func
count       4        6       4       4
mean   195.25  166.667   147.5   137.5
std    70.971   59.554  49.749  62.915
min       101       90      95      50
30%     172.1      145   135.5     140
60%       236      200     140     150
max       250      210     215     200
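The 30% and 60% values above are consistent with percentiles that are linearly interpolated between neighboring sorted values. The sketch below reproduces them in plain Python under that assumption; the interpolation rule is inferred from the numbers shown, and the function name percentile is hypothetical, not part of the library.

```python
def percentile(values, q):
    """Hypothetical helper: linearly interpolated percentile, q in [0, 1].

    Missing values (None) are ignored.
    """
    data = sorted(v for v in values if v is not None)
    pos = q * (len(data) - 1)              # fractional position in the sorted data
    lower = int(pos)                       # index of the value at or below pos
    upper = min(lower + 1, len(data) - 1)  # index of the next value, clamped
    return data[lower] + (pos - lower) * (data[upper] - data[lower])

# Apr and Feb columns copied from the sample "sales" table.
apr = [250, None, 250, 180, None, 101]
feb = [210.0, 200.0, 210.0, 200.0, 90.0, 90.0]

apr_p30 = percentile(apr, 0.3)
apr_p60 = percentile(apr, 0.6)
feb_p30 = percentile(feb, 0.3)
print(round(apr_p30, 3), round(apr_p60, 3), round(feb_p30, 3))
```

For Apr, the sorted non-missing values are 101, 180, 250, 250, so the 30th percentile falls 0.9 of the way from 101 to 180, which agrees with the 172.1 in the output above.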
Example: Use groupby to compute statistics for specific groups
>>> df1 = df.groupby(["datetime", "Feb"])
>>> df1.describe()
                         Jan   Mar   Apr
datetime   Feb   func
04/01/2017 90.0  25%      50    95   101
                 50%      50    95   101
                 75%      50    95   101
                 count     1     1     1
                 max      50    95   101
                 mean     50    95   101
                 min      50    95   101
                 std    None  None  None
           200.0 25%     150   140   180
                 50%     150   140   180
                 75%     150   140   180
                 count     2     2     1
                 max     150   140   180
                 mean    150   140   180
                 min     150   140   180
                 std       0     0  None
           210.0 25%     200   215   250
                 50%     200   215   250
                 75%     200   215   250
                 count     1     1     2
                 max     200   215   250
                 mean    200   215   250
                 min     200   215   250
                 std    None  None     0
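In the grouped output, std is None wherever a group has only one non-missing value. That fits a sample standard deviation, which is undefined for a single observation. The sketch below illustrates the pattern for the Jan column in plain Python. It groups by Feb only, because datetime is the same in every row, and the sample-std rule is an assumption inferred from the output. The helper name sample_std is hypothetical.

```python
import statistics
from collections import defaultdict

# (Feb, Jan) pairs copied from the sample "sales" table; None marks a missing value.
rows = [
    (210.0, 200), (200.0, 150), (210.0, None),
    (200.0, 150), (90.0, None), (90.0, 50),
]

# Collect the non-missing Jan values for each Feb group.
groups = defaultdict(list)
for feb, jan in rows:
    if jan is not None:
        groups[feb].append(jan)

def sample_std(values):
    """Hypothetical helper: sample std, or None when fewer than two values exist."""
    return statistics.stdev(values) if len(values) > 1 else None

jan_by_feb = {
    feb: {"count": len(vals), "std": sample_std(vals)}
    for feb, vals in sorted(groups.items())
}
for feb, stats in jan_by_feb.items():
    print(feb, stats)
```

Only the 200.0 group has two Jan values, so it is the only group with a defined std, matching the Jan column above.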
Example: Use argument include value 'all' to compute statistics for all columns
Computes count, mean, std, min, percentiles, and max for numeric columns and computes count and unique for non-numeric columns.
>>> df.describe(include="all")
       accounts      Feb     Jan     Mar     Apr datetime
func
25%        None    117.5     125  128.75  160.25     None
75%        None    207.5   162.5  158.75     250     None
count         6        6       4       4       4        6
mean       None  166.667   137.5   147.5  195.25     None
max        None      210     200     215     250     None
min        None       90      50      95     101     None
50%        None      200     150     140     215     None
std        None   59.554  62.915  49.749  70.971     None
unique        6     None    None    None    None        1
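For the non-numeric columns, the count and unique rows can be read as the number of non-missing values and the number of distinct values. A minimal plain-Python sketch of that reading follows; the helper name count_and_unique is hypothetical, and the interpretation is inferred from the output above.

```python
# Non-numeric columns copied from the sample "sales" table.
accounts = ["Alpha Co", "Red Inc", "Orange Inc", "Jones LLC", "Yellow Inc", "Blue Inc"]
datetimes = ["04/01/2017"] * 6

def count_and_unique(values):
    """Hypothetical helper: non-missing count and number of distinct values."""
    present = [v for v in values if v is not None]
    return {"count": len(present), "unique": len(set(present))}

accounts_stats = count_and_unique(accounts)
datetime_stats = count_and_unique(datetimes)
print(accounts_stats)
print(datetime_stats)
```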