- Regular Aggregate Mode
It computes the count, mean, std, min, percentiles, and max for numeric columns.
Default statistics include: "count", "mean", "std", "min", "percentile", "max".
When describe() is called on the output of any DataFrame API other than groupby_time(), including groupby(), it runs in regular aggregate mode.
- Time Series Aggregate Mode
It computes max, mean, min, std, median, mode, and percentiles for numeric columns.
Default statistics include: "max", "mean", "min", "std".
When describe() is called on the output of groupby_time(), it runs in time series aggregate mode and calculates the statistics with time series aggregates.
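The two sets of default statistics can be illustrated in plain Python. This is a minimal sketch, not teradataml code and not how teradataml computes anything (describe() is evaluated in the database). The function names `regular_defaults` and `time_series_defaults` and the sample rows are hypothetical. The sketch assumes linear interpolation for percentiles and the sample standard deviation (divisor n - 1) for std, and it ignores the time bucketing that groupby_time() performs by treating all rows as one time range.

```python
import statistics
from collections import defaultdict


def regular_defaults(values):
    """Regular-mode style defaults for one numeric column (illustrative only)."""
    data = sorted(values)
    # Assumed percentile rule: linear interpolation between sorted values.
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "count": len(data),
        "mean": statistics.mean(data),
        "std": statistics.stdev(data),  # sample standard deviation (n - 1)
        "min": data[0],
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": data[-1],
    }


def time_series_defaults(rows):
    """Time-series-mode style defaults (max, mean, min, std) per group key."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return {
        key: {
            "max": max(vals),
            "mean": statistics.mean(vals),
            "min": min(vals),
            # Sample std needs at least two values; None otherwise (an assumption).
            "std": statistics.stdev(vals) if len(vals) > 1 else None,
        }
        for key, vals in sorted(groups.items())
    }


# Hypothetical (buoyid, temperature) rows, assumed to fall in a single time bucket.
rows = [(1, 70), (1, 71), (2, 80), (2, 81), (2, 82), (1, 72)]

print(regular_defaults([t for _, t in rows]))
print(time_series_defaults(rows))
```

The regular-mode function summarizes the whole column at once, while the time-series function returns one set of statistics per group key, which mirrors the per-buoyid rows in the examples that follow.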
Examples
Prerequisite
See Example Setup to set up the environment for the following examples.
Example: Get the basic statistics
Get the basic statistics for time series aggregation for all numeric columns, using the default settings. This example returns the max, mean, min, and std values.
>>> ocean_buoys_grpby.describe()
                                                                                              temperature  salinity
TIMECODE_RANGE                                     GROUP BY TIME(CAL_YEARS(2)) buoyid func
('2014-01-01 00:00:00.000000-00:00', '2016-01-0... 2                           0      max             100        55
                                                                                      mean          54.75        55
                                                                                      min              10        55
                                                                                      std          51.674         0
                                                                               1      max              79        55
                                                                                      mean           74.5        55
                                                                                      min              70        55
                                                                                      std           3.937         0
                                                                               2      max              82        55
                                                                                      mean             81        55
                                                                                      min              80        55
                                                                                      std               1         0
                                                                               44     max              56        55
                                                                                      mean         48.077        55
                                                                                      min              43        55
                                                                                      std           5.766         0
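The std values in this output are consistent with the sample standard deviation (divisor n - 1) rather than the population standard deviation. As a check, the sketch below assumes a hypothetical group of four temperature readings, [10, 10, 99, 100]. The source does not list the raw rows, but these values reproduce the max, mean, min, and std shown for buoyid 0. The population standard deviation (divisor n) of the same values is computed for comparison.

```python
import statistics

# Hypothetical raw temperatures for one group (buoyid 0); not listed in the source.
temps = [10, 10, 99, 100]

summary = {
    "max": max(temps),
    "mean": statistics.mean(temps),
    "min": min(temps),
    "std": round(statistics.stdev(temps), 3),  # sample std: divides by n - 1
}
population_std = round(statistics.pstdev(temps), 3)  # divides by n, for comparison

print(summary)
print("population std:", population_std)
```

Only the n - 1 version matches the printed 51.674; dividing by n gives a noticeably smaller value.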
Example: Get the verbose statistics
Get the verbose statistics for time series aggregation for all numeric columns, using the default percentiles. This example returns the max, mean, min, std, median, mode, and the 25th, 50th, and 75th percentiles.
>>> ocean_buoys_grpby.describe(verbose=True)
                                                                                              temperature  salinity
TIMECODE_RANGE                                     GROUP BY TIME(CAL_YEARS(2)) buoyid func
('2014-01-01 00:00:00.000000-00:00', '2016-01-0... 2                           0      25%              10        55
                                                                                      50%            54.5        55
                                                                                      75%           99.25        55
                                                                                      max             100        55
                                                                                      mean          54.75        55
                                                                                      median         54.5        55
                                                                                      min              10        55
                                                                                      mode             10        55
                                                                                      std          51.674         0
                                                                               1      25%           71.25        55
                                                                                      50%            74.5        55
                                                                                      75%           77.75        55
                                                                                      max              79        55
                                                                                      mean           74.5        55
                                                                                      median         74.5        55
                                                                                      min              70        55
                                                                                      mode             71        55
                                                                                      mode             72        55
                                                                                      mode             77        55
                                                                                      mode             78        55
                                                                                      mode             79        55
                                                                                      mode             70        55
                                                                                      std           3.937         0
                                                                               2      25%            80.5        55
                                                                                      50%              81        55
                                                                                      75%            81.5        55
                                                                                      max              82        55
                                                                                      mean             81        55
                                                                                      median           81        55
                                                                                      min              80        55
                                                                                      mode             80        55
                                                                                      mode             81        55
                                                                                      mode             82        55
                                                                                      std               1         0
                                                                               44     25%              43        55
                                                                                      50%              43        55
                                                                                      75%              53        55
                                                                                      max              56        55
                                                                                      mean         48.077        55
                                                                                      median           43        55
                                                                                      min              43        55
                                                                                      mode             43        55
                                                                                      std           5.766         0
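Two features of the verbose output can be reproduced in plain Python under stated assumptions. The percentiles match linear interpolation between sorted values (the rule used by `statistics.quantiles(..., method="inclusive")`), and a group in which several values tie for the highest frequency gets one mode row per tied value, as buoyid 1 and 2 do. The six temperatures below are hypothetical; they are consistent with the buoyid 1 statistics, but the source does not show the raw rows.

```python
import statistics

# Hypothetical temperatures consistent with the buoyid 1 rows; raw data not shown.
temps = [70, 71, 72, 77, 78, 79]

# 25th, 50th, and 75th percentiles by linear interpolation between sorted values.
quartiles = statistics.quantiles(temps, n=4, method="inclusive")
median = statistics.median(temps)

# multimode returns every value that ties for the highest count.
modes = statistics.multimode(temps)

print("quartiles:", quartiles)
print("median:", median)
print("modes:", modes)
```

Because every value in this group occurs once, all six are modes. The order of the mode rows does not appear to be fixed: this example and the non-default-percentiles example list the buoyid 1 modes in different orders.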
Example: Get the basic statistics, consider only unique values
Get the basic statistics for time series aggregation for all numeric columns, considering only unique values. This example returns the max, mean, min, and std values.
>>> ocean_buoys_grpby.describe(distinct=True)
                                                                                              temperature  salinity
TIMECODE_RANGE                                     GROUP BY TIME(CAL_YEARS(2)) buoyid func
('2014-01-01 00:00:00.000000-00:00', '2016-01-0... 2                           0      max             100        55
                                                                                      mean         69.667        55
                                                                                      min              10        55
                                                                                      std          51.675      None
                                                                               1      max              79        55
                                                                                      mean           74.5        55
                                                                                      min              70        55
                                                                                      std           3.937      None
                                                                               2      max              82        55
                                                                                      mean             81        55
                                                                                      min              80        55
                                                                                      std               1      None
                                                                               44     max              56        55
                                                                                      mean           52.2        55
                                                                                      min              43        55
                                                                                      std           5.263      None
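A plausible reading of distinct=True is that duplicate values are dropped before aggregating. The sketch below assumes that rule and hypothetical raw values for buoyid 0 ([10, 10, 99, 100] for temperature, 55 for every salinity reading); under those assumptions it reproduces the mean 69.667 and std 51.675 shown above. It also suggests why salinity reports None: only one distinct value remains, and a sample standard deviation is undefined for a single value. Python's `statistics.stdev` raises `StatisticsError` in that case, so the hypothetical `distinct_stats` helper returns None instead.

```python
import statistics


def distinct_stats(values):
    """Default time-series statistics over distinct values only (illustrative)."""
    uniq = sorted(set(values))  # assumed rule: drop duplicates first
    return {
        "max": max(uniq),
        "mean": round(statistics.mean(uniq), 3),
        "min": min(uniq),
        # Sample std is undefined for fewer than two values; mirror the None output.
        "std": round(statistics.stdev(uniq), 3) if len(uniq) > 1 else None,
    }


# Hypothetical columns for one group; the raw rows are not listed in the source.
temperature = [10, 10, 99, 100]
salinity = [55, 55, 55, 55]

print(distinct_stats(temperature))
print(distinct_stats(salinity))
```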
Example: Get the verbose statistics, select non-default percentiles
Get the verbose statistics for time series aggregation for all numeric columns. This example selects the non-default 33rd and 66th percentiles and returns the max, mean, min, std, median, mode, and the 33rd and 66th percentiles.
>>> ocean_buoys_grpby.describe(verbose=True, percentiles=[0.33, 0.66])
                                                                                              temperature  salinity
TIMECODE_RANGE                                     GROUP BY TIME(CAL_YEARS(2)) buoyid func
('2014-01-01 00:00:00.000000-00:00', '2016-01-0... 2                           0      33%              10        55
                                                                                      66%           97.22        55
                                                                                      max             100        55
                                                                                      mean          54.75        55
                                                                                      median         54.5        55
                                                                                      min              10        55
                                                                                      mode             10        55
                                                                                      std          51.674         0
                                                                               1      33%           71.65        55
                                                                                      66%            77.3        55
                                                                                      max              79        55
                                                                                      mean           74.5        55
                                                                                      median         74.5        55
                                                                                      min              70        55
                                                                                      mode             70        55
                                                                                      mode             71        55
                                                                                      mode             77        55
                                                                                      mode             78        55
                                                                                      mode             79        55
                                                                                      mode             72        55
                                                                                      std           3.937         0
                                                                               2      33%           80.66        55
                                                                                      66%           81.32        55
                                                                                      max              82        55
                                                                                      mean             81        55
                                                                                      median           81        55
                                                                                      min              80        55
                                                                                      mode             80        55
                                                                                      mode             81        55
                                                                                      mode             82        55
                                                                                      std               1         0
                                                                               44     33%              43        55
                                                                                      66%              53        55
                                                                                      max              56        55
                                                                                      mean         48.077        55
                                                                                      median           43        55
                                                                                      min              43        55
                                                                                      mode             43        55
                                                                                      std           5.766         0
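The 33rd and 66th percentiles above match linear interpolation between sorted values. The sketch below implements that rule directly so it accepts any fraction q between 0 and 1. The interpolation method is an assumption about what teradataml does, `percentile` is a hypothetical helper, and the group values are hypothetical reconstructions consistent with the buoyid 0, 1, and 2 rows.

```python
import math


def percentile(values, q):
    """Percentile by linear interpolation, for a fraction 0 <= q <= 1 (assumed method)."""
    data = sorted(values)
    pos = q * (len(data) - 1)        # fractional position in the sorted data
    lo = math.floor(pos)
    hi = min(lo + 1, len(data) - 1)  # clamp so q == 1 returns the maximum
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


# Hypothetical temperatures consistent with the buoyid 0, 1, and 2 rows.
groups = {0: [10, 10, 99, 100], 1: [70, 71, 72, 77, 78, 79], 2: [80, 81, 82]}

results = {
    buoyid: [round(percentile(temps, q), 2) for q in (0.33, 0.66)]
    for buoyid, temps in groups.items()
}
for buoyid, pcts in results.items():
    print(buoyid, pcts)
```

Under the assumed buoyid 0 values, the 66th percentile sits at position 0.66 * 3 = 1.98 in the sorted list, that is, 98% of the way from the second value (10) to the third (99): 10 + 0.98 * 89 = 97.22.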