The bottom() function returns the smallest number of values in the columns for each group, with or without ties.
- This function is valid only on columns with numeric types.
- Nulls are not included in the result computation.
Examples Prerequisite
See Example Setup to set up the environment for the following examples.
Example: Run bottom() on DataFrame created on non-sequenced PTI table
>>> ocean_buoys_grpby1 = ocean_buoys.groupby_time(timebucket_duration="MINUTES(2)", value_expression="buoyid", fill="NULLS")
>>> number_of_values_to_column = {2: "temperature"}
>>> ocean_buoys_grpby1.bottom(number_of_values_to_column).sort(["TIMECODE_RANGE", "buoyid"]) TIMECODE_RANGE GROUP BY TIME(MINUTES(2)) buoyid bottom2temperature 0 ('2014-01-06 08:00:00.000000+00:00', '2014-01-... 530161 0 10.0 1 ('2014-01-06 08:02:00.000000+00:00', '2014-01-... 530162 0 NaN 2 ('2014-01-06 08:04:00.000000+00:00', '2014-01-... 530163 0 NaN 3 ('2014-01-06 08:06:00.000000+00:00', '2014-01-... 530164 0 NaN 4 ('2014-01-06 08:08:00.000000+00:00', '2014-01-... 530165 0 99.0 5 ('2014-01-06 08:10:00.000000+00:00', '2014-01-... 530166 0 100.0 6 ('2014-01-06 08:10:00.000000+00:00', '2014-01-... 530166 0 10.0 7 ('2014-01-06 09:00:00.000000+00:00', '2014-01-... 530191 1 70.0 8 ('2014-01-06 09:00:00.000000+00:00', '2014-01-... 530191 1 77.0 9 ('2014-01-06 09:02:00.000000+00:00', '2014-01-... 530192 1 71.0
Example: Run bottom() on DataFrame created on sequenced PTI table
>>> ocean_buoys_seq_grpby1 = ocean_buoys_seq.groupby_time(timebucket_duration="MINUTES(2)", value_expression="buoyid", fill="NULLS")
>>> number_of_values_to_column = {2: "*"}
>>> ocean_buoys_seq_grpby1.bottom(number_of_values_to_column).sort(["TIMECODE_RANGE", "buoyid"]) TIMECODE_RANGE GROUP BY TIME(MINUTES(2)) buoyid bottom2TD_SEQNO bottom2salinity bottom2temperature 0 ('2014-01-06 08:00:00.000000+00:00', '2014-01-... 530161 0 26.0 55.0 10.0 1 ('2014-01-06 08:02:00.000000+00:00', '2014-01-... 530162 0 NaN NaN NaN 2 ('2014-01-06 08:04:00.000000+00:00', '2014-01-... 530163 0 NaN NaN NaN 3 ('2014-01-06 08:06:00.000000+00:00', '2014-01-... 530164 0 NaN NaN NaN 4 ('2014-01-06 08:08:00.000000+00:00', '2014-01-... 530165 0 17.0 55.0 99.0 5 ('2014-01-06 08:10:00.000000+00:00', '2014-01-... 530166 0 19.0 55.0 10.0 6 ('2014-01-06 09:00:00.000000+00:00', '2014-01-... 530191 1 11.0 55.0 70.0 7 ('2014-01-06 09:02:00.000000+00:00', '2014-01-... 530192 1 12.0 55.0 71.0 8 ('2014-01-06 10:00:00.000000+00:00', '2014-01-... 530221 44 4.0 55.0 43.0 9 ('2014-01-06 10:02:00.000000+00:00', '2014-01-... 530222 44 9.0 55.0 53.0
Example: Run bottom() on DataFrame created on non-PTI table
>> ocean_buoys_nonpti_grpby1 = ocean_buoys_nonpti.groupby_time(timebucket_duration="MINUTES(2)", value_expression="buoyid", timecode_column="timecode", fill="NULLS")
>>> number_of_values_to_column = {2: "temperature"}
>>> ocean_buoys_nonpti_grpby1.bottom(number_of_values_to_column).sort(["TIMECODE_RANGE", "buoyid"]) TIMECODE_RANGE GROUP BY TIME(MINUTES(2)) buoyid bottom2temperature 0 ('2014-01-06 08:00:00.000000+00:00', '2014-01-... 11574961 0 10.0 1 ('2014-01-06 08:02:00.000000+00:00', '2014-01-... 11574962 0 NaN 2 ('2014-01-06 08:04:00.000000+00:00', '2014-01-... 11574963 0 NaN 3 ('2014-01-06 08:06:00.000000+00:00', '2014-01-... 11574964 0 NaN 4 ('2014-01-06 08:08:00.000000+00:00', '2014-01-... 11574965 0 99.0 5 ('2014-01-06 08:10:00.000000+00:00', '2014-01-... 11574966 0 100.0 6 ('2014-01-06 08:10:00.000000+00:00', '2014-01-... 11574966 0 10.0 7 ('2014-01-06 09:00:00.000000+00:00', '2014-01-... 11574991 1 70.0 8 ('2014-01-06 09:00:00.000000+00:00', '2014-01-... 11574991 1 77.0 9 ('2014-01-06 09:02:00.000000+00:00', '2014-01-... 11574992 1 71.0
Example: Run bottom() with ties on DataFrame created on non-sequenced PTI table
>>> ocean_buoys_grpby2 = ocean_buoys.groupby_time(timebucket_duration="MINUTES(2)", value_expression="buoyid", fill="NULLS")
>>> number_of_values_to_column = {2: "temperature"}
>>> ocean_buoys_grpby2.bottom(number_of_values_to_column, with_ties=True).sort(["TIMECODE_RANGE", "buoyid"]) TIMECODE_RANGE GROUP BY TIME(MINUTES(2)) buoyid bottom_with_ties2temperature 0 ('2014-01-06 08:00:00.000000+00:00', '2014-01-... 530161 0 10.0 1 ('2014-01-06 08:02:00.000000+00:00', '2014-01-... 530162 0 NaN 2 ('2014-01-06 08:04:00.000000+00:00', '2014-01-... 530163 0 NaN 3 ('2014-01-06 08:06:00.000000+00:00', '2014-01-... 530164 0 NaN 4 ('2014-01-06 08:08:00.000000+00:00', '2014-01-... 530165 0 99.0 5 ('2014-01-06 08:10:00.000000+00:00', '2014-01-... 530166 0 100.0 6 ('2014-01-06 08:10:00.000000+00:00', '2014-01-... 530166 0 10.0 7 ('2014-01-06 09:00:00.000000+00:00', '2014-01-... 530191 1 77.0 8 ('2014-01-06 09:00:00.000000+00:00', '2014-01-... 530191 1 70.0 9 ('2014-01-06 09:02:00.000000+00:00', '2014-01-... 530192 1 71.0
Example: Run buttom() with ties to select specific number of values of the columns
Run bottom() with ties to select two values for 'temperature' column and three for rest of the columns in ocean_buoys DataFrame.
>>> ocean_buoys_grpby3 = ocean_buoys.groupby_time(timebucket_duration="MINUTES(2)", fill="NULLS")
>>> number_of_values_to_column = {2: "temperature", 3:"*"}
>>> ocean_buoys_grpby3.bottom(number_of_values_to_column, with_ties=True).sort(["TIMECODE_RANGE"]) TIMECODE_RANGE GROUP BY TIME(MINUTES(2)) bottom_with_ties3buoyid bottom_with_ties3salinity bottom_with_ties2temperature 0 ('2014-01-06 08:00:00.000000+00:00', '2014-01-... 530161 0.0 55.0 10.0 1 ('2014-01-06 08:02:00.000000+00:00', '2014-01-... 530162 NaN NaN NaN 2 ('2014-01-06 08:04:00.000000+00:00', '2014-01-... 530163 NaN NaN NaN 3 ('2014-01-06 08:06:00.000000+00:00', '2014-01-... 530164 NaN NaN NaN 4 ('2014-01-06 08:08:00.000000+00:00', '2014-01-... 530165 0.0 55.0 99.0 5 ('2014-01-06 08:10:00.000000+00:00', '2014-01-... 530166 0.0 55.0 10.0 6 ('2014-01-06 08:12:00.000000+00:00', '2014-01-... 530167 NaN NaN NaN 7 ('2014-01-06 08:14:00.000000+00:00', '2014-01-... 530168 NaN NaN NaN 8 ('2014-01-06 08:16:00.000000+00:00', '2014-01-... 530169 NaN NaN NaN 9 ('2014-01-06 08:18:00.000000+00:00', '2014-01-... 530170 NaN NaN NaN