The top() function returns the largest number of values in the columns for each group, with or without ties. It is a single-threaded function
- This function is valid only on columns of numeric types.
- Nulls are not included in the result computation.
Examples Prerequisite
See Example Setup to set up the environment for the following examples.
Example: Run top() on DataFrame created on non-sequenced PTI table
Run top() to select 2 values for temperature in DataFrame created on non-sequenced PTI table.
>>> ocean_buoys_grpby1 = ocean_buoys.groupby_time(timebucket_duration="2cy", value_expression="buoyid", fill="NULLS")
>>> number_of_values_to_column = {2: "temperature"}
>>> ocean_buoys_grpby1.top(number_of_values_to_column).sort(["TIMECODE_RANGE", "buoyid"]) TIMECODE_RANGE GROUP BY TIME(CAL_YEARS(2)) buoyid top2temperature 0 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 2 0 100 1 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 2 0 99 2 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 2 1 78 3 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 2 1 79 4 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 2 2 82 5 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 2 2 81 6 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 2 44 55 7 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 2 44 56
Example: Run top() on DataFrame created on sequenced PTI table
Run top() to select 2 values for all columns in DataFrame created on sequenced PTI table.
>>> ocean_buoys_seq_grpby1 = ocean_buoys_seq.groupby_time(timebucket_duration="CAL_YEARS(2)", value_expression="buoyid", fill="NULLS")
>>> number_of_values_to_column = {2: "*"}
>>> ocean_buoys_seq_grpby1.top(number_of_values_to_column).sort(["TIMECODE_RANGE", "buoyid"]) TIMECODE_RANGE GROUP BY TIME(CAL_YEARS(2)) buoyid top2TD_SEQNO top2salinity top2temperature 0 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 2 0 26 55 99 1 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 2 1 24 55 78 2 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 2 2 15 55 81 3 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 2 22 1 25 23 4 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 2 44 21 55 55
Example: Run top() on DataFrame created on non-PTI table
Run top() to select 2 values for temperature in DataFrame created on non-PTI table.
>>> ocean_buoys_nonpti_grpby1 = ocean_buoys_nonpti.groupby_time(timebucket_duration="2cyear", value_expression="buoyid", timecode_column="timecode", fill="NULLS")
>>> number_of_values_to_column = {2: "temperature"}
>>> ocean_buoys_nonpti_grpby1.top(number_of_values_to_column).sort(["TIMECODE_RANGE", "buoyid"]) TIMECODE_RANGE GROUP BY TIME(CAL_YEARS(2)) buoyid top2temperature 0 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 23 0 99 1 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 23 0 100 2 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 23 1 79 3 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 23 1 78 4 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 23 2 81 5 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 23 2 82 6 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 23 44 56 7 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 23 44 55
Example: Run top() with ties on DataFrame created on non-sequenced PTI table
Run top() with ties to select 2 values for temperature in DataFrame created on non-sequenced table.
>>> ocean_buoys_grpby2 = ocean_buoys.groupby_time(timebucket_duration="2m", value_expression="buoyid", fill="NULLS")
>>> number_of_values_to_column = {2: "temperature"}
>>> ocean_buoys_grpby2.top(number_of_values_to_column, with_ties=True).sort(["TIMECODE_RANGE", "buoyid"]) TIMECODE_RANGE GROUP BY TIME(MINUTES(2)) buoyid top_with_ties2temperature 0 ('2014-01-06 08:00:00.000000+00:00', '2014-01-... 530161 0 10.0 1 ('2014-01-06 08:02:00.000000+00:00', '2014-01-... 530162 0 NaN 2 ('2014-01-06 08:04:00.000000+00:00', '2014-01-... 530163 0 NaN 3 ('2014-01-06 08:06:00.000000+00:00', '2014-01-... 530164 0 NaN 4 ('2014-01-06 08:08:00.000000+00:00', '2014-01-... 530165 0 99.0 5 ('2014-01-06 08:10:00.000000+00:00', '2014-01-... 530166 0 100.0 6 ('2014-01-06 08:10:00.000000+00:00', '2014-01-... 530166 0 10.0 7 ('2014-01-06 09:00:00.000000+00:00', '2014-01-... 530191 1 70.0 8 ('2014-01-06 09:00:00.000000+00:00', '2014-01-... 530191 1 77.0 9 ('2014-01-06 09:02:00.000000+00:00', '2014-01-... 530192 1 78.0
Example: Run top() with ties on DataFrame created on non-sequenced PTI table
Run top() with ties to select 2 values for temperature and 3 values for rest of the columns in DataFrame created on non-sequenced PTI table.
>>> ocean_buoys_grpby3 = ocean_buoys.groupby_time(timebucket_duration="MINUTES(2)", fill="NULLS")
>>> number_of_values_to_column = {2: "temperature", 3:"*"}
>>> ocean_buoys_grpby3.top(number_of_values_to_column, with_ties=True).sort(["TIMECODE_RANGE"]) TIMECODE_RANGE GROUP BY TIME(MINUTES(2)) top_with_ties3buoyid top_with_ties3salinity top_with_ties2temperature 0 ('2014-01-06 08:00:00.000000+00:00', '2014-01-... 530161 0.0 55.0 10.0 1 ('2014-01-06 08:02:00.000000+00:00', '2014-01-... 530162 NaN NaN NaN 2 ('2014-01-06 08:04:00.000000+00:00', '2014-01-... 530163 NaN NaN NaN 3 ('2014-01-06 08:06:00.000000+00:00', '2014-01-... 530164 NaN NaN NaN 4 ('2014-01-06 08:08:00.000000+00:00', '2014-01-... 530165 0.0 55.0 99.0 5 ('2014-01-06 08:10:00.000000+00:00', '2014-01-... 530166 0.0 55.0 10.0 6 ('2014-01-06 08:12:00.000000+00:00', '2014-01-... 530167 NaN NaN NaN 7 ('2014-01-06 08:14:00.000000+00:00', '2014-01-... 530168 NaN NaN NaN 8 ('2014-01-06 08:16:00.000000+00:00', '2014-01-... 530169 NaN NaN NaN 9 ('2014-01-06 08:18:00.000000+00:00', '2014-01-... 530170 NaN NaN NaN