top() - Teradata Python Package

The top() function returns the largest number of values in the columns for each group, with or without ties. It is a single-threaded function

This function is valid only on columns of numeric types.
Nulls are not included in the result computation.

Examples Prerequisite

See Example Setup to set up the environment for the following examples.

Example: Run top() on DataFrame created on non-sequenced PTI table

Run top() to select 2 values for temperature in DataFrame created on non-sequenced PTI table.

>>> ocean_buoys_grpby1 = ocean_buoys.groupby_time(timebucket_duration="2cy", value_expression="buoyid", fill="NULLS")

>>> number_of_values_to_column = {2: "temperature"}

>>> ocean_buoys_grpby1.top(number_of_values_to_column).sort(["TIMECODE_RANGE", "buoyid"])
                                      TIMECODE_RANGE  GROUP BY TIME(CAL_YEARS(2))  buoyid  top2temperature
0  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       0              100
1  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       0               99
2  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       1               78
3  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       1               79
4  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       2               82
5  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       2               81
6  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2      44               55
7  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2      44               56

Example: Run top() on DataFrame created on sequenced PTI table

Run top() to select 2 values for all columns in DataFrame created on sequenced PTI table.

>>> ocean_buoys_seq_grpby1 = ocean_buoys_seq.groupby_time(timebucket_duration="CAL_YEARS(2)", value_expression="buoyid", fill="NULLS")

>>> number_of_values_to_column = {2: "*"}

>>> ocean_buoys_seq_grpby1.top(number_of_values_to_column).sort(["TIMECODE_RANGE", "buoyid"])
                                        TIMECODE_RANGE  GROUP BY TIME(CAL_YEARS(2))  buoyid  top2TD_SEQNO  top2salinity  top2temperature
0  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       0            26            55               99
1  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       1            24            55               78
2  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       2            15            55               81
3  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2      22             1            25               23
4  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2      44            21            55               55

Example: Run top() on DataFrame created on non-PTI table

Run top() to select 2 values for temperature in DataFrame created on non-PTI table.

>>> ocean_buoys_nonpti_grpby1 = ocean_buoys_nonpti.groupby_time(timebucket_duration="2cyear", value_expression="buoyid", timecode_column="timecode", fill="NULLS")

>>> number_of_values_to_column = {2: "temperature"}

>>> ocean_buoys_nonpti_grpby1.top(number_of_values_to_column).sort(["TIMECODE_RANGE", "buoyid"])
                                      TIMECODE_RANGE  GROUP BY TIME(CAL_YEARS(2))  buoyid  top2temperature
0  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                           23       0               99
1  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                           23       0              100
2  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                           23       1               79
3  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                           23       1               78
4  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                           23       2               81
5  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                           23       2               82
6  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                           23      44               56
7  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                           23      44               55

Example: Run top() with ties on DataFrame created on non-sequenced PTI table

Run top() with ties to select 2 values for temperature in DataFrame created on non-sequenced table.

>>> ocean_buoys_grpby2 = ocean_buoys.groupby_time(timebucket_duration="2m", value_expression="buoyid", fill="NULLS")

>>> number_of_values_to_column = {2: "temperature"}

>>> ocean_buoys_grpby2.top(number_of_values_to_column, with_ties=True).sort(["TIMECODE_RANGE", "buoyid"])
                                        TIMECODE_RANGE  GROUP BY TIME(MINUTES(2))  buoyid  top_with_ties2temperature
0  ('2014-01-06 08:00:00.000000+00:00', '2014-01-...                     530161       0                       10.0
1  ('2014-01-06 08:02:00.000000+00:00', '2014-01-...                     530162       0                        NaN
2  ('2014-01-06 08:04:00.000000+00:00', '2014-01-...                     530163       0                        NaN
3  ('2014-01-06 08:06:00.000000+00:00', '2014-01-...                     530164       0                        NaN
4  ('2014-01-06 08:08:00.000000+00:00', '2014-01-...                     530165       0                       99.0
5  ('2014-01-06 08:10:00.000000+00:00', '2014-01-...                     530166       0                      100.0
6  ('2014-01-06 08:10:00.000000+00:00', '2014-01-...                     530166       0                       10.0
7  ('2014-01-06 09:00:00.000000+00:00', '2014-01-...                     530191       1                       70.0
8  ('2014-01-06 09:00:00.000000+00:00', '2014-01-...                     530191       1                       77.0
9  ('2014-01-06 09:02:00.000000+00:00', '2014-01-...                     530192       1                       78.0

Example: Run top() with ties on DataFrame created on non-sequenced PTI table

Run top() with ties to select 2 values for temperature and 3 values for rest of the columns in DataFrame created on non-sequenced PTI table.

>>> ocean_buoys_grpby3 = ocean_buoys.groupby_time(timebucket_duration="MINUTES(2)", fill="NULLS")

>>> number_of_values_to_column = {2: "temperature", 3:"*"}

>>> ocean_buoys_grpby3.top(number_of_values_to_column, with_ties=True).sort(["TIMECODE_RANGE"])
                                      TIMECODE_RANGE  GROUP BY TIME(MINUTES(2))  top_with_ties3buoyid  top_with_ties3salinity  top_with_ties2temperature
0  ('2014-01-06 08:00:00.000000+00:00', '2014-01-...                     530161                   0.0                    55.0                       10.0
1  ('2014-01-06 08:02:00.000000+00:00', '2014-01-...                     530162                   NaN                     NaN                        NaN
2  ('2014-01-06 08:04:00.000000+00:00', '2014-01-...                     530163                   NaN                     NaN                        NaN
3  ('2014-01-06 08:06:00.000000+00:00', '2014-01-...                     530164                   NaN                     NaN                        NaN
4  ('2014-01-06 08:08:00.000000+00:00', '2014-01-...                     530165                   0.0                    55.0                       99.0
5  ('2014-01-06 08:10:00.000000+00:00', '2014-01-...                     530166                   0.0                    55.0                       10.0
6  ('2014-01-06 08:12:00.000000+00:00', '2014-01-...                     530167                   NaN                     NaN                        NaN
7  ('2014-01-06 08:14:00.000000+00:00', '2014-01-...                     530168                   NaN                     NaN                        NaN
8  ('2014-01-06 08:16:00.000000+00:00', '2014-01-...                     530169                   NaN                     NaN                        NaN
9  ('2014-01-06 08:18:00.000000+00:00', '2014-01-...                     530170                   NaN                     NaN                        NaN