top() - Teradata Python Package

Teradata® Python Package User Guide

Product
Teradata Python Package
Release Number
16.20
Published
February 2020
Language
English (United States)
Last Update
2020-02-29
dita:mapPath
rkb1531260709148.ditamap
dita:ditavalPath
Generic_no_ie_no_tempfilter.ditaval
dita:id
B700-4006
lifecycle
previous
Product Category
Teradata Vantage

The top() function returns the largest number of values in the columns for each group, with or without ties. It is a single-threaded function

  • This function is valid only on columns of numeric types.
  • Nulls are not included in the result computation.

Examples Prerequisite

See Example Setup to set up the environment for the following examples.

Example: Run top() on DataFrame created on non-sequenced PTI table

Run top() to select 2 values for temperature in DataFrame created on non-sequenced PTI table.

>>> ocean_buoys_grpby1 = ocean_buoys.groupby_time(timebucket_duration="2cy", value_expression="buoyid", fill="NULLS")
>>> number_of_values_to_column = {2: "temperature"}
>>> ocean_buoys_grpby1.top(number_of_values_to_column).sort(["TIMECODE_RANGE", "buoyid"])
                                      TIMECODE_RANGE  GROUP BY TIME(CAL_YEARS(2))  buoyid  top2temperature
0  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       0              100
1  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       0               99
2  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       1               78
3  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       1               79
4  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       2               82
5  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       2               81
6  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2      44               55
7  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2      44               56

Example: Run top() on DataFrame created on sequenced PTI table

Run top() to select 2 values for all columns in DataFrame created on sequenced PTI table.

>>> ocean_buoys_seq_grpby1 = ocean_buoys_seq.groupby_time(timebucket_duration="CAL_YEARS(2)", value_expression="buoyid", fill="NULLS")
>>> number_of_values_to_column = {2: "*"}
>>> ocean_buoys_seq_grpby1.top(number_of_values_to_column).sort(["TIMECODE_RANGE", "buoyid"])
                                        TIMECODE_RANGE  GROUP BY TIME(CAL_YEARS(2))  buoyid  top2TD_SEQNO  top2salinity  top2temperature
0  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       0            26            55               99
1  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       1            24            55               78
2  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       2            15            55               81
3  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2      22             1            25               23
4  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2      44            21            55               55

Example: Run top() on DataFrame created on non-PTI table

Run top() to select 2 values for temperature in DataFrame created on non-PTI table.

>>> ocean_buoys_nonpti_grpby1 = ocean_buoys_nonpti.groupby_time(timebucket_duration="2cyear", value_expression="buoyid", timecode_column="timecode", fill="NULLS")
>>> number_of_values_to_column = {2: "temperature"}
>>> ocean_buoys_nonpti_grpby1.top(number_of_values_to_column).sort(["TIMECODE_RANGE", "buoyid"])
                                      TIMECODE_RANGE  GROUP BY TIME(CAL_YEARS(2))  buoyid  top2temperature
0  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                           23       0               99
1  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                           23       0              100
2  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                           23       1               79
3  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                           23       1               78
4  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                           23       2               81
5  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                           23       2               82
6  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                           23      44               56
7  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                           23      44               55

Example: Run top() with ties on DataFrame created on non-sequenced PTI table

Run top() with ties to select 2 values for temperature in DataFrame created on non-sequenced table.

>>> ocean_buoys_grpby2 = ocean_buoys.groupby_time(timebucket_duration="2m", value_expression="buoyid", fill="NULLS")
>>> number_of_values_to_column = {2: "temperature"}
>>> ocean_buoys_grpby2.top(number_of_values_to_column, with_ties=True).sort(["TIMECODE_RANGE", "buoyid"])
                                        TIMECODE_RANGE  GROUP BY TIME(MINUTES(2))  buoyid  top_with_ties2temperature
0  ('2014-01-06 08:00:00.000000+00:00', '2014-01-...                     530161       0                       10.0
1  ('2014-01-06 08:02:00.000000+00:00', '2014-01-...                     530162       0                        NaN
2  ('2014-01-06 08:04:00.000000+00:00', '2014-01-...                     530163       0                        NaN
3  ('2014-01-06 08:06:00.000000+00:00', '2014-01-...                     530164       0                        NaN
4  ('2014-01-06 08:08:00.000000+00:00', '2014-01-...                     530165       0                       99.0
5  ('2014-01-06 08:10:00.000000+00:00', '2014-01-...                     530166       0                      100.0
6  ('2014-01-06 08:10:00.000000+00:00', '2014-01-...                     530166       0                       10.0
7  ('2014-01-06 09:00:00.000000+00:00', '2014-01-...                     530191       1                       70.0
8  ('2014-01-06 09:00:00.000000+00:00', '2014-01-...                     530191       1                       77.0
9  ('2014-01-06 09:02:00.000000+00:00', '2014-01-...                     530192       1                       78.0

Example: Run top() with ties on DataFrame created on non-sequenced PTI table

Run top() with ties to select 2 values for temperature and 3 values for rest of the columns in DataFrame created on non-sequenced PTI table.

>>> ocean_buoys_grpby3 = ocean_buoys.groupby_time(timebucket_duration="MINUTES(2)", fill="NULLS")
>>> number_of_values_to_column = {2: "temperature", 3:"*"}
>>> ocean_buoys_grpby3.top(number_of_values_to_column, with_ties=True).sort(["TIMECODE_RANGE"])
                                      TIMECODE_RANGE  GROUP BY TIME(MINUTES(2))  top_with_ties3buoyid  top_with_ties3salinity  top_with_ties2temperature
0  ('2014-01-06 08:00:00.000000+00:00', '2014-01-...                     530161                   0.0                    55.0                       10.0
1  ('2014-01-06 08:02:00.000000+00:00', '2014-01-...                     530162                   NaN                     NaN                        NaN
2  ('2014-01-06 08:04:00.000000+00:00', '2014-01-...                     530163                   NaN                     NaN                        NaN
3  ('2014-01-06 08:06:00.000000+00:00', '2014-01-...                     530164                   NaN                     NaN                        NaN
4  ('2014-01-06 08:08:00.000000+00:00', '2014-01-...                     530165                   0.0                    55.0                       99.0
5  ('2014-01-06 08:10:00.000000+00:00', '2014-01-...                     530166                   0.0                    55.0                       10.0
6  ('2014-01-06 08:12:00.000000+00:00', '2014-01-...                     530167                   NaN                     NaN                        NaN
7  ('2014-01-06 08:14:00.000000+00:00', '2014-01-...                     530168                   NaN                     NaN                        NaN
8  ('2014-01-06 08:16:00.000000+00:00', '2014-01-...                     530169                   NaN                     NaN                        NaN
9  ('2014-01-06 08:18:00.000000+00:00', '2014-01-...                     530170                   NaN                     NaN                        NaN