Teradata Package for Python Function Reference | 20.00 - top - Teradata Package for Python - Look here for syntax, methods and examples for the functions included in the Teradata Package for Python.
Teradata® Package for Python Function Reference - 20.00
- Deployment
- VantageCloud
- VantageCore
- Edition
- Enterprise
- IntelliFlex
- VMware
- Product
- Teradata Package for Python
- Release Number
- 20.00.00.03
- Published
- December 2024
- ft:locale
- en-US
- ft:lastEdition
- 2024-12-19
- dita:id
- TeradataPython_FxRef_Enterprise_2000
- lifecycle
- latest
- Product Category
- Teradata Vantage
- teradataml.dataframe.dataframe.DataFrameGroupByTime.top = top(self, number_of_values_to_column, with_ties=False)
- DESCRIPTION:
Returns the largest number of values in the columns for each group, with or without ties.
TOP is a single-threaded function.
Note:
1. This function is valid only on columns with numeric types.
2. Null values are not included in the result computation.
PARAMETERS:
number_of_values_to_column:
Required Argument.
Specifies a dictionary that accepts number of values to be selected for each column.
Number of values is a key in the dictionary. Key should be any positive integer.
Whereas value in the dictionary can be a column name or list of column names.
Sometimes, value can also include a special character '*', instead of column name.
This should be used only when one wants to return same number of values for all columns.
Types: Dictionary
Examples:
# Let's assume, a teradataml DataFrame has following columns:
# col1, col2, col3, ..., colN
# For top() to return 2 values for column "col1":
number_of_values_to_column = {2: "col1"}
# For top() to return 2 values for column "col1" and 5 values for "col3":
number_of_values_to_column = {2: "col1", 5: "col3"}
# For top() to return 2 values for column "col1", "col2" and "col3":
number_of_values_to_column = {2: ["col1", "col2", "col3"]}
# Use cases for using '*' default value.
# For top() to return 2 values for all columns. In case, we need to return 2 values
# for each column in the DataFrame, then one can use '*'.
number_of_values_to_column = {2: "*"}
# For top() to return 2 values for column "col1" and "col3"
# and 5 values for rest of the columns:
number_of_values_to_column = {2: ["col1", "col3"], 5: "*"}
# We can use default value column character ('*') in list as well
# For top() to return 2 values for column "col1" and "col3"
# and 5 values for "col4" and rest of the columns:
number_of_values_to_column = {2: ["col1", "col3"], 5: ["col4", "*"]}
with_ties:
Optional Argument.
Specifies a flag to decide whether to run top function with ties or not.
TOP WITH TIES implies that the rows returned include the specified number of rows in
the ordered set for each timebucket. It includes any rows where the sort key value
is the same as the sort key value in the last row that satisfies the specified number
or percentage of rows. If this clause is omitted and ties are found, the earliest
value in terms of timecode is returned.
Types: bool
RETURNS:
teradataml DataFrame
RAISES:
TypeError - If incorrect type of values passed to input argument.
ValueError - If invalid value passed to the the argument.
TeradataMLException
1. If required argument 'number_of_values_to_column' is missing or None is passed.
2. TDMLDF_AGGREGATE_FAILED - If top() operation fails to
generate the column-wise largest number of values in the columns.
EXAMPLES :
>>> # Load the example datasets.
... load_example_data("dataframe", ["ocean_buoys", "ocean_buoys_seq", "ocean_buoys_nonpti"])
>>>
>>> # Create the required DataFrames.
... # DataFrame on non-sequenced PTI table
... ocean_buoys = DataFrame("ocean_buoys")
>>> # Check DataFrame columns and let's peek at the data
... ocean_buoys.columns
['buoyid', 'TD_TIMECODE', 'temperature', 'salinity']
>>> ocean_buoys.head()
TD_TIMECODE temperature salinity
buoyid
0 2014-01-06 08:10:00.000000 100.0 55
0 2014-01-06 08:08:59.999999 NaN 55
1 2014-01-06 09:01:25.122200 77.0 55
1 2014-01-06 09:03:25.122200 79.0 55
1 2014-01-06 09:01:25.122200 70.0 55
1 2014-01-06 09:02:25.122200 71.0 55
1 2014-01-06 09:03:25.122200 72.0 55
0 2014-01-06 08:09:59.999999 99.0 55
0 2014-01-06 08:00:00.000000 10.0 55
0 2014-01-06 08:10:00.000000 10.0 55
>>> # DataFrame on sequenced PTI table
... ocean_buoys_seq = DataFrame("ocean_buoys_seq")
>>> # Check DataFrame columns and let's peek at the data
... ocean_buoys_seq.columns
['TD_TIMECODE', 'TD_SEQNO', 'buoyid', 'salinity', 'temperature', 'dates']
>>> ocean_buoys_seq.head()
TD_TIMECODE TD_SEQNO salinity temperature dates
buoyid
0 2014-01-06 08:00:00.000000 26 55 10.0 2016-02-26
0 2014-01-06 08:08:59.999999 18 55 NaN 2015-06-18
1 2014-01-06 09:02:25.122200 24 55 78.0 2015-12-24
1 2014-01-06 09:01:25.122200 23 55 77.0 2015-11-23
1 2014-01-06 09:02:25.122200 12 55 71.0 2014-12-12
1 2014-01-06 09:03:25.122200 13 55 72.0 2015-01-13
1 2014-01-06 09:01:25.122200 11 55 70.0 2014-11-11
0 2014-01-06 08:10:00.000000 19 55 10.0 2015-07-19
0 2014-01-06 08:09:59.999999 17 55 99.0 2015-05-17
0 2014-01-06 08:10:00.000000 27 55 100.0 2016-03-27
>>> # DataFrame on NON-PTI table
... ocean_buoys_nonpti = DataFrame("ocean_buoys_nonpti")
>>> # Check DataFrame columns and let's peek at the data
... ocean_buoys_nonpti.columns
['buoyid', 'timecode', 'temperature', 'salinity']
>>> ocean_buoys_nonpti.head()
buoyid temperature salinity
timecode
2014-01-06 08:09:59.999999 0 99.0 55
2014-01-06 08:10:00.000000 0 10.0 55
2014-01-06 09:01:25.122200 1 70.0 55
2014-01-06 09:01:25.122200 1 77.0 55
2014-01-06 09:02:25.122200 1 71.0 55
2014-01-06 09:03:25.122200 1 72.0 55
2014-01-06 09:02:25.122200 1 78.0 55
2014-01-06 08:10:00.000000 0 100.0 55
2014-01-06 08:08:59.999999 0 NaN 55
2014-01-06 08:00:00.000000 0 10.0 55
### Examples for top without ties ###
#
# Example 1: Executing top function on DataFrame created on non-sequenced PTI table.
#
>>> ocean_buoys_grpby1 = ocean_buoys.groupby_time(timebucket_duration="2cy",
... value_expression="buoyid", fill="NULLS")
>>> number_of_values_to_column = {2: "temperature"}
>>> ocean_buoys_grpby1.top(number_of_values_to_column).sort(["TIMECODE_RANGE", "buoyid"])
TIMECODE_RANGE GROUP BY TIME(CAL_YEARS(2)) buoyid top2temperature
0 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 2 0 100
1 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 2 0 99
2 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 2 1 78
3 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 2 1 79
4 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 2 2 82
5 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 2 2 81
6 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 2 44 55
7 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 2 44 56
>>>
#
# Example 2: Executing top to select 2 values for all the columns in ocean_buoys_seq DataFrame
# on sequenced PTI table.
#
>>> ocean_buoys_seq_grpby1 = ocean_buoys_seq.groupby_time(timebucket_duration="CAL_YEARS(2)",
... value_expression="buoyid", fill="NULLS")
>>> number_of_values_to_column = {2: "*"}
>>> ocean_buoys_seq_grpby1.top(number_of_values_to_column).sort(["TIMECODE_RANGE", "buoyid"])
TIMECODE_RANGE GROUP BY TIME(CAL_YEARS(2)) buoyid top2TD_SEQNO top2salinity top2temperature
0 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 2 0 26 55 99
1 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 2 1 24 55 78
2 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 2 2 15 55 81
3 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 2 22 1 25 23
4 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 2 44 21 55 55
#
# Example 3: Executing top function on DataFrame created on NON-PTI table.
#
>>> ocean_buoys_nonpti_grpby1 = ocean_buoys_nonpti.groupby_time(timebucket_duration="2cyear",
... value_expression="buoyid",
... timecode_column="timecode", fill="NULLS")
>>> number_of_values_to_column = {2: "temperature"}
>>> ocean_buoys_nonpti_grpby1.top(number_of_values_to_column).sort(["TIMECODE_RANGE", "buoyid"])
TIMECODE_RANGE GROUP BY TIME(CAL_YEARS(2)) buoyid top2temperature
0 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 23 0 99
1 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 23 0 100
2 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 23 1 79
3 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 23 1 78
4 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 23 2 81
5 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 23 2 82
6 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 23 44 56
7 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 23 44 55
### Examples for top with ties ###
#
# Example 4: Executing top with ties function on DataFrame created on non-sequenced PTI table.
#
>>> ocean_buoys_grpby2 = ocean_buoys.groupby_time(timebucket_duration="2m",
... value_expression="buoyid", fill="NULLS")
>>> number_of_values_to_column = {2: "temperature"}
>>> ocean_buoys_grpby2.top(number_of_values_to_column, with_ties=True).sort(["TIMECODE_RANGE", "buoyid"])
TIMECODE_RANGE GROUP BY TIME(MINUTES(2)) buoyid top_with_ties2temperature
0 ('2014-01-06 08:00:00.000000+00:00', '2014-01-... 530161 0 10.0
1 ('2014-01-06 08:02:00.000000+00:00', '2014-01-... 530162 0 NaN
2 ('2014-01-06 08:04:00.000000+00:00', '2014-01-... 530163 0 NaN
3 ('2014-01-06 08:06:00.000000+00:00', '2014-01-... 530164 0 NaN
4 ('2014-01-06 08:08:00.000000+00:00', '2014-01-... 530165 0 99.0
5 ('2014-01-06 08:10:00.000000+00:00', '2014-01-... 530166 0 100.0
6 ('2014-01-06 08:10:00.000000+00:00', '2014-01-... 530166 0 10.0
7 ('2014-01-06 09:00:00.000000+00:00', '2014-01-... 530191 1 70.0
8 ('2014-01-06 09:00:00.000000+00:00', '2014-01-... 530191 1 77.0
9 ('2014-01-06 09:02:00.000000+00:00', '2014-01-... 530192 1 78.0
#
# Example 5: Executing top with ties to select 2 values for temperature and 3 for rest of the columns in
# ocean_buoys DataFrame.
#
>>> ocean_buoys_grpby3 = ocean_buoys.groupby_time(timebucket_duration="MINUTES(2)", fill="NULLS")
>>> number_of_values_to_column = {2: "temperature", 3:"*"}
>>> ocean_buoys_grpby3.top(number_of_values_to_column, with_ties=True).sort(["TIMECODE_RANGE"])
TIMECODE_RANGE GROUP BY TIME(MINUTES(2)) top_with_ties3buoyid top_with_ties3salinity top_with_ties2temperature
0 ('2014-01-06 08:00:00.000000+00:00', '2014-01-... 530161 0.0 55.0 10.0
1 ('2014-01-06 08:02:00.000000+00:00', '2014-01-... 530162 NaN NaN NaN
2 ('2014-01-06 08:04:00.000000+00:00', '2014-01-... 530163 NaN NaN NaN
3 ('2014-01-06 08:06:00.000000+00:00', '2014-01-... 530164 NaN NaN NaN
4 ('2014-01-06 08:08:00.000000+00:00', '2014-01-... 530165 0.0 55.0 99.0
5 ('2014-01-06 08:10:00.000000+00:00', '2014-01-... 530166 0.0 55.0 10.0
6 ('2014-01-06 08:12:00.000000+00:00', '2014-01-... 530167 NaN NaN NaN
7 ('2014-01-06 08:14:00.000000+00:00', '2014-01-... 530168 NaN NaN NaN
8 ('2014-01-06 08:16:00.000000+00:00', '2014-01-... 530169 NaN NaN NaN
9 ('2014-01-06 08:18:00.000000+00:00', '2014-01-... 530170 NaN NaN NaN
>>>
#
# Example 6: Executing top with ties function on DataFrame created on NON-PTI table.
#
>>> ocean_buoys_nonpti_grpby2 = ocean_buoys_nonpti.groupby_time(timebucket_duration="2mins",
... value_expression="buoyid",
... timecode_column="timecode", fill="NULLS")
>>> number_of_values_to_column = {2: "temperature"}
>>> ocean_buoys_nonpti_grpby2.top(number_of_values_to_column, with_ties=True).sort(["TIMECODE_RANGE", "buoyid"])
TIMECODE_RANGE GROUP BY TIME(MINUTES(2)) buoyid top_with_ties2temperature
0 ('2014-01-06 08:00:00.000000+00:00', '2014-01-... 11574961 0 10.0
1 ('2014-01-06 08:02:00.000000+00:00', '2014-01-... 11574962 0 NaN
2 ('2014-01-06 08:04:00.000000+00:00', '2014-01-... 11574963 0 NaN
3 ('2014-01-06 08:06:00.000000+00:00', '2014-01-... 11574964 0 NaN
4 ('2014-01-06 08:08:00.000000+00:00', '2014-01-... 11574965 0 99.0
5 ('2014-01-06 08:10:00.000000+00:00', '2014-01-... 11574966 0 100.0
6 ('2014-01-06 08:10:00.000000+00:00', '2014-01-... 11574966 0 10.0
7 ('2014-01-06 09:00:00.000000+00:00', '2014-01-... 11574991 1 70.0
8 ('2014-01-06 09:00:00.000000+00:00', '2014-01-... 11574991 1 77.0
9 ('2014-01-06 09:02:00.000000+00:00', '2014-01-... 11574992 1 79.0
>>>
>>>