Teradata Package for Python Function Reference | 20.00 - bottom - Teradata Package for Python - Look here for syntax, methods and examples for the functions included in the Teradata Package for Python.
Teradata® Package for Python Function Reference - 20.00
- Deployment
- VantageCloud
- VantageCore
- Edition
- Enterprise
- IntelliFlex
- VMware
- Product
- Teradata Package for Python
- Release Number
- 20.00.00.03
- Published
- December 2024
- ft:locale
- en-US
- ft:lastEdition
- 2024-12-19
- dita:id
- TeradataPython_FxRef_Enterprise_2000
- lifecycle
- latest
- Product Category
- Teradata Vantage
- teradataml.dataframe.dataframe.DataFrameGroupByTime.bottom = bottom(self, number_of_values_to_column, with_ties=False)
- DESCRIPTION:
Returns the smallest number of values in the columns for each group, with or without ties.
Note:
1. This function is valid only on columns with numeric types.
2. Null values are not included in the result computation.
PARAMETERS:
number_of_values_to_column:
Required Argument.
Specifies a dictionary that accepts number of values to be selected for each column.
Number of values is a key in the dictionary. Key should be any positive integer.
Whereas value in the dictionary can be a column name or list of column names.
Sometimes, value can also include a special character '*', instead of column name.
This should be used only when one wants to return same number of values for all columns.
Types: Dictionary
Examples:
# Let's assume, a teradataml DataFrame has following columns:
# col1, col2, col3, ..., colN
# For bottom() to return 2 values for column "col1":
number_of_values_to_column = {2: "col1"}
# For bottom() to return 2 values for column "col1" and 5 values for "col3":
number_of_values_to_column = {2: "col1", 5: "col3"}
# For bottom() to return 2 values for column "col1", "col2" and "col3":
number_of_values_to_column = {2: ["col1", "col2", "col3"]}
# Use cases for using '*' default value.
# For bottom() to return 2 values for all columns. In case, we need to return 2 values
# for each column in the DataFrame, then one can use '*'.
number_of_values_to_column = {2: "*"}
# For bottom() to return 2 values for column "col1" and "col3"
# and 5 values for rest of the columns:
number_of_values_to_column = {2: ["col1", "col3"], 5: "*"}
# We can use default value column character ('*') in list as well
# For bottom() to return 2 values for column "col1" and "col3"
# and 5 values for "col4" and rest of the columns:
number_of_values_to_column = {2: ["col1", "col3"], 5: ["col4", "*"]}
with_ties:
Optional Argument.
Specifies a flag to decide whether to run bottom function with ties or not.
BOTTOM WITH TIES implies that the rows returned include the specified number of rows in
the ordered set for each timebucket. It includes any rows where the sort key value
is the same as the sort key value in the last row that satisfies the specified number
or percentage of rows. If this clause is omitted and ties are found, the earliest
value in terms of timecode is returned.
Types: bool
RETURNS:
teradataml DataFrame
RAISES:
TypeError - If incorrect type of values passed to input argument.
ValueError - If invalid value passed to the the argument.
TeradataMLException
1. If required argument 'number_of_values_to_column' is missing or None is passed.
2. TDMLDF_AGGREGATE_FAILED - If bottom() operation fails to
generate the column-wise smallest number of values for the columns.
EXAMPLES :
>>> # Load the example datasets.
... load_example_data("dataframe", ["ocean_buoys", "ocean_buoys_seq", "ocean_buoys_nonpti"])
>>>
>>> # Create the required DataFrames.
... # DataFrame on non-sequenced PTI table
... ocean_buoys = DataFrame("ocean_buoys")
>>> # Check DataFrame columns and let's peek at the data
... ocean_buoys.columns
['buoyid', 'TD_TIMECODE', 'temperature', 'salinity']
>>> ocean_buoys.head()
TD_TIMECODE temperature salinity
buoyid
0 2014-01-06 08:10:00.000000 100.0 55
0 2014-01-06 08:08:59.999999 NaN 55
1 2014-01-06 09:01:25.122200 77.0 55
1 2014-01-06 09:03:25.122200 79.0 55
1 2014-01-06 09:01:25.122200 70.0 55
1 2014-01-06 09:02:25.122200 71.0 55
1 2014-01-06 09:03:25.122200 72.0 55
0 2014-01-06 08:09:59.999999 99.0 55
0 2014-01-06 08:00:00.000000 10.0 55
0 2014-01-06 08:10:00.000000 10.0 55
>>> # DataFrame on sequenced PTI table
... ocean_buoys_seq = DataFrame("ocean_buoys_seq")
>>> # Check DataFrame columns and let's peek at the data
... ocean_buoys_seq.columns
['TD_TIMECODE', 'TD_SEQNO', 'buoyid', 'salinity', 'temperature', 'dates']
>>> ocean_buoys_seq.head()
TD_TIMECODE TD_SEQNO salinity temperature dates
buoyid
0 2014-01-06 08:00:00.000000 26 55 10.0 2016-02-26
0 2014-01-06 08:08:59.999999 18 55 NaN 2015-06-18
1 2014-01-06 09:02:25.122200 24 55 78.0 2015-12-24
1 2014-01-06 09:01:25.122200 23 55 77.0 2015-11-23
1 2014-01-06 09:02:25.122200 12 55 71.0 2014-12-12
1 2014-01-06 09:03:25.122200 13 55 72.0 2015-01-13
1 2014-01-06 09:01:25.122200 11 55 70.0 2014-11-11
0 2014-01-06 08:10:00.000000 19 55 10.0 2015-07-19
0 2014-01-06 08:09:59.999999 17 55 99.0 2015-05-17
0 2014-01-06 08:10:00.000000 27 55 100.0 2016-03-27
>>> # DataFrame on NON-PTI table
... ocean_buoys_nonpti = DataFrame("ocean_buoys_nonpti")
>>> # Check DataFrame columns and let's peek at the data
... ocean_buoys_nonpti.columns
['buoyid', 'timecode', 'temperature', 'salinity']
>>> ocean_buoys_nonpti.head()
buoyid temperature salinity
timecode
2014-01-06 08:09:59.999999 0 99.0 55
2014-01-06 08:10:00.000000 0 10.0 55
2014-01-06 09:01:25.122200 1 70.0 55
2014-01-06 09:01:25.122200 1 77.0 55
2014-01-06 09:02:25.122200 1 71.0 55
2014-01-06 09:03:25.122200 1 72.0 55
2014-01-06 09:02:25.122200 1 78.0 55
2014-01-06 08:10:00.000000 0 100.0 55
2014-01-06 08:08:59.999999 0 NaN 55
2014-01-06 08:00:00.000000 0 10.0 55
### Examples for bottom without ties ###
#
# Example 1: Executing bottom function on DataFrame created on non-sequenced PTI table.
#
>>> ocean_buoys_grpby1 = ocean_buoys.groupby_time(timebucket_duration="MINUTES(2)",
... value_expression="buoyid", fill="NULLS")
>>> number_of_values_to_column = {2: "temperature"}
>>> ocean_buoys_grpby1.bottom(number_of_values_to_column).sort(["TIMECODE_RANGE", "buoyid"])
TIMECODE_RANGE GROUP BY TIME(MINUTES(2)) buoyid bottom2temperature
0 ('2014-01-06 08:00:00.000000+00:00', '2014-01-... 530161 0 10.0
1 ('2014-01-06 08:02:00.000000+00:00', '2014-01-... 530162 0 NaN
2 ('2014-01-06 08:04:00.000000+00:00', '2014-01-... 530163 0 NaN
3 ('2014-01-06 08:06:00.000000+00:00', '2014-01-... 530164 0 NaN
4 ('2014-01-06 08:08:00.000000+00:00', '2014-01-... 530165 0 99.0
5 ('2014-01-06 08:10:00.000000+00:00', '2014-01-... 530166 0 100.0
6 ('2014-01-06 08:10:00.000000+00:00', '2014-01-... 530166 0 10.0
7 ('2014-01-06 09:00:00.000000+00:00', '2014-01-... 530191 1 70.0
8 ('2014-01-06 09:00:00.000000+00:00', '2014-01-... 530191 1 77.0
9 ('2014-01-06 09:02:00.000000+00:00', '2014-01-... 530192 1 71.0
#
# Example 2: Executing bottom to select 2 values for all the columns in ocean_buoys_seq DataFrame
# on sequenced PTI table.
#
>>> ocean_buoys_seq_grpby1 = ocean_buoys_seq.groupby_time(timebucket_duration="MINUTES(2)",
... value_expression="buoyid", fill="NULLS")
>>> number_of_values_to_column = {2: "*"}
>>> ocean_buoys_seq_grpby1.bottom(number_of_values_to_column).sort(["TIMECODE_RANGE", "buoyid"])
TIMECODE_RANGE GROUP BY TIME(MINUTES(2)) buoyid bottom2TD_SEQNO bottom2salinity bottom2temperature
0 ('2014-01-06 08:00:00.000000+00:00', '2014-01-... 530161 0 26.0 55.0 10.0
1 ('2014-01-06 08:02:00.000000+00:00', '2014-01-... 530162 0 NaN NaN NaN
2 ('2014-01-06 08:04:00.000000+00:00', '2014-01-... 530163 0 NaN NaN NaN
3 ('2014-01-06 08:06:00.000000+00:00', '2014-01-... 530164 0 NaN NaN NaN
4 ('2014-01-06 08:08:00.000000+00:00', '2014-01-... 530165 0 17.0 55.0 99.0
5 ('2014-01-06 08:10:00.000000+00:00', '2014-01-... 530166 0 19.0 55.0 10.0
6 ('2014-01-06 09:00:00.000000+00:00', '2014-01-... 530191 1 11.0 55.0 70.0
7 ('2014-01-06 09:02:00.000000+00:00', '2014-01-... 530192 1 12.0 55.0 71.0
8 ('2014-01-06 10:00:00.000000+00:00', '2014-01-... 530221 44 4.0 55.0 43.0
9 ('2014-01-06 10:02:00.000000+00:00', '2014-01-... 530222 44 9.0 55.0 53.0
#
# Example 3: Executing bottom function on DataFrame created on NON-PTI table.
#
>>> ocean_buoys_nonpti_grpby1 = ocean_buoys_nonpti.groupby_time(timebucket_duration="MINUTES(2)",
... value_expression="buoyid",
... timecode_column="timecode", fill="NULLS")
>>> number_of_values_to_column = {2: "temperature"}
>>> ocean_buoys_nonpti_grpby1.bottom(number_of_values_to_column).sort(["TIMECODE_RANGE", "buoyid"])
TIMECODE_RANGE GROUP BY TIME(MINUTES(2)) buoyid bottom2temperature
0 ('2014-01-06 08:00:00.000000+00:00', '2014-01-... 11574961 0 10.0
1 ('2014-01-06 08:02:00.000000+00:00', '2014-01-... 11574962 0 NaN
2 ('2014-01-06 08:04:00.000000+00:00', '2014-01-... 11574963 0 NaN
3 ('2014-01-06 08:06:00.000000+00:00', '2014-01-... 11574964 0 NaN
4 ('2014-01-06 08:08:00.000000+00:00', '2014-01-... 11574965 0 99.0
5 ('2014-01-06 08:10:00.000000+00:00', '2014-01-... 11574966 0 100.0
6 ('2014-01-06 08:10:00.000000+00:00', '2014-01-... 11574966 0 10.0
7 ('2014-01-06 09:00:00.000000+00:00', '2014-01-... 11574991 1 70.0
8 ('2014-01-06 09:00:00.000000+00:00', '2014-01-... 11574991 1 77.0
9 ('2014-01-06 09:02:00.000000+00:00', '2014-01-... 11574992 1 71.0
### Examples for bottom with ties ###
#
# Example 4: Executing bottom with ties function on DataFrame created on non-sequenced PTI table.
#
>>> ocean_buoys_grpby2 = ocean_buoys.groupby_time(timebucket_duration="MINUTES(2)",
... value_expression="buoyid", fill="NULLS")
>>> number_of_values_to_column = {2: "temperature"}
>>> ocean_buoys_grpby2.bottom(number_of_values_to_column, with_ties=True).sort(["TIMECODE_RANGE", "buoyid"])
TIMECODE_RANGE GROUP BY TIME(MINUTES(2)) buoyid bottom_with_ties2temperature
0 ('2014-01-06 08:00:00.000000+00:00', '2014-01-... 530161 0 10.0
1 ('2014-01-06 08:02:00.000000+00:00', '2014-01-... 530162 0 NaN
2 ('2014-01-06 08:04:00.000000+00:00', '2014-01-... 530163 0 NaN
3 ('2014-01-06 08:06:00.000000+00:00', '2014-01-... 530164 0 NaN
4 ('2014-01-06 08:08:00.000000+00:00', '2014-01-... 530165 0 99.0
5 ('2014-01-06 08:10:00.000000+00:00', '2014-01-... 530166 0 100.0
6 ('2014-01-06 08:10:00.000000+00:00', '2014-01-... 530166 0 10.0
7 ('2014-01-06 09:00:00.000000+00:00', '2014-01-... 530191 1 77.0
8 ('2014-01-06 09:00:00.000000+00:00', '2014-01-... 530191 1 70.0
9 ('2014-01-06 09:02:00.000000+00:00', '2014-01-... 530192 1 71.0
#
# Example 5: Executing bottom with ties to select 2 values for temperature and 3 for rest of the columns in
# ocean_buoys DataFrame.
#
>>> ocean_buoys_grpby3 = ocean_buoys.groupby_time(timebucket_duration="MINUTES(2)", fill="NULLS")
>>> number_of_values_to_column = {2: "temperature", 3:"*"}
>>> ocean_buoys_grpby3.bottom(number_of_values_to_column, with_ties=True).sort(["TIMECODE_RANGE"])
TIMECODE_RANGE GROUP BY TIME(MINUTES(2)) bottom_with_ties3buoyid bottom_with_ties3salinity bottom_with_ties2temperature
0 ('2014-01-06 08:00:00.000000+00:00', '2014-01-... 530161 0.0 55.0 10.0
1 ('2014-01-06 08:02:00.000000+00:00', '2014-01-... 530162 NaN NaN NaN
2 ('2014-01-06 08:04:00.000000+00:00', '2014-01-... 530163 NaN NaN NaN
3 ('2014-01-06 08:06:00.000000+00:00', '2014-01-... 530164 NaN NaN NaN
4 ('2014-01-06 08:08:00.000000+00:00', '2014-01-... 530165 0.0 55.0 99.0
5 ('2014-01-06 08:10:00.000000+00:00', '2014-01-... 530166 0.0 55.0 10.0
6 ('2014-01-06 08:12:00.000000+00:00', '2014-01-... 530167 NaN NaN NaN
7 ('2014-01-06 08:14:00.000000+00:00', '2014-01-... 530168 NaN NaN NaN
8 ('2014-01-06 08:16:00.000000+00:00', '2014-01-... 530169 NaN NaN NaN
9 ('2014-01-06 08:18:00.000000+00:00', '2014-01-... 530170 NaN NaN NaN
>>>
#
# Example 6: Executing bottom with ties function on DataFrame created on NON-PTI table.
#
>>> ocean_buoys_nonpti_grpby2 = ocean_buoys_nonpti.groupby_time(timebucket_duration="MINUTES(2)",
... value_expression="buoyid",
... timecode_column="timecode", fill="NULLS")
>>> number_of_values_to_column = {2: "temperature"}
>>> ocean_buoys_nonpti_grpby2.bottom(number_of_values_to_column, with_ties=True).sort(["TIMECODE_RANGE", "buoyid"])
TIMECODE_RANGE GROUP BY TIME(MINUTES(2)) buoyid bottom_with_ties2temperature
0 ('2014-01-06 08:00:00.000000+00:00', '2014-01-... 11574961 0 10.0
1 ('2014-01-06 08:02:00.000000+00:00', '2014-01-... 11574962 0 NaN
2 ('2014-01-06 08:04:00.000000+00:00', '2014-01-... 11574963 0 NaN
3 ('2014-01-06 08:06:00.000000+00:00', '2014-01-... 11574964 0 NaN
4 ('2014-01-06 08:08:00.000000+00:00', '2014-01-... 11574965 0 99.0
5 ('2014-01-06 08:10:00.000000+00:00', '2014-01-... 11574966 0 10.0
6 ('2014-01-06 08:10:00.000000+00:00', '2014-01-... 11574966 0 100.0
7 ('2014-01-06 09:00:00.000000+00:00', '2014-01-... 11574991 1 77.0
8 ('2014-01-06 09:00:00.000000+00:00', '2014-01-... 11574991 1 70.0
9 ('2014-01-06 09:02:00.000000+00:00', '2014-01-... 11574992 1 71.0
>>>
>>>