Teradata Package for Python Function Reference | 20.00 - first - Teradata Package for Python - Look here for syntax, methods and examples for the functions included in the Teradata Package for Python.
Teradata® Package for Python Function Reference - 20.00
- Deployment
- VantageCloud
- VantageCore
- Edition
- Enterprise
- IntelliFlex
- VMware
- Product
- Teradata Package for Python
- Release Number
- 20.00.00.03
- Published
- December 2024
- ft:locale
- en-US
- ft:lastEdition
- 2024-12-19
- dita:id
- TeradataPython_FxRef_Enterprise_2000
- Product Category
- Teradata Vantage
- teradataml.dataframe.dataframe.DataFrameGroupByTime.first = first(self, columns=None)
- DESCRIPTION:
Returns the oldest value, determined by the timecode, for each group. FIRST is a single-threaded function.
In the event of a tie, such as simultaneous timecode values for a particular group, all tied results
are returned. If a sequence number is present with the data, it can break a tie, assuming it is unique
across identical timecode values.
Note:
1. This function is valid only on columns with numeric types.
2. Null values are not included in the result computation.
PARAMETERS:
columns:
Optional Argument.
Specifies a column name or list of column names on which first() operation
must be run. By default oldest value is returned for all the compatible columns
in a teradataml DataFrame.
Types: str OR list of Strings (str)
RETURNS:
teradataml DataFrame object with first() operation performed.
RAISES:
1. TDMLDF_AGGREGATE_FAILED - If first() operation fails to
return oldest value of columns in the teradataml DataFrame.
Possible error message:
Unable to perform 'first()' on the teradataml DataFrame.
2. TDMLDF_AGGREGATE_COMBINED_ERR - If the first() operation
doesn't support all the columns in the teradataml DataFrame.
Possible error message:
No results. Below is/are the error message(s):
All selected columns [(col2 - PERIOD_TIME), (col3 -
BLOB)] is/are unsupported for 'first' operation.
EXAMPLES :
>>> # Load the example datasets.
... load_example_data("dataframe", ["ocean_buoys", "ocean_buoys_seq", "ocean_buoys_nonpti"])
>>>
#
# Example 1: Executing first function on DataFrame created on non-sequenced PTI table.
#
>>> # Create the required DataFrames.
... # DataFrame on non-sequenced PTI table
... ocean_buoys = DataFrame("ocean_buoys")
>>> # Check DataFrame columns and let's peek at the data
... ocean_buoys.columns
['buoyid', 'TD_TIMECODE', 'temperature', 'salinity']
>>> ocean_buoys.head()
TD_TIMECODE temperature salinity
buoyid
0 2014-01-06 08:10:00.000000 100.0 55
0 2014-01-06 08:08:59.999999 NaN 55
1 2014-01-06 09:01:25.122200 77.0 55
1 2014-01-06 09:03:25.122200 79.0 55
1 2014-01-06 09:01:25.122200 70.0 55
1 2014-01-06 09:02:25.122200 71.0 55
1 2014-01-06 09:03:25.122200 72.0 55
0 2014-01-06 08:09:59.999999 99.0 55
0 2014-01-06 08:00:00.000000 10.0 55
0 2014-01-06 08:10:00.000000 10.0 55
>>> ocean_buoys_grpby1 = ocean_buoys.groupby_time(timebucket_duration="2cd",
... value_expression="buoyid", fill="NULLS")
>>> ocean_buoys_grpby1.first().sort(["TIMECODE_RANGE", "buoyid"])
/mnt/c/Users/pp186043/GitHub_Repos/pyTeradata/teradataml/common/utils.py:398: VantageRuntimeWarning: [Teradata][teradataml](TDML_2086) Following warning raised from Vantage with warning code: 4001
[Teradata Database] [Warning 4001] Time Series Auxiliary Cache Warning: Multiple results found for one or more Time Series aggregate functions in this query, but only one result was returned. To get all results, resubmit this query with these aggregates isolated.
VantageRuntimeWarning)
TIMECODE_RANGE GROUP BY TIME(CAL_DAYS(2)) buoyid first_salinity first_temperature
0 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 369 0 55 10
1 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 369 1 55 70
2 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 369 2 55 80
3 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 369 44 55 43
>>>
#
# Example 2: In Example 1, a VantageRuntimeWarning is raised as:
# "[Teradata Database] [Warning 4001] Time Series Auxiliary Cache Warning: Multiple results
# found for one or more Time Series aggregate functions in this query, but only one result
# was returned. To get all results, resubmit this query with these aggregates isolated."
#
# This warning recommends to execute first() independently on each column, so that we will get
# all the results.
# To run first() on one single column we can pass column name as input. Let's run first()
# on 'temperature' column.
#
>>> ocean_buoys_grpby1.first('temperature')
TIMECODE_RANGE GROUP BY TIME(CAL_DAYS(2)) buoyid first_temperature
0 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 369 2 80
1 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 369 0 10
2 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 369 44 43
3 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 369 1 77
4 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 369 1 70
>>>
#
# Example 3: Executing first function on ocean_buoys_seq DataFrame created on sequenced PTI table.
# Table has few columns incompatible for first() operation 'dates' and 'TD_TIMECODE',
# while executing this first() incompatible columns are ignored.
#
>>> # DataFrame on sequenced PTI table
... ocean_buoys_seq = DataFrame("ocean_buoys_seq")
>>> # Check DataFrame columns and let's peek at the data
... ocean_buoys_seq.columns
['TD_TIMECODE', 'TD_SEQNO', 'buoyid', 'salinity', 'temperature', 'dates']
>>> ocean_buoys_seq.head()
TD_TIMECODE TD_SEQNO salinity temperature dates
buoyid
0 2014-01-06 08:00:00.000000 26 55 10.0 2016-02-26
0 2014-01-06 08:08:59.999999 18 55 NaN 2015-06-18
1 2014-01-06 09:02:25.122200 24 55 78.0 2015-12-24
1 2014-01-06 09:01:25.122200 23 55 77.0 2015-11-23
1 2014-01-06 09:02:25.122200 12 55 71.0 2014-12-12
1 2014-01-06 09:03:25.122200 13 55 72.0 2015-01-13
1 2014-01-06 09:01:25.122200 11 55 70.0 2014-11-11
0 2014-01-06 08:10:00.000000 19 55 10.0 2015-07-19
0 2014-01-06 08:09:59.999999 17 55 99.0 2015-05-17
0 2014-01-06 08:10:00.000000 27 55 100.0 2016-03-27
>>> ocean_buoys_seq_grpby1 = ocean_buoys_seq.groupby_time(timebucket_duration="2cd",
... value_expression="buoyid", fill="NULLS")
>>> ocean_buoys_seq_grpby1.first().sort(["TIMECODE_RANGE", "buoyid"])
TIMECODE_RANGE GROUP BY TIME(CAL_DAYS(2)) buoyid first_TD_SEQNO first_salinity first_temperature
0 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 369 0 26 55 10
1 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 369 1 11 55 70
2 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 369 2 14 55 80
3 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 369 22 1 25 23
4 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 369 44 4 55 43
>>>
#
# Example 4: Executing first function on DataFrame created on NON-PTI table.
#
>>> # DataFrame on NON-PTI table
... ocean_buoys_nonpti = DataFrame("ocean_buoys_nonpti")
>>> # Check DataFrame columns and let's peek at the data
... ocean_buoys_nonpti.columns
['buoyid', 'timecode', 'temperature', 'salinity']
>>> ocean_buoys_nonpti.head()
buoyid temperature salinity
timecode
2014-01-06 08:09:59.999999 0 99.0 55
2014-01-06 08:10:00.000000 0 10.0 55
2014-01-06 09:01:25.122200 1 70.0 55
2014-01-06 09:01:25.122200 1 77.0 55
2014-01-06 09:02:25.122200 1 71.0 55
2014-01-06 09:03:25.122200 1 72.0 55
2014-01-06 09:02:25.122200 1 78.0 55
2014-01-06 08:10:00.000000 0 100.0 55
2014-01-06 08:08:59.999999 0 NaN 55
2014-01-06 08:00:00.000000 0 10.0 55
>>> ocean_buoys_nonpti_grpby1 = ocean_buoys_nonpti.groupby_time(timebucket_duration="2cd",
... value_expression="buoyid",
... timecode_column="timecode", fill="NULLS")
>>> ocean_buoys_nonpti_grpby1.first().sort(["TIMECODE_RANGE", "buoyid"])
/mnt/c/Users/pp186043/GitHub_Repos/pyTeradata/teradataml/common/utils.py:398: VantageRuntimeWarning: [Teradata][teradataml](TDML_2086) Following warning raised from Vantage with warning code: 4001
[Teradata Database] [Warning 4001] Time Series Auxiliary Cache Warning: Multiple results found for one or more Time Series aggregate functions in this query, but only one result was returned. To get all results, resubmit this query with these aggregates isolated.
VantageRuntimeWarning)
TIMECODE_RANGE GROUP BY TIME(CAL_DAYS(2)) buoyid first_salinity first_temperature
0 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 8039 0 55 10
1 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 8039 1 55 70
2 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 8039 2 55 80
3 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 8039 44 55 43
>>>
#
# Example 5: Execute first() on a few select columns 'temperature' and 'salinity'.
#
>>> ocean_buoys_seq_grpby1.first(["temperature", "salinity"]).sort(["TIMECODE_RANGE", "buoyid"])
TIMECODE_RANGE GROUP BY TIME(CAL_DAYS(2)) buoyid first_temperature first_salinity
0 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 369 0 10 55
1 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 369 1 70 55
2 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 369 2 80 55
3 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 369 22 23 25
4 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 369 44 43 55
>>>