first() - Teradata Python Package

The first() function returns the oldest value, determined by the timecode, for each group. In the event of a tie, such as simultaneous timecode values for a particular group, all tied results are returned. If a sequence number is present with the data, it can break a tie, assuming it is unique across identical timecode values. It is a single-threaded function.

This function is valid only on columns of numeric types.
Nulls are not included in the result computation.

Examples Prerequisite

See Example Setup to set up the environment for the following examples.

Example: Run first() on DataFrame created on non-sequenced PTI table

>>> ocean_buoys_grpby1 = ocean_buoys.groupby_time(timebucket_duration="2cd", value_expression="buoyid", fill="NULLS")

>>> ocean_buoys_grpby1.first().sort(["TIMECODE_RANGE", "buoyid"])
/mnt/c/Users/pp186043/GitHub_Repos/pyTeradata/teradataml/common/utils.py:398: VantageRuntimeWarning: [Teradata][teradataml](TDML_2086) Following warning raised from Vantage with warning code: 4001
[Teradata Database] [Warning 4001] Time Series Auxiliary Cache Warning: Multiple results found for one or more Time Series aggregate functions in this query, but only one result was returned. To get all results, resubmit this query with these aggregates isolated.
  VantageRuntimeWarning)
                                      TIMECODE_RANGE  GROUP BY TIME(CAL_DAYS(2))  buoyid  first_salinity  first_temperature
0  ('2014-01-06 00:00:00.000000-00:00', '2014-01-...                         369       0              55                 10
1  ('2014-01-06 00:00:00.000000-00:00', '2014-01-...                         369       1              55                 70
2  ('2014-01-06 00:00:00.000000-00:00', '2014-01-...                         369       2              55                 80
3  ('2014-01-06 00:00:00.000000-00:00', '2014-01-...                         369      44              55                 43

In this example, a VantageRuntimeWarning is raised as:

"[Teradata Database] [Warning 4001] Time Series Auxiliary Cache Warning: Multiple results found for one or more Time Series aggregate functions in this query, but only one result was returned. To get all results, resubmit this query with these aggregates isolated."

This warning recommends to run first() independently on each column to get all the results.

Example: Run first() on one single column

When running the first() function on one single column, you can pass column name as input. In this example, run first() on 'temperature' column.

>>> ocean_buoys_grpby1.first('temperature')
                                      TIMECODE_RANGE  GROUP BY TIME(CAL_DAYS(2))  buoyid  first_temperature
0  ('2014-01-06 00:00:00.000000-00:00', '2014-01-...                         369       2                 80
1  ('2014-01-06 00:00:00.000000-00:00', '2014-01-...                         369       0                 10
2  ('2014-01-06 00:00:00.000000-00:00', '2014-01-...                         369      44                 43
3  ('2014-01-06 00:00:00.000000-00:00', '2014-01-...                         369       1                 77
4  ('2014-01-06 00:00:00.000000-00:00', '2014-01-...                         369       1                 70

Example: Run first() on DataFrame created on sequenced PTI table

In this example, the table has few columns incompatible for first() operation 'dates' and 'TD_TIMECODE'. When running this command, incompatible columns are ignored.

>>> ocean_buoys_seq_grpby1 = ocean_buoys_seq.groupby_time(timebucket_duration="MINUTES(2)", value_expression="buoyid", fill="NULLS")

>>> ocean_buoys_seq_grpby1.first().sort(["TIMECODE_RANGE", "buoyid"])
                                      TIMECODE_RANGE  GROUP BY TIME(MINUTES(2))  buoyid  first_TD_SEQNO  first_salinity  first_temperature
0  ('2014-01-06 08:00:00.000000+00:00', '2014-01-...                     530161       0            26.0            55.0               10.0
1  ('2014-01-06 08:02:00.000000+00:00', '2014-01-...                     530162       0             NaN             NaN                NaN
2  ('2014-01-06 08:04:00.000000+00:00', '2014-01-...                     530163       0             NaN             NaN                NaN
3  ('2014-01-06 08:06:00.000000+00:00', '2014-01-...                     530164       0             NaN             NaN                NaN
4  ('2014-01-06 08:08:00.000000+00:00', '2014-01-...                     530165       0            18.0            55.0               99.0
5  ('2014-01-06 08:10:00.000000+00:00', '2014-01-...                     530166       0            19.0            55.0               10.0
6  ('2014-01-06 09:00:00.000000+00:00', '2014-01-...                     530191       1            11.0            55.0               70.0
7  ('2014-01-06 09:02:00.000000+00:00', '2014-01-...                     530192       1            12.0            55.0               71.0
8  ('2014-01-06 10:00:00.000000+00:00', '2014-01-...                     530221      44             4.0            55.0               43.0
9  ('2014-01-06 10:02:00.000000+00:00', '2014-01-...                     530222      44             9.0            55.0               53.0

Example: Run first() on DataFrame created on non-PTI table

>>> ocean_buoys_nonpti_grpby1 = ocean_buoys_nonpti.groupby_time(timebucket_duration="2cd", value_expression="buoyid", timecode_column="timecode", fill="NULLS")

>>> ocean_buoys_nonpti_grpby1.first().sort(["TIMECODE_RANGE", "buoyid"])
/mnt/c/Users/pp186043/GitHub_Repos/pyTeradata/teradataml/common/utils.py:398: VantageRuntimeWarning: [Teradata][teradataml](TDML_2086) Following warning raised from Vantage with warning code: 4001
[Teradata Database] [Warning 4001] Time Series Auxiliary Cache Warning: Multiple results found for one or more Time Series aggregate functions in this query, but only one result was returned. To get all results, resubmit this query with these aggregates isolated.
  VantageRuntimeWarning)
                                      TIMECODE_RANGE  GROUP BY TIME(CAL_DAYS(2))  buoyid  first_salinity  first_temperature
0  ('2014-01-06 00:00:00.000000-00:00', '2014-01-...                        8039       0              55                 10
1  ('2014-01-06 00:00:00.000000-00:00', '2014-01-...                        8039       1              55                 70
2  ('2014-01-06 00:00:00.000000-00:00', '2014-01-...                        8039       2              55                 80
3  ('2014-01-06 00:00:00.000000-00:00', '2014-01-...                        8039      44              55                 43

Example: Run first() on selected columns

Run the first() function on selected columns 'temperature' and 'salinity'.

>>> ocean_buoys_seq_grpby1.first(["temperature", "salinity"]).sort(["TIMECODE_RANGE", "buoyid"])
                                      TIMECODE_RANGE  GROUP BY TIME(CAL_DAYS(2))  buoyid  first_temperature  first_salinity
0  ('2014-01-06 00:00:00.000000-00:00', '2014-01-...                         369       0                 10              55
1  ('2014-01-06 00:00:00.000000-00:00', '2014-01-...                         369       1                 70              55
2  ('2014-01-06 00:00:00.000000-00:00', '2014-01-...                         369       2                 80              55
3  ('2014-01-06 00:00:00.000000-00:00', '2014-01-...                         369      22                 23              25
4  ('2014-01-06 00:00:00.000000-00:00', '2014-01-...                         369      44                 43              55