The first() function returns the oldest value, determined by the timecode, for each group. In the event of a tie, such as simultaneous timecode values for a particular group, all tied results are returned. If a sequence number is present with the data, it can break a tie, assuming it is unique across identical timecode values. It is a single-threaded function.
- This function is valid only on columns of numeric types.
- Nulls are not included in the result computation.
Examples Prerequisite
See Example Setup to set up the environment for the following examples.
Example: Run first() on DataFrame created on non-sequenced PTI table
>>> ocean_buoys_grpby1 = ocean_buoys.groupby_time(timebucket_duration="2cd", value_expression="buoyid", fill="NULLS")
>>> ocean_buoys_grpby1.first().sort(["TIMECODE_RANGE", "buoyid"]) /mnt/c/Users/pp186043/GitHub_Repos/pyTeradata/teradataml/common/utils.py:398: VantageRuntimeWarning: [Teradata][teradataml](TDML_2086) Following warning raised from Vantage with warning code: 4001 [Teradata Database] [Warning 4001] Time Series Auxiliary Cache Warning: Multiple results found for one or more Time Series aggregate functions in this query, but only one result was returned. To get all results, resubmit this query with these aggregates isolated. VantageRuntimeWarning) TIMECODE_RANGE GROUP BY TIME(CAL_DAYS(2)) buoyid first_salinity first_temperature 0 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 369 0 55 10 1 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 369 1 55 70 2 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 369 2 55 80 3 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 369 44 55 43
"[Teradata Database] [Warning 4001] Time Series Auxiliary Cache Warning: Multiple results found for one or more Time Series aggregate functions in this query, but only one result was returned. To get all results, resubmit this query with these aggregates isolated."
This warning recommends to run first() independently on each column to get all the results.
Example: Run first() on one single column
When running the first() function on one single column, you can pass column name as input. In this example, run first() on 'temperature' column.
>>> ocean_buoys_grpby1.first('temperature') TIMECODE_RANGE GROUP BY TIME(CAL_DAYS(2)) buoyid first_temperature 0 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 369 2 80 1 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 369 0 10 2 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 369 44 43 3 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 369 1 77 4 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 369 1 70
Example: Run first() on DataFrame created on sequenced PTI table
In this example, the table has few columns incompatible for first() operation 'dates' and 'TD_TIMECODE'. When running this command, incompatible columns are ignored.
>>> ocean_buoys_seq_grpby1 = ocean_buoys_seq.groupby_time(timebucket_duration="MINUTES(2)", value_expression="buoyid", fill="NULLS")
>>> ocean_buoys_seq_grpby1.first().sort(["TIMECODE_RANGE", "buoyid"]) TIMECODE_RANGE GROUP BY TIME(MINUTES(2)) buoyid first_TD_SEQNO first_salinity first_temperature 0 ('2014-01-06 08:00:00.000000+00:00', '2014-01-... 530161 0 26.0 55.0 10.0 1 ('2014-01-06 08:02:00.000000+00:00', '2014-01-... 530162 0 NaN NaN NaN 2 ('2014-01-06 08:04:00.000000+00:00', '2014-01-... 530163 0 NaN NaN NaN 3 ('2014-01-06 08:06:00.000000+00:00', '2014-01-... 530164 0 NaN NaN NaN 4 ('2014-01-06 08:08:00.000000+00:00', '2014-01-... 530165 0 18.0 55.0 99.0 5 ('2014-01-06 08:10:00.000000+00:00', '2014-01-... 530166 0 19.0 55.0 10.0 6 ('2014-01-06 09:00:00.000000+00:00', '2014-01-... 530191 1 11.0 55.0 70.0 7 ('2014-01-06 09:02:00.000000+00:00', '2014-01-... 530192 1 12.0 55.0 71.0 8 ('2014-01-06 10:00:00.000000+00:00', '2014-01-... 530221 44 4.0 55.0 43.0 9 ('2014-01-06 10:02:00.000000+00:00', '2014-01-... 530222 44 9.0 55.0 53.0
Example: Run first() on DataFrame created on non-PTI table
>>> ocean_buoys_nonpti_grpby1 = ocean_buoys_nonpti.groupby_time(timebucket_duration="2cd", value_expression="buoyid", timecode_column="timecode", fill="NULLS")
>>> ocean_buoys_nonpti_grpby1.first().sort(["TIMECODE_RANGE", "buoyid"]) /mnt/c/Users/pp186043/GitHub_Repos/pyTeradata/teradataml/common/utils.py:398: VantageRuntimeWarning: [Teradata][teradataml](TDML_2086) Following warning raised from Vantage with warning code: 4001 [Teradata Database] [Warning 4001] Time Series Auxiliary Cache Warning: Multiple results found for one or more Time Series aggregate functions in this query, but only one result was returned. To get all results, resubmit this query with these aggregates isolated. VantageRuntimeWarning) TIMECODE_RANGE GROUP BY TIME(CAL_DAYS(2)) buoyid first_salinity first_temperature 0 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 8039 0 55 10 1 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 8039 1 55 70 2 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 8039 2 55 80 3 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 8039 44 55 43
Example: Run first() on selected columns
Run the first() function on selected columns 'temperature' and 'salinity'.
>>> ocean_buoys_seq_grpby1.first(["temperature", "salinity"]).sort(["TIMECODE_RANGE", "buoyid"]) TIMECODE_RANGE GROUP BY TIME(CAL_DAYS(2)) buoyid first_temperature first_salinity 0 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 369 0 10 55 1 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 369 1 70 55 2 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 369 2 80 55 3 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 369 22 23 25 4 ('2014-01-06 00:00:00.000000-00:00', '2014-01-... 369 44 43 55