The groupby_time() function resamples time series data to group the same by time on a datetime column of a teradataml DataFrame. It also allows grouping based on teradataml DataFrame columns.
Although the grouping is optimized for DataFrames created for PTI tables, it is also supported on non-PTI tables when the argument 'timecode_column' is specified.
- This API is similar to resample().
- You can still apply teradataml DataFrame methods (filters/sort/etc) on top of the result of this one.
- Consecutive operations of grouping, i.e., groupby_time(), resample() and groupby() are not permitted. An exception will be raised. Following are some cases where exception will be raised as "Invalid operation applied, check documentation for correct usage."
- df.groupby_time().groupby()
- df.groupby_time().resample()
- df.groupby_time().groupby_time()
Examples Prerequisite
- Load the example datasets:
>>> load_example_data("dataframe", ["ocean_buoys", "ocean_buoys_nonpti"])
- Create required DataFrames:
- DataFrame on non-sequenced PTI table
>>> ocean_buoys = DataFrame("ocean_buoys")
Check the DataFrame columns:
>>>ocean_buoys.columns ['buoyid', 'TD_TIMECODE', 'temperature', 'salinity']
Check the head of the DataFrame:
>>> ocean_buoys.head() TD_TIMECODE temperature salinity buoyid 0 2014-01-06 08:10:00.000000 100.0 55 0 2014-01-06 08:08:59.999999 NaN 55 1 2014-01-06 09:01:25.122200 77.0 55 1 2014-01-06 09:03:25.122200 79.0 55 1 2014-01-06 09:01:25.122200 70.0 55 1 2014-01-06 09:02:25.122200 71.0 55 1 2014-01-06 09:03:25.122200 72.0 55 0 2014-01-06 08:09:59.999999 99.0 55 0 2014-01-06 08:00:00.000000 10.0 55 0 2014-01-06 08:10:00.000000 10.0 55
- DataFrame on non-PTI table
>>> ocean_buoys_nonpti = DataFrame("ocean_buoys_nonpti")
Check the DataFrame columns:
>>> ocean_buoys_nonpti.columns ['buoyid', 'timecode', 'temperature', 'salinity']
Check the head of the DataFrame:
>>> ocean_buoys_nonpti.head() buoyid temperature salinity timecode 2014-01-06 08:09:59.999999 0 99.0 55 2014-01-06 08:10:00.000000 0 10.0 55 2014-01-06 09:01:25.122200 1 70.0 55 2014-01-06 09:01:25.122200 1 77.0 55 2014-01-06 09:02:25.122200 1 71.0 55 2014-01-06 09:03:25.122200 1 72.0 55 2014-01-06 09:02:25.122200 1 78.0 55 2014-01-06 08:10:00.000000 0 100.0 55 2014-01-06 08:08:59.999999 0 NaN 55 2014-01-06 08:00:00.000000 0 10.0 55
- DataFrame on non-sequenced PTI table
Example 1: Group by timebucket of two calendar years, on DataFrame created on non-sequenced PTI table
Use formal notation and 'buoyid' column on DataFrame created on non-sequenced PTI table.
Fill missing values with Nulls.
>>> ocean_buoys_grpby1 = ocean_buoys.groupby_time(timebucket_duration="CAL_YEARS(2)", value_expression="buoyid", fill="NULLS")
>>> number_of_values_to_column = {2: "temperature"}
>>> ocean_buoys_grpby1.bottom(number_of_values_to_column).sort(["TIMECODE_RANGE", "buoyid"]) TIMECODE_RANGE GROUP BY TIME(CAL_YEARS(2)) buoyid bottom2temperature 0 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 2 0 10 1 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 2 0 10 2 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 2 1 71 3 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 2 1 70 4 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 2 2 80 5 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 2 2 81 6 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 2 44 43 7 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 2 44 43
Example 2: Group by timebucket of two minutes, on DataFrame created on non-PTI table
Use shorthand notation to specify timebucket, on DataFrame created on non-PTI table.
Fill missing values with Nulls.
>>> ocean_buoys_nonpti_grpby2 = ocean_buoys_nonpti.groupby_time(timebucket_duration="2m", value_expression="buoyid", timecode_column="timecode", fill="NULLS")
>>> number_of_values_to_column = {2: "temperature"}
>>> ocean_buoys_nonpti_grpby2.bottom(number_of_values_to_column, with_ties=True).sort(["TIMECODE_RANGE", "buoyid"]) TIMECODE_RANGE GROUP BY TIME(MINUTES(2)) buoyid bottom_with_ties2temperature 0 ('2014-01-06 08:00:00.000000+00:00', '2014-01-... 11574961 0 10.0 1 ('2014-01-06 08:02:00.000000+00:00', '2014-01-... 11574962 0 NaN 2 ('2014-01-06 08:04:00.000000+00:00', '2014-01-... 11574963 0 NaN 3 ('2014-01-06 08:06:00.000000+00:00', '2014-01-... 11574964 0 NaN 4 ('2014-01-06 08:08:00.000000+00:00', '2014-01-... 11574965 0 99.0 5 ('2014-01-06 08:10:00.000000+00:00', '2014-01-... 11574966 0 100.0 6 ('2014-01-06 08:10:00.000000+00:00', '2014-01-... 11574966 0 10.0 7 ('2014-01-06 09:00:00.000000+00:00', '2014-01-... 11574991 1 70.0 8 ('2014-01-06 09:00:00.000000+00:00', '2014-01-... 11574991 1 77.0 9 ('2014-01-06 09:02:00.000000+00:00', '2014-01-... 11574992 1 71.0
Example 3: Group by timebucket of two minutes, on DataFrame created on non-PTI table
Use shorthand notation to specify timebucket, on DataFrame created on non-PTI table.
Fill missing values with previous values.
>>> ocean_buoys_nonpti_grpby2 = ocean_buoys_nonpti.groupby_time(timebucket_duration="2mins", value_expression="buoyid", timecode_column="timecode", fill="prev")
>>> number_of_values_to_column = {2: "temperature"}
>>> ocean_buoys_nonpti_grpby2.bottom(number_of_values_to_column, with_ties=True).sort(["TIMECODE_RANGE","buoyid"]) TIMECODE_RANGE GROUP BY TIME(MINUTES(2)) buoyid bottom_with_ties2temperature 0 ('2014-01-06 08:00:00.000000+00:00', '2014-01-... 11574961 0 10 1 ('2014-01-06 08:02:00.000000+00:00', '2014-01-... 11574962 0 10 2 ('2014-01-06 08:04:00.000000+00:00', '2014-01-... 11574963 0 10 3 ('2014-01-06 08:06:00.000000+00:00', '2014-01-... 11574964 0 10 4 ('2014-01-06 08:08:00.000000+00:00', '2014-01-... 11574965 0 99 5 ('2014-01-06 08:10:00.000000+00:00', '2014-01-... 11574966 0 10 6 ('2014-01-06 08:10:00.000000+00:00', '2014-01-... 11574966 0 100 7 ('2014-01-06 09:00:00.000000+00:00', '2014-01-... 11574991 1 77 8 ('2014-01-06 09:00:00.000000+00:00', '2014-01-... 11574991 1 70 9 ('2014-01-06 09:02:00.000000+00:00', '2014-01-... 11574992 1 71
Example 4: Group by timebucket of two minutes, on DataFrame created on non-PTI table
Use shorthand notation to specify timebucket, on DataFrame created on non-PTI table.
Fill missing values with numeric constant 12345.
>>> ocean_buoys_nonpti_grpby2 = ocean_buoys_nonpti.groupby_time(timebucket_duration="2minute", value_expression="buoyid", timecode_column="timecode", fill=12345)
>>> number_of_values_to_column = {2: "temperature"}
>>> ocean_buoys_nonpti_grpby2.bottom(number_of_values_to_column, with_ties=True).sort(["TIMECODE_RANGE", "buoyid"]) TIMECODE_RANGE GROUP BY TIME(MINUTES(2)) buoyid bottom_with_ties2temperature 0 ('2014-01-06 08:00:00.000000+00:00', '2014-01-... 11574961 0 10 1 ('2014-01-06 08:02:00.000000+00:00', '2014-01-... 11574962 0 12345 2 ('2014-01-06 08:04:00.000000+00:00', '2014-01-... 11574963 0 12345 3 ('2014-01-06 08:06:00.000000+00:00', '2014-01-... 11574964 0 12345 4 ('2014-01-06 08:08:00.000000+00:00', '2014-01-... 11574965 0 99 5 ('2014-01-06 08:10:00.000000+00:00', '2014-01-... 11574966 0 10 6 ('2014-01-06 08:10:00.000000+00:00', '2014-01-... 11574966 0 100 7 ('2014-01-06 09:00:00.000000+00:00', '2014-01-... 11574991 1 77 8 ('2014-01-06 09:00:00.000000+00:00', '2014-01-... 11574991 1 70 9 ('2014-01-06 09:02:00.000000+00:00', '2014-01-... 11574992 1 71