groupby_time() | Teradata Python Package - groupby_time() - Teradata Package for Python

Teradata® Package for Python User Guide

Product
Teradata Package for Python
Release Number
17.00
Published
November 2021
Language
English (United States)
Last Update
2022-01-14
dita:mapPath
bol1585763678431.ditamap
dita:ditavalPath
ayr1485454803741.ditaval
dita:id
B700-4006
lifecycle
previous
Product Category
Teradata Vantage

The groupby_time() function resamples time series data to group the same by time on a datetime column of a teradataml DataFrame. It also allows grouping based on teradataml DataFrame columns.

Although the grouping is optimized for DataFrames created for PTI tables, it is also supported on non-PTI tables when the argument 'timecode_column' is specified.

  • This API is similar to resample().
  • You can still apply teradataml DataFrame methods (filters/sort/etc) on top of the result of this one.
  • Consecutive operations of grouping, i.e., groupby_time(), resample() and groupby() are not permitted. An exception will be raised. Following are some cases where exception will be raised as "Invalid operation applied, check documentation for correct usage."
    • df.groupby_time().groupby()
    • df.groupby_time().resample()
    • df.groupby_time().groupby_time()

Examples Prerequisite

  • Load the example datasets:
    >>> load_example_data("dataframe", ["ocean_buoys", "ocean_buoys_nonpti"])
  • Create required DataFrames:
    • DataFrame on non-sequenced PTI table
      >>> ocean_buoys = DataFrame("ocean_buoys")

      Check the DataFrame columns:

      >>>ocean_buoys.columns
      ['buoyid', 'TD_TIMECODE', 'temperature', 'salinity']

      Check the head of the DataFrame:

      >>> ocean_buoys.head()
                             TD_TIMECODE  temperature  salinity
      buoyid
      0       2014-01-06 08:10:00.000000        100.0        55
      0       2014-01-06 08:08:59.999999          NaN        55
      1       2014-01-06 09:01:25.122200         77.0        55
      1       2014-01-06 09:03:25.122200         79.0        55
      1       2014-01-06 09:01:25.122200         70.0        55
      1       2014-01-06 09:02:25.122200         71.0        55
      1       2014-01-06 09:03:25.122200         72.0        55
      0       2014-01-06 08:09:59.999999         99.0        55
      0       2014-01-06 08:00:00.000000         10.0        55
      0       2014-01-06 08:10:00.000000         10.0        55
    • DataFrame on non-PTI table
      >>> ocean_buoys_nonpti = DataFrame("ocean_buoys_nonpti")

      Check the DataFrame columns:

      >>> ocean_buoys_nonpti.columns
      ['buoyid', 'timecode', 'temperature', 'salinity']

      Check the head of the DataFrame:

      >>> ocean_buoys_nonpti.head()
                                  buoyid  temperature  salinity
      timecode
      2014-01-06 08:09:59.999999       0         99.0        55
      2014-01-06 08:10:00.000000       0         10.0        55
      2014-01-06 09:01:25.122200       1         70.0        55
      2014-01-06 09:01:25.122200       1         77.0        55
      2014-01-06 09:02:25.122200       1         71.0        55
      2014-01-06 09:03:25.122200       1         72.0        55
      2014-01-06 09:02:25.122200       1         78.0        55
      2014-01-06 08:10:00.000000       0        100.0        55
      2014-01-06 08:08:59.999999       0          NaN        55
      2014-01-06 08:00:00.000000       0         10.0        55

Example 1: Group by timebucket of two calendar years, on DataFrame created on non-sequenced PTI table

Use formal notation and 'buoyid' column on DataFrame created on non-sequenced PTI table.

Fill missing values with Nulls.

>>> ocean_buoys_grpby1 = ocean_buoys.groupby_time(timebucket_duration="CAL_YEARS(2)", value_expression="buoyid", fill="NULLS")
>>> number_of_values_to_column = {2: "temperature"}
>>> ocean_buoys_grpby1.bottom(number_of_values_to_column).sort(["TIMECODE_RANGE", "buoyid"])
                                      TIMECODE_RANGE  GROUP BY TIME(CAL_YEARS(2))  buoyid  bottom2temperature
0  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       0                  10
1  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       0                  10
2  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       1                  71
3  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       1                  70
4  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       2                  80
5  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       2                  81
6  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2      44                  43
7  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2      44                  43

Example 2: Group by timebucket of two minutes, on DataFrame created on non-PTI table

Use shorthand notation to specify timebucket, on DataFrame created on non-PTI table.

Fill missing values with Nulls.

'timecode_column' must be specified for non-PTI table.
>>> ocean_buoys_nonpti_grpby2 = ocean_buoys_nonpti.groupby_time(timebucket_duration="2m", value_expression="buoyid", timecode_column="timecode", fill="NULLS")
>>> number_of_values_to_column = {2: "temperature"}
>>> ocean_buoys_nonpti_grpby2.bottom(number_of_values_to_column, with_ties=True).sort(["TIMECODE_RANGE", "buoyid"])
                                      TIMECODE_RANGE  GROUP BY TIME(MINUTES(2))  buoyid  bottom_with_ties2temperature
0  ('2014-01-06 08:00:00.000000+00:00', '2014-01-...                   11574961       0                          10.0
1  ('2014-01-06 08:02:00.000000+00:00', '2014-01-...                   11574962       0                           NaN
2  ('2014-01-06 08:04:00.000000+00:00', '2014-01-...                   11574963       0                           NaN
3  ('2014-01-06 08:06:00.000000+00:00', '2014-01-...                   11574964       0                           NaN
4  ('2014-01-06 08:08:00.000000+00:00', '2014-01-...                   11574965       0                          99.0
5  ('2014-01-06 08:10:00.000000+00:00', '2014-01-...                   11574966       0                         100.0
6  ('2014-01-06 08:10:00.000000+00:00', '2014-01-...                   11574966       0                          10.0
7  ('2014-01-06 09:00:00.000000+00:00', '2014-01-...                   11574991       1                          70.0
8  ('2014-01-06 09:00:00.000000+00:00', '2014-01-...                   11574991       1                          77.0
9  ('2014-01-06 09:02:00.000000+00:00', '2014-01-...                   11574992       1                          71.0

Example 3: Group by timebucket of two minutes, on DataFrame created on non-PTI table

Use shorthand notation to specify timebucket, on DataFrame created on non-PTI table.

Fill missing values with previous values.

'timecode_column' must be specified for non-PTI table.
>>> ocean_buoys_nonpti_grpby2 = ocean_buoys_nonpti.groupby_time(timebucket_duration="2mins", value_expression="buoyid", timecode_column="timecode", fill="prev")
>>> number_of_values_to_column = {2: "temperature"}
>>> ocean_buoys_nonpti_grpby2.bottom(number_of_values_to_column, with_ties=True).sort(["TIMECODE_RANGE","buoyid"])
                                      TIMECODE_RANGE  GROUP BY TIME(MINUTES(2))  buoyid  bottom_with_ties2temperature
0  ('2014-01-06 08:00:00.000000+00:00', '2014-01-...                   11574961       0                            10
1  ('2014-01-06 08:02:00.000000+00:00', '2014-01-...                   11574962       0                            10
2  ('2014-01-06 08:04:00.000000+00:00', '2014-01-...                   11574963       0                            10
3  ('2014-01-06 08:06:00.000000+00:00', '2014-01-...                   11574964       0                            10
4  ('2014-01-06 08:08:00.000000+00:00', '2014-01-...                   11574965       0                            99
5  ('2014-01-06 08:10:00.000000+00:00', '2014-01-...                   11574966       0                            10
6  ('2014-01-06 08:10:00.000000+00:00', '2014-01-...                   11574966       0                           100
7  ('2014-01-06 09:00:00.000000+00:00', '2014-01-...                   11574991       1                            77
8  ('2014-01-06 09:00:00.000000+00:00', '2014-01-...                   11574991       1                            70
9  ('2014-01-06 09:02:00.000000+00:00', '2014-01-...                   11574992       1                            71

Example 4: Group by timebucket of two minutes, on DataFrame created on non-PTI table

Use shorthand notation to specify timebucket, on DataFrame created on non-PTI table.

Fill missing values with numeric constant 12345.

'timecode_column' must be specified for non-PTI table.
>>> ocean_buoys_nonpti_grpby2 = ocean_buoys_nonpti.groupby_time(timebucket_duration="2minute", value_expression="buoyid", timecode_column="timecode", fill=12345)
>>> number_of_values_to_column = {2: "temperature"}
>>> ocean_buoys_nonpti_grpby2.bottom(number_of_values_to_column, with_ties=True).sort(["TIMECODE_RANGE", "buoyid"])
 
                                      TIMECODE_RANGE  GROUP BY TIME(MINUTES(2))  buoyid  bottom_with_ties2temperature
0  ('2014-01-06 08:00:00.000000+00:00', '2014-01-...                   11574961       0                            10
1  ('2014-01-06 08:02:00.000000+00:00', '2014-01-...                   11574962       0                         12345
2  ('2014-01-06 08:04:00.000000+00:00', '2014-01-...                   11574963       0                         12345
3  ('2014-01-06 08:06:00.000000+00:00', '2014-01-...                   11574964       0                         12345
4  ('2014-01-06 08:08:00.000000+00:00', '2014-01-...                   11574965       0                            99
5  ('2014-01-06 08:10:00.000000+00:00', '2014-01-...                   11574966       0                            10
6  ('2014-01-06 08:10:00.000000+00:00', '2014-01-...                   11574966       0                           100
7  ('2014-01-06 09:00:00.000000+00:00', '2014-01-...                   11574991       1                            77
8  ('2014-01-06 09:00:00.000000+00:00', '2014-01-...                   11574991       1                            70
9  ('2014-01-06 09:02:00.000000+00:00', '2014-01-...                   11574992       1                            71