resample() | Teradata Python Package - resample() - Teradata Package for Python

Teradata® Package for Python User Guide

Deployment
VantageCloud
VantageCore
Edition
Enterprise
IntelliFlex
VMware
Product
Teradata Package for Python
Release Number
20.00
Published
December 2024
Language
English (United States)
Last Update
2024-12-18
dita:mapPath
nvi1706202040305.ditamap
dita:ditavalPath
plt1683835213376.ditaval
dita:id
rkb1531260709148
Product Category
Teradata Vantage

The resample() function resamples time series data to group the same by time on a datetime column of a teradataml DataFrame. It also allows grouping done based on teradataml DataFrame columns.

This function applies Group By Time to one or more columns of a teradataml DataFrame. Outcome of this function can be used to run Time Series Aggregate functions.

Although the grouping is optimized for DataFrames created for PTI tables, it is also supported on non-PTI tables when the argument 'timecode_column' is specified.

  • This API is similar to groupby_time().
  • You can still apply teradataml DataFrame methods (filters/sort/etc) on top of the result of this one.
  • Consecutive operations of grouping, i.e., groupby_time(), resample() and groupby() are not permitted. An exception will be raised. Following are some cases where exception will be raised as "Invalid operation applied, check documentation for correct usage."
    • df.resample().groupby()
    • df.resample().resample()
    • df.resample().groupby_time()

Examples Prerequisite

See Example Setup to set up the environment for the following examples.

Example 1: Group by timebucket of two calendar years, on DataFrame created on non-sequenced PTI table

Use formal notation and 'buoyid' column on DataFrame created on non-sequenced PTI table.

Fill missing values with Nulls.

>>> ocean_buoys_grpby1 = ocean_buoys.resample(rule="CAL_YEARS(2)", value_expression="buoyid", fill_method="NULLS")
>>> number_of_values_to_column = {2: "temperature"}
>>> ocean_buoys_grpby1.bottom(number_of_values_to_column).sort(["TIMECODE_RANGE", "buoyid"])
                                      TIMECODE_RANGE  GROUP BY TIME(CAL_YEARS(2))  buoyid  bottom2temperature
0  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       0                  10
1  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       0                  10
2  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       1                  71
3  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       1                  70
4  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       2                  80
5  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       2                  81
6  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2      44                  43
7  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2      44                  43

Example 2: Group by timebucket of two minutes, on DataFrame created on non-PTI table

Use shorthand notation to specify timebucket, on DataFrame created on non-PTI table.

Fill missing values with Nulls.

Time column must be specified for non-PTI table to 'on' argument.
>>> ocean_buoys_nonpti_grpby2 = ocean_buoys_nonpti.resample(rule="2m", value_expression="buoyid", on="timecode", fill_method="NULLS")
>>> number_of_values_to_column = {2: "temperature"}
>>> ocean_buoys_nonpti_grpby2.bottom(number_of_values_to_column, with_ties=True).sort(["TIMECODE_RANGE", "buoyid"])
                                      TIMECODE_RANGE  GROUP BY TIME(MINUTES(2))  buoyid  bottom_with_ties2temperature
0  ('2014-01-06 08:00:00.000000+00:00', '2014-01-...                   11574961       0                          10.0
1  ('2014-01-06 08:02:00.000000+00:00', '2014-01-...                   11574962       0                           NaN
2  ('2014-01-06 08:04:00.000000+00:00', '2014-01-...                   11574963       0                           NaN
3  ('2014-01-06 08:06:00.000000+00:00', '2014-01-...                   11574964       0                           NaN
4  ('2014-01-06 08:08:00.000000+00:00', '2014-01-...                   11574965       0                          99.0
5  ('2014-01-06 08:10:00.000000+00:00', '2014-01-...                   11574966       0                         100.0
6  ('2014-01-06 08:10:00.000000+00:00', '2014-01-...                   11574966       0                          10.0
7  ('2014-01-06 09:00:00.000000+00:00', '2014-01-...                   11574991       1                          70.0
8  ('2014-01-06 09:00:00.000000+00:00', '2014-01-...                   11574991       1                          77.0
9  ('2014-01-06 09:02:00.000000+00:00', '2014-01-...                   11574992       1                          71.0

Example 3: Group by timebucket of two minutes, on DataFrame created on non-PTI table

Use shorthand notation to specify timebucket, on DataFrame created on non-PTI table.

Fill missing values with previous values.

Time column must be specified for non-PTI table to 'on' argument.
>>> ocean_buoys_nonpti_grpby2 = ocean_buoys_nonpti.resample(rule="2m", value_expression="buoyid", on="timecode", fill_method="prev")
>>> number_of_values_to_column = {2: "temperature"}
>>> ocean_buoys_nonpti_grpby2.bottom(number_of_values_to_column, with_ties=True).sort(["TIMECODE_RANGE","buoyid"])
                                      TIMECODE_RANGE  GROUP BY TIME(MINUTES(2))  buoyid  bottom_with_ties2temperature
0  ('2014-01-06 08:00:00.000000+00:00', '2014-01-...                   11574961       0                            10
1  ('2014-01-06 08:02:00.000000+00:00', '2014-01-...                   11574962       0                            10
2  ('2014-01-06 08:04:00.000000+00:00', '2014-01-...                   11574963       0                            10
3  ('2014-01-06 08:06:00.000000+00:00', '2014-01-...                   11574964       0                            10
4  ('2014-01-06 08:08:00.000000+00:00', '2014-01-...                   11574965       0                            99
5  ('2014-01-06 08:10:00.000000+00:00', '2014-01-...                   11574966       0                            10
6  ('2014-01-06 08:10:00.000000+00:00', '2014-01-...                   11574966       0                           100
7  ('2014-01-06 09:00:00.000000+00:00', '2014-01-...                   11574991       1                            77
8  ('2014-01-06 09:00:00.000000+00:00', '2014-01-...                   11574991       1                            70
9  ('2014-01-06 09:02:00.000000+00:00', '2014-01-...                   11574992       1                            71

Example 4: Group by timebucket of two minutes, on DataFrame created on non-PTI table

Use shorthand notation to specify timebucket, on DataFrame created on non-PTI table.

Fill missing values with numeric constant 12345.

Time column must be specified for non-PTI table to 'on' argument.
>>> ocean_buoys_nonpti_grpby2 = ocean_buoys_nonpti.resample(rule="2minute", value_expression="buoyid", on="timecode", fill_method=12345)
>>> number_of_values_to_column = {2: "temperature"}
>>> ocean_buoys_nonpti_grpby2.bottom(number_of_values_to_column, with_ties=True).sort(["TIMECODE_RANGE", "buoyid"])
 
                                      TIMECODE_RANGE  GROUP BY TIME(MINUTES(2))  buoyid  bottom_with_ties2temperature
0  ('2014-01-06 08:00:00.000000+00:00', '2014-01-...                   11574961       0                            10
1  ('2014-01-06 08:02:00.000000+00:00', '2014-01-...                   11574962       0                         12345
2  ('2014-01-06 08:04:00.000000+00:00', '2014-01-...                   11574963       0                         12345
3  ('2014-01-06 08:06:00.000000+00:00', '2014-01-...                   11574964       0                         12345
4  ('2014-01-06 08:08:00.000000+00:00', '2014-01-...                   11574965       0                            99
5  ('2014-01-06 08:10:00.000000+00:00', '2014-01-...                   11574966       0                            10
6  ('2014-01-06 08:10:00.000000+00:00', '2014-01-...                   11574966       0                           100
7  ('2014-01-06 09:00:00.000000+00:00', '2014-01-...                   11574991       1                            77
8  ('2014-01-06 09:00:00.000000+00:00', '2014-01-...                   11574991       1                            70
9  ('2014-01-06 09:02:00.000000+00:00', '2014-01-...                   11574992       1                            71