Window Aggregate Functions | Teradata Package for Python

Teradata® VantageCloud Lake

Deployment
VantageCloud
Edition
Lake
Product
Teradata Vantage
Published
January 2023
Language
English (United States)
Last Update
2024-04-03

teradataml supports the following window aggregate functions, which can be executed on a teradataml Window object created using DataFrame.window() or DataFrameColumn.window().

See the teradataml: Window Aggregates section of the Teradata Package for Python Function Reference on VantageCloud Lake, B700-4500, at https://docs.teradata.com/ for detailed descriptions and usage examples of these functions.
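
Every description in the table below ends with "over the specified window". As a rough illustration of what that phrase means, the following plain-Python sketch computes a mean over a window, assuming the usual SQL meaning of a window: rows are split into partitions, ordered inside each partition, and the function is evaluated over a frame of rows ending at the current row. This is not teradataml code and does not show how teradataml or Vantage evaluates windows; the function name windowed_mean, its parameters, and the sample data are hypothetical.

```python
from collections import defaultdict


def windowed_mean(rows, partition_key, order_key, value_key, preceding=None):
    """Return one windowed mean per input row, in the original row order.

    preceding=None starts the frame at the first row of the partition
    (an "unbounded preceding" frame). An integer n limits the frame to
    the current row plus the n rows before it. Hypothetical helper.
    """
    # Group row positions by the value of the partition column.
    partitions = defaultdict(list)
    for idx, row in enumerate(rows):
        partitions[row[partition_key]].append(idx)

    result = [None] * len(rows)
    for indices in partitions.values():
        # Order the rows inside each partition.
        indices.sort(key=lambda i: rows[i][order_key])
        for pos, idx in enumerate(indices):
            start = 0 if preceding is None else max(0, pos - preceding)
            frame = [rows[i][value_key] for i in indices[start:pos + 1]]
            result[idx] = sum(frame) / len(frame)
    return result


# Illustrative data only.
sales = [
    {"region": "east", "day": 1, "amount": 10.0},
    {"region": "east", "day": 2, "amount": 20.0},
    {"region": "west", "day": 1, "amount": 5.0},
    {"region": "east", "day": 3, "amount": 30.0},
    {"region": "west", "day": 2, "amount": 15.0},
]

# Running mean per region, ordered by day.
running = windowed_mean(sales, "region", "day", "amount")

# Mean over the current row and the one row before it.
moving = windowed_mean(sales, "region", "day", "amount", preceding=1)
```

Unlike a plain aggregate, a window aggregate returns one value per input row, which is why the result list has the same length as the input.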

Sr. No. Function Name Description
1 corr() Returns the sample Pearson product-moment correlation coefficient of its arguments for all non-null data point pairs in a teradataml DataFrame or ColumnExpression over the specified window.
2 count() Returns the total number of qualified rows in a teradataml DataFrame or ColumnExpression over the specified window.
3 covar_pop()

Returns the population covariance of its arguments for all non-null data point pairs over the specified window.

Covariance measures whether or not two random variables vary in the same way. It is the average of the products of deviations for each non-null data point pair.

4 covar_samp()

Returns the sample covariance of its arguments for all non-null data point pairs over the specified window.

Covariance measures whether or not two random variables vary in the same way. For a sample, it is the sum of the products of deviations for each non-null data point pair, divided by one less than the number of pairs.

5 cume_dist() Returns the cumulative distribution of values in a teradataml DataFrame or ColumnExpression over the specified window.
6 dense_rank() Returns the ordered ranking of all the rows in a teradataml DataFrame or ColumnExpression, according to "order_columns", over the specified window.
7 first_value() Returns the first value of an ordered set of values in a teradataml DataFrame or ColumnExpression over the specified window.
8 lag() The lag function accesses data from the row preceding the current row at a specified offset value over the specified window in a teradataml DataFrame or ColumnExpression.
9 last_value() Returns the last value of an ordered set of values in a teradataml DataFrame or ColumnExpression over the specified window.
10 lead() The lead function accesses data from the row following the current row at a specified offset value over the specified window in a teradataml DataFrame or ColumnExpression.
11 max() Returns the maximum of values in a teradataml DataFrame or ColumnExpression over the specified window.
12 mean() Returns the arithmetic average of all values in a teradataml DataFrame or ColumnExpression over the specified window.
13 min() Returns the minimum of values in a teradataml DataFrame or ColumnExpression over the specified window.
14 percent_rank() Returns the relative rank of all the rows in a teradataml DataFrame or ColumnExpression, according to "order_columns", over the specified window.
15 rank() Returns the rank (1 … n) of all the rows in a teradataml DataFrame or ColumnExpression, according to "order_columns", over the specified window.
16 regr_avgx() Returns the mean of the independent variable for all non-null data pairs of the dependent and independent variable arguments over the specified window.
17 regr_avgy() Returns the mean of the dependent variable for all non-null data pairs of the dependent and independent variable arguments over the specified window.
18 regr_count() Returns the column-wise count of all non-null data pairs of the dependent and independent variable arguments over the specified window.
19 regr_intercept() Returns the intercept of the univariate linear regression line through all non-null data pairs of the dependent and independent variable arguments over the specified window. The intercept is the point at which the regression line through the non-null data pairs in the sample intersects the ordinate, or y-axis, of the graph.
20 regr_r2() Returns the coefficient of determination for all non-null data pairs of the dependent and independent variable arguments over the specified window.
21 regr_slope()
Returns the slope of the univariate linear regression line through all non-null data pairs of the dependent and independent variable arguments over the specified window. When the function is executed, "expression" is treated as the independent variable, and the dependent variable is:
  • a ColumnExpression, when invoked using a window created on a ColumnExpression.
  • all columns of the teradataml DataFrame that are valid for this function, when executed on a window created on a teradataml DataFrame.
22 regr_sxx()
Returns the sum of the squares of the independent variable expression for all non-null data pairs of the dependent and independent variable arguments over the specified window. When the function is executed, "expression" is treated as the independent variable, and the dependent variable is:
  • a ColumnExpression, when invoked using a window created on a ColumnExpression.
  • all columns of the teradataml DataFrame that are valid for this function, when executed on a window created on a teradataml DataFrame.
23 regr_sxy()
Returns the sum of the products of the independent variable and the dependent variable for all non-null data pairs of the dependent and independent variable arguments over the specified window. When the function is executed, "expression" is treated as the independent variable, and the dependent variable is:
  • a ColumnExpression, when invoked using a window created on a ColumnExpression.
  • all columns of the teradataml DataFrame that are valid for this function, when executed on a window created on a teradataml DataFrame.
24 regr_syy()
Returns the sum of the squares of the dependent variable expression for all non-null data pairs of the dependent and independent variable arguments over the specified window. When the function is executed, "expression" is treated as the independent variable, and the dependent variable is:
  • a ColumnExpression, when invoked using a window created on a ColumnExpression.
  • all columns of the teradataml DataFrame that are valid for this function, when executed on a window created on a teradataml DataFrame.
25 row_number() Returns the sequential row number, starting with the first row as number one, for all the rows in a teradataml DataFrame or ColumnExpression, according to "order_columns", over the specified window.
26 std()

Returns the standard deviation for the non-null data points in a teradataml DataFrame or ColumnExpression over the specified window.

The standard deviation is the square root of the variance, which is the second central moment of either a sample or a population. For a population, it is a measure of dispersion from the mean of that population. For a sample, it is a measure of dispersion from the mean of that sample. The computation for the sample standard deviation is more conservative than that for the population standard deviation, to minimize the effect of outliers on the computed value.

27 sum() Returns the sum of values in a teradataml DataFrame or ColumnExpression over the specified window.
28 var()

Returns the variance for the data points in a teradataml DataFrame or ColumnExpression over the specified window.

By default, the function calculates the variance of a sample. Variance of a sample is a measure of dispersion from the mean of that sample. It is the square of the sample standard deviation. However, if the parameter "population" is True, the function calculates the variance of a population. Variance of a population is a measure of dispersion from the mean of that population.

The sample computation is more conservative than that for the population variance, to minimize the effect of outliers on the computed value.
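
The sample and population variants described for covar_pop(), covar_samp(), std(), and var() differ only in the divisor: the population form divides the sum of deviation products by the number of non-null pairs n, while the sample form divides by n - 1. The plain-Python sketch below illustrates that arithmetic using the textbook definitions. It is not teradataml code; the function names and the "population" flag on the std helper are assumptions made for illustration (the source only documents "population" for var()).

```python
import math


def covariance(xs, ys, population=False):
    """Covariance over pairs where both values are non-null (not None).

    Hypothetical helper. The sample form needs at least two pairs;
    with a single pair the divisor n - 1 is zero.
    """
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Sum of the products of deviations from the two means.
    total = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    return total / (n if population else n - 1)


def variance(values, population=False):
    """Variance is the covariance of a variable with itself."""
    data = [v for v in values if v is not None]
    return covariance(data, data, population=population)


def std(values, population=False):
    """Standard deviation is the square root of the variance."""
    return math.sqrt(variance(values, population=population))


# Illustrative data only. Pairs containing None are skipped.
x = [1.0, 2.0, 3.0, None, 5.0]
y = [2.0, 4.0, 6.0, 8.0, None]
sample_cov = covariance(x, y)                       # divides by n - 1
population_cov = covariance(x, y, population=True)  # divides by n

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
population_var = variance(data, population=True)
sample_var = variance(data)
```

Because n - 1 is smaller than n, the sample result is always at least as large as the population result for the same data, which is the sense in which the sample computation is the more conservative one.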
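
The ranking functions in the table (row_number(), rank(), dense_rank(), percent_rank(), and cume_dist()) differ mainly in how they treat tied values. The plain-Python sketch below compares them on a single partition, assuming the standard SQL definitions and an ascending sort order. It is not teradataml code, and the function name ranking_functions and the sample values are hypothetical.

```python
def ranking_functions(values):
    """Return one dict of ranking results per value, ordered ascending.

    Hypothetical helper following the standard SQL definitions:
      row_number   : 1, 2, 3, ... with no regard for ties
      rank         : ties share a rank, and gaps follow the ties
      dense_rank   : ties share a rank, with no gaps
      percent_rank : (rank - 1) / (n - 1)
      cume_dist    : share of rows whose value is <= the current value
    """
    ordered = sorted(values)
    n = len(ordered)
    rows = []
    rank = 0
    dense = 0
    previous = object()  # sentinel that never equals a real value
    for position, value in enumerate(ordered, start=1):
        if value != previous:
            # A new distinct value starts a new rank.
            rank = position
            dense += 1
            previous = value
        at_or_below = sum(1 for v in ordered if v <= value)
        rows.append({
            "value": value,
            "row_number": position,
            "rank": rank,
            "dense_rank": dense,
            "percent_rank": (rank - 1) / (n - 1) if n > 1 else 0.0,
            "cume_dist": at_or_below / n,
        })
    return rows


# Illustrative data only: the two 20s are tied.
table = ranking_functions([20, 10, 30, 20])
for row in table:
    print(row)
```

In this sketch the tied values share rank 2, the next rank() value skips to 4, and dense_rank() continues with 3.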