Regular Aggregate Functions Supported by DataFrame Column

Teradata® Package for Python User Guide

Product: Teradata Package for Python
Release Number: 17.00
Published: November 2021
Language: English (United States)
Last Update: 2022-01-14
Document ID: B700-4006
Lifecycle: previous
Product Category: Teradata Vantage

teradataml DataFrameColumn supports the following regular aggregate functions, which can be used with or without DataFrame.groupby().

  • You must use DataFrame.assign() when applying the aggregate functions to a ColumnExpression, also known as teradataml DataFrameColumn.
  • Always pass drop_columns=True to DataFrame.assign() when running an aggregate operation on a teradataml DataFrame.
  • The drop_columns argument of DataFrame.assign() is ignored when the aggregate function is applied to the result of DataFrame.groupby().

See the DataFrameColumn Aggregate Functions section of the Teradata Package for Python Function Reference (B700-4008) at https://docs.teradata.com/ for detailed descriptions and usage examples of these functions.

Regular Aggregate Functions supported by DataFrame Column
Sr. No. Function Name Description
1 corr() Returns the sample Pearson product-moment correlation coefficient of its arguments for all non-null data point pairs.
2 count() Returns the column-wise count of the ColumnExpression (teradataml DataFrameColumn).
3 covar_pop()

Returns the column-wise population covariance of its arguments for all non-null data point pairs.

Covariance measures whether two random variables vary in the same way. The population covariance is the average of the products of the deviations from the mean for each non-null data point pair.

4 covar_samp()

Returns the column-wise sample covariance of its arguments for all non-null data point pairs.

Covariance measures whether two random variables vary in the same way. The sample covariance is the sum of the products of the deviations from the mean for each non-null data point pair, divided by one less than the number of pairs.

5 kurtosis()

Returns the column-wise kurtosis of the ColumnExpression (teradataml DataFrameColumn).

Kurtosis is the fourth moment of the distribution of the standardized (z) values.

It is a measure of the outlier (rare, extreme observation) character of the distribution as compared with the normal (or Gaussian) distribution.

  • The normal distribution has a kurtosis of 0.
  • Positive kurtosis indicates that the distribution is more outlier-prone than the normal distribution.
  • Negative kurtosis indicates that the distribution is less outlier-prone than the normal distribution.
6 max() Returns the column-wise maximum value of the ColumnExpression (teradataml DataFrameColumn).
7 mean() Returns the column-wise mean value of the ColumnExpression (teradataml DataFrameColumn).
8 median() Returns the column-wise median value of the ColumnExpression (teradataml DataFrameColumn).
9 min() Returns the column-wise minimum value of the ColumnExpression (teradataml DataFrameColumn).
10 percentile() Returns the value that represents the requested percentile for the ColumnExpression (teradataml DataFrameColumn).
11 regr_avgx() Returns the column-wise mean of the independent variable for all non-null data pairs of the dependent and independent variable arguments.
12 regr_avgy() Returns the column-wise mean of the dependent variable for all non-null data pairs of the dependent and independent variable arguments.
13 regr_count() Returns the column-wise count of all non-null data pairs of the dependent and independent variable arguments.
14 regr_intercept()

Returns the column-wise intercept of the univariate linear regression line through all non-null data pairs of the dependent and independent variable arguments.

The intercept is the point at which the regression line through the non-null data pairs in the sample intersects the ordinate, or y-axis, of the graph.

15 regr_r2() Returns the column-wise coefficient of determination for all non-null data pairs of the dependent and independent variable arguments.
16 regr_slope() Returns the column-wise slope of the univariate linear regression line through all non-null data pairs of the dependent and independent variable arguments.
17 regr_sxx() Returns the column-wise sum of the squared deviations of the independent variable from its mean for all non-null data pairs of the dependent and independent variable arguments.
18 regr_sxy() Returns the column-wise sum of the products of the deviations of the independent and dependent variables from their means for all non-null data pairs of the dependent and independent variable arguments.
19 regr_syy() Returns the column-wise sum of the squared deviations of the dependent variable from its mean for all non-null data pairs of the dependent and independent variable arguments.
20 skew()

Returns the column-wise skewness of the distribution of the ColumnExpression (teradataml DataFrameColumn).

Skewness is the third moment of a distribution. It is a measure of the asymmetry of the distribution about its mean compared with the normal (or Gaussian) distribution.
  • The normal distribution has a skewness of 0.
  • Positive skewness indicates a distribution having an asymmetric tail extending toward more positive values.
  • Negative skewness indicates an asymmetric tail extending toward more negative values.
21 std()

Returns the column-wise sample or population standard deviation of the ColumnExpression (teradataml DataFrameColumn). The standard deviation is the square root of the variance, which is the second central moment of a distribution.

  • For a sample, it is a measure of dispersion from the mean of that sample.
  • For a population, it is a measure of dispersion from the mean of that population.

The sample standard deviation divides the sum of squared deviations by n - 1 instead of n. It is therefore slightly larger, or more conservative, than the population standard deviation computed over the same values.

22 sum() Returns the column-wise sum of the ColumnExpression (teradataml DataFrameColumn).
23 var()

Returns the column-wise sample or population variance of the ColumnExpression (teradataml DataFrameColumn).

  • The variance of a population is a measure of dispersion from the mean of that population.
  • The variance of a sample is a measure of dispersion from the mean of that sample. It is the square of the sample standard deviation.
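The following plain-Python sketch illustrates the textbook formulas behind covar_pop(), covar_samp(), var(), std(), skew(), kurtosis(), and the regr_* functions. It does not call teradataml. The helper names (covariance, variance, std, skew, kurtosis, regression) are hypothetical, and None stands in for a SQL NULL. Skewness and kurtosis use sample-adjusted estimators of the same form as the spreadsheet SKEW and KURT functions. This page does not state that Vantage computes them this way, so treat that as an assumption.

```python
# Plain-Python sketch of the formulas behind several DataFrameColumn aggregates.
# It does not call teradataml; None stands in for a SQL NULL.
import math


def _non_null(xs):
    return [x for x in xs if x is not None]


def _non_null_pairs(xs, ys):
    # Keep a pair only when both values are non-null.
    return [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]


def covariance(xs, ys, sample=False):
    pairs = _non_null_pairs(xs, ys)
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    total = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    # Population covariance divides by n, sample covariance by n - 1.
    return total / (n - 1 if sample else n)


def variance(xs, sample=True):
    vals = _non_null(xs)
    n = len(vals)
    mean = sum(vals) / n
    total = sum((x - mean) ** 2 for x in vals)
    return total / (n - 1 if sample else n)


def std(xs, sample=True):
    # Standard deviation is the square root of the variance.
    return math.sqrt(variance(xs, sample))


def _z_scores(xs):
    # Standardize with the sample standard deviation (an assumption).
    vals = _non_null(xs)
    mean = sum(vals) / len(vals)
    s = std(vals, sample=True)
    return [(x - mean) / s for x in vals]


def skew(xs):
    # Assumed sample-adjusted skewness: n / ((n-1)(n-2)) * sum(z^3).
    z = _z_scores(xs)
    n = len(z)
    return n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)


def kurtosis(xs):
    # Assumed sample-adjusted excess kurtosis; close to 0 for normal data.
    z = _z_scores(xs)
    n = len(z)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * sum(v ** 4 for v in z) - correction


def regression(xs, ys):
    # xs is the independent variable, ys the dependent variable.
    pairs = _non_null_pairs(xs, ys)
    n = len(pairs)
    avgx = sum(x for x, _ in pairs) / n
    avgy = sum(y for _, y in pairs) / n
    sxx = sum((x - avgx) ** 2 for x, _ in pairs)
    syy = sum((y - avgy) ** 2 for _, y in pairs)
    sxy = sum((x - avgx) * (y - avgy) for x, y in pairs)
    slope = sxy / sxx
    return {
        "regr_count": n,
        "regr_avgx": avgx,
        "regr_avgy": avgy,
        "regr_sxx": sxx,
        "regr_syy": syy,
        "regr_sxy": sxy,
        "regr_slope": slope,
        "regr_intercept": avgy - slope * avgx,
        "regr_r2": sxy ** 2 / (sxx * syy),
    }
```

For example, regression([1, 2, 3, 4, None], [2, 4, 5, 8, 10]) drops the pair that contains None. It reports a regr_count of 4, a regr_slope of 1.9, and a regr_intercept close to 0.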
teradataml special aggregate functions
24 csum() Returns the cumulative (running) sum of the column for the rows in each partition.
25 msum() Computes the moving sum for the current row and the preceding "width"-1 rows in a partition, by sorting the rows according to "sort_columns".
26 mavg() Computes the moving average for the current row and the preceding "width"-1 rows in a partition, by sorting the rows according to "sort_columns".
27 mdiff() Computes the moving difference between the value in the current row and the value "width" rows earlier in a partition, by sorting the rows according to "sort_columns".
28 mlinreg() Computes the moving linear regression for the current row and the preceding "width"-1 rows in a partition, by sorting the rows according to "sort_columns".
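As a rough illustration of the special functions, the sketch below computes csum, msum, mavg, and mdiff over a single partition whose rows are already sorted by the sort columns. It is plain Python, not the teradataml API, and the function names only mirror the table for readability. The behavior near the start of a partition is an assumption and is not stated on this page: msum and mavg use a shorter window, and mdiff returns None (NULL).

```python
# Plain-Python sketch of csum/msum/mavg/mdiff over one partition.
# It does not call teradataml and assumes the rows are already sorted.


def csum(values):
    # Running total: each row adds its value to the sum of all earlier rows.
    out, total = [], 0
    for v in values:
        total += v
        out.append(total)
    return out


def msum(values, width):
    # Current row plus the preceding width - 1 rows; near the start of the
    # partition only the rows that exist are summed (assumption).
    return [sum(values[max(0, i - width + 1): i + 1]) for i in range(len(values))]


def mavg(values, width):
    # Same window as msum, divided by the number of rows actually in it.
    out = []
    for i in range(len(values)):
        window = values[max(0, i - width + 1): i + 1]
        out.append(sum(window) / len(window))
    return out


def mdiff(values, width):
    # Current value minus the value width rows earlier; None (NULL) when
    # no such row exists (assumption).
    return [values[i] - values[i - width] if i >= width else None
            for i in range(len(values))]
```

With values [10, 20, 30, 40, 50] and a width of 3, msum returns [10, 30, 60, 90, 120]. The first two rows have fewer than two preceding rows, so their windows are shorter.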