Regular Aggregate Functions Supported by DataFrame (Teradata Package for Python)

Teradata® VantageCloud Lake

Deployment: VantageCloud
Edition: Lake
Product: Teradata Vantage
Published: January 2023
Language: English (United States)
Last Update: 2024-04-03

teradataml DataFrame supports the following regular aggregate functions, which can be used with or without DataFrame.groupby().

For detailed descriptions and usage examples of these functions, see the DataFrame Aggregate Functions section of the Teradata Package for Python Function Reference on VantageCloud Lake (B700-4500) at https://docs.teradata.com/.
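
The difference between aggregating with and without DataFrame.groupby() can be pictured in plain Python. The following is a minimal sketch, not teradataml code: the rows, the column names "region" and "sales", and the helper functions are all hypothetical, and it assumes SQL-style handling in which null values are skipped by the aggregate.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows standing in for a table; column names are made up.
rows = [
    {"region": "east", "sales": 10.0},
    {"region": "east", "sales": 30.0},
    {"region": "west", "sales": 50.0},
    {"region": "west", "sales": None},  # assumed to be skipped, as SQL aggregates skip nulls
]

def column_mean(records, column):
    """Mean of the non-null values in one column."""
    values = [r[column] for r in records if r[column] is not None]
    return mean(values) if values else None

def grouped_mean(records, key, column):
    """Mean of one column, computed separately for each distinct value of `key`."""
    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append(r)
    return {k: column_mean(g, column) for k, g in groups.items()}

# Without grouping: one result for the whole table.
overall = column_mean(rows, "sales")

# With grouping: one result per group.
per_region = grouped_mean(rows, "region", "sales")
```

Here `overall` is a single value for the column, while `per_region` holds one value per distinct region, which mirrors the two ways the aggregate functions can be applied.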

Sr. No. Function Name Description
1 corr() Returns the sample Pearson product-moment correlation coefficient of its arguments for all non-null data point pairs.
2 count() Returns column-wise count of the dataframe.
3 covar_pop()

Returns the column-wise population covariance of its arguments for all non-null data point pairs.

Covariance measures whether or not two random variables vary in the same way. It is the average of the products of deviations for each non-null data point pair.

4 covar_samp()

Returns the column-wise sample covariance of its arguments for all non-null data point pairs.

Covariance measures whether or not two random variables vary in the same way. For a sample, it is the sum of the products of deviations for each non-null data point pair, divided by one less than the number of pairs.

5 kurtosis()

Returns column-wise kurtosis value of the dataframe.

Kurtosis is the fourth moment of the distribution of the standardized (z) values.

It is a measure of the outlier (rare, extreme observation) character of the distribution as compared with the normal (or Gaussian) distribution.

  • The normal distribution has a kurtosis of 0.
  • Positive kurtosis indicates that the distribution is more outlier-prone than the normal distribution.
  • Negative kurtosis indicates that the distribution is less outlier-prone than the normal distribution.
6 max() Returns column-wise maximum value of the dataframe.
7 mean() Returns column-wise mean value of the dataframe.
8 median() Returns column-wise median value of the dataframe.
9 min() Returns column-wise minimum value of the dataframe.
10 percentile() Returns the value that represents the desired percentile.
11 regr_avgx() Returns the column-wise mean of the independent variable for all non-null data pairs of the dependent and independent variable arguments.
12 regr_avgy() Returns the column-wise mean of the dependent variable for all non-null data pairs of the dependent and independent variable arguments.
13 regr_count() Returns the column-wise count of all non-null data pairs of the dependent and independent variable arguments.
14 regr_intercept()

Returns the column-wise intercept of the univariate linear regression line through all non-null data pairs of the dependent and independent variable arguments.

The intercept is the point at which the regression line through the non-null data pairs in the sample intersects the ordinate, or y-axis, of the graph.

15 regr_r2() Returns the column-wise coefficient of determination for all non-null data pairs of the dependent and independent variable arguments.
16 regr_slope() Returns the column-wise slope of the univariate linear regression line through all non-null data pairs of the dependent and independent variable arguments.
17 regr_sxx() Returns the column-wise sum of the squares of the independent variable expression for all non-null data pairs of the dependent and independent variable arguments.
18 regr_sxy() Returns the column-wise sum of the products of the independent variable and the dependent variable for all non-null data pairs of the dependent and independent variable arguments.
19 regr_syy() Returns the column-wise sum of the squares of the dependent variable expression for all non-null data pairs of the dependent and independent variable arguments.
20 skew()

Returns column-wise skewness of the distribution of the dataframe.

Skewness is the third moment of a distribution. It is a measure of the asymmetry of the distribution about its mean compared with the normal (or Gaussian) distribution.

  • The normal distribution has a skewness of 0.
  • Positive skewness indicates a distribution having an asymmetric tail extending toward more positive values.
  • Negative skewness indicates an asymmetric tail extending toward more negative values.
21 std()

Returns column-wise sample or population standard deviation value of the dataframe. The standard deviation is the square root of the variance, which is the second central moment of a distribution.

  • For a sample, it is a measure of dispersion from the mean of that sample.
  • For a population, it is a measure of dispersion from the mean of that population.

The sample computation is more conservative than the population computation: it divides by n - 1 rather than n, so it gives a slightly larger value.

22 sum() Returns column-wise sum value of the dataframe.
23 var()

Returns column-wise sample or population variance of the columns in a dataframe.

  • The variance of a population is a measure of dispersion from the mean of that population.
  • The variance of a sample is a measure of dispersion from the mean of that sample. It is the square of the sample standard deviation.
24 agg() Performs aggregation using one or more operations.
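
The covariance, correlation, and regr_* descriptions above are closely related, and a small plain-Python sketch may make the relationships easier to see. This is not teradataml code and not necessarily how Vantage computes the values: the function names below are hypothetical, and the sketch assumes the SQL-standard reading in which the sxx, syy, and sxy sums are taken about the means of the non-null pairs.

```python
from math import sqrt

def non_null_pairs(ys, xs):
    """Keep only (dependent, independent) pairs where both values are present."""
    return [(y, x) for y, x in zip(ys, xs) if y is not None and x is not None]

def regression_summary(ys, xs):
    """Illustrative pairwise statistics; assumes at least two distinct x and y values."""
    pairs = non_null_pairs(ys, xs)
    n = len(pairs)
    avg_y = sum(y for y, _ in pairs) / n
    avg_x = sum(x for _, x in pairs) / n
    # Sums of squared deviations and of products of deviations (assumed definition).
    sxx = sum((x - avg_x) ** 2 for _, x in pairs)
    syy = sum((y - avg_y) ** 2 for y, _ in pairs)
    sxy = sum((x - avg_x) * (y - avg_y) for y, x in pairs)
    slope = sxy / sxx
    return {
        "count": n,
        "avgx": avg_x,
        "avgy": avg_y,
        "sxx": sxx,
        "syy": syy,
        "sxy": sxy,
        "covar_pop": sxy / n,          # average of the products of deviations
        "covar_samp": sxy / (n - 1),   # divided by one less than the pair count
        "corr": sxy / sqrt(sxx * syy),
        "slope": slope,
        "intercept": avg_y - slope * avg_x,
        "r2": (sxy ** 2) / (sxx * syy),
    }

# Hypothetical data lying exactly on y = 2x + 1; the last pair has a null x and is ignored.
xs = [1, 2, 3, 4, None]
ys = [3, 5, 7, 9, 100]
summary = regression_summary(ys, xs)
```

Because the four usable points lie on a straight line, the sketch yields a slope of 2, an intercept of 1, and a coefficient of determination of 1; the pair with the null value does not contribute to any of the results.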
teradataml special aggregate functions
25 csum() Returns column-wise cumulative sum value for rows in the partition of the dataframe.
26 msum() Computes the moving sum for the current row and the preceding "width"-1 rows in a partition, by sorting the rows according to "sort_columns".
27 mavg() Computes the moving average for the current row and the preceding "width"-1 rows in a partition, by sorting the rows according to "sort_columns".
28 mdiff() Computes the moving difference for the current row and the preceding "width" rows in a partition, by sorting the rows according to "sort_columns".
29 mlinreg() Computes the moving linear regression for the current row and the preceding "width"-1 rows in a partition, by sorting the rows according to "sort_columns".
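
The windowing behavior described for csum(), msum(), mavg(), and mdiff() can be sketched in plain Python. This is an illustration of the window arithmetic only, not teradataml code: the helper names, the "day" and "amount" columns, and the data are hypothetical. It assumes that early rows with fewer than "width"-1 predecessors use whatever rows are available, and that the moving difference is the current value minus the value "width" rows earlier, with no result when that earlier row does not exist.

```python
def csum(values):
    """Cumulative sum over already-sorted values."""
    total, out = 0, []
    for v in values:
        total += v
        out.append(total)
    return out

def msum(values, width):
    """Sum of the current value and the preceding width-1 values."""
    return [sum(values[max(0, i - width + 1): i + 1]) for i in range(len(values))]

def mavg(values, width):
    """Average of the current value and the preceding width-1 values."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - width + 1): i + 1]
        out.append(sum(window) / len(window))
    return out

def mdiff(values, width):
    """Current value minus the value `width` rows earlier (assumed reading)."""
    return [values[i] - values[i - width] if i >= width else None
            for i in range(len(values))]

# Hypothetical unsorted rows; "day" plays the role of the sort column.
rows = [
    {"day": 3, "amount": 30},
    {"day": 1, "amount": 10},
    {"day": 4, "amount": 40},
    {"day": 2, "amount": 20},
]

# Sort first, then apply the window functions in that order.
amounts = [r["amount"] for r in sorted(rows, key=lambda r: r["day"])]

running_total = csum(amounts)
moving_sum = msum(amounts, width=2)
moving_avg = mavg(amounts, width=2)
moving_diff = mdiff(amounts, width=2)
```

Sorting before windowing matters: the same values in a different order would produce different moving results, which is why these functions take "sort_columns".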