teradataml DataFrameColumn supports the following regular aggregate functions, which can be used with or without DataFrame.groupby().
- You must use DataFrame.assign() when applying the aggregate functions to a ColumnExpression (also known as teradataml DataFrameColumn).
- You should always use "drop_columns=True" in DataFrame.assign() when running an aggregate operation on a teradataml DataFrame.
- The drop_columns argument of DataFrame.assign() is ignored when the aggregate function is applied to the result of DataFrame.groupby().
See the DataFrameColumn Aggregate Functions section of Teradata Package for Python Function Reference (B700-4008) at https://docs.teradata.com/ for detailed descriptions and usage examples of these functions.
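The notes above can be sketched as follows. This is a minimal illustration rather than an example taken from the Function Reference: the helper name `summarize_gpa`, the column names `gpa` and `masters`, and the output column names are hypothetical, and `df` is assumed to be an existing teradataml DataFrame backed by an active Vantage connection. The first call aggregates without groupby() and passes drop_columns=True; the second aggregates per group, where drop_columns is ignored.

```python
def summarize_gpa(df):
    """Aggregate a hypothetical numeric column `gpa` with and without groupby().

    `df` is assumed to be a teradataml DataFrame, for example one created with
    teradataml.DataFrame("some_table") after a connection has been established.
    The column names `gpa` and `masters` are placeholders for illustration.
    """
    # Without groupby(): aggregate functions on a DataFrameColumn are passed
    # to DataFrame.assign(), with drop_columns=True so that only the
    # aggregate columns appear in the result.
    overall = df.assign(
        drop_columns=True,
        gpa_mean=df.gpa.mean(),
        gpa_max=df.gpa.max(),
    )

    # With groupby(): the same aggregate expression is evaluated per group.
    # drop_columns is omitted here because it is ignored in this case.
    per_group = df.groupby("masters").assign(gpa_mean=df.gpa.mean())

    return overall, per_group
```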
| Sr. No. | Function Name | Description |
|---|---|---|
| 1 | corr() | Returns the sample Pearson product moment correlation coefficient of its arguments for all non-null data point pairs. |
| 2 | count() | Returns the column-wise count of the ColumnExpression (also known as teradataml DataFrameColumn). |
| 3 | covar_pop() | Returns the column-wise population covariance of its arguments for all non-null data point pairs. Covariance measures whether or not two random variables vary in the same way. It is the average of the products of deviations for each non-null data point pair. |
| 4 | covar_samp() | Returns the column-wise sample covariance of its arguments for all non-null data point pairs. Covariance measures whether or not two random variables vary in the same way. It is the sum of the products of deviations for each non-null data point pair, divided by one less than the number of pairs. |
| 5 | kurtosis() | Returns the column-wise kurtosis value of the ColumnExpression (also known as teradataml DataFrameColumn). Kurtosis is the fourth moment of the distribution of the standardized (z) values. It is a measure of the outlier (rare, extreme observation) character of the distribution as compared with the normal (or Gaussian) distribution. |
| 6 | max() | Returns the column-wise maximum value of the ColumnExpression (also known as teradataml DataFrameColumn). |
| 7 | mean() | Returns the column-wise mean value of the ColumnExpression (also known as teradataml DataFrameColumn). |
| 8 | median() | Returns the column-wise median value of the ColumnExpression (also known as teradataml DataFrameColumn). |
| 9 | min() | Returns the column-wise minimum value of the ColumnExpression (also known as teradataml DataFrameColumn). |
| 10 | percentile() | Returns the value that represents the desired percentile for the ColumnExpression (also known as teradataml DataFrameColumn). |
| 11 | regr_avgx() | Returns the column-wise mean of the independent variable for all non-null data pairs of the dependent and independent variable arguments. |
| 12 | regr_avgy() | Returns the column-wise mean of the dependent variable for all non-null data pairs of the dependent and independent variable arguments. |
| 13 | regr_count() | Returns the column-wise count of all non-null data pairs of the dependent and independent variable arguments. |
| 14 | regr_intercept() | Returns the column-wise intercept of the univariate linear regression line through all non-null data pairs of the dependent and independent variable arguments. The intercept is the point at which the regression line through the non-null data pairs in the sample intersects the ordinate, or y-axis, of the graph. |
| 15 | regr_r2() | Returns the column-wise coefficient of determination for all non-null data pairs of the dependent and independent variable arguments. |
| 16 | regr_slope() | Returns the column-wise slope of the univariate linear regression line through all non-null data pairs of the dependent and independent variable arguments. |
| 17 | regr_sxx() | Returns the column-wise sum of the squares of the independent variable expression for all non-null data pairs of the dependent and independent variable arguments. |
| 18 | regr_sxy() | Returns the column-wise sum of the products of the independent variable and the dependent variable for all non-null data pairs of the dependent and independent variable arguments. |
| 19 | regr_syy() | Returns the column-wise sum of the squares of the dependent variable expression for all non-null data pairs of the dependent and independent variable arguments. |
| 20 | skew() | Returns the column-wise skewness of the distribution of the ColumnExpression (also known as teradataml DataFrameColumn). Skewness is the third standardized moment of a distribution. It is a measure of the asymmetry of the distribution about its mean compared with the normal (or Gaussian) distribution. |
| 21 | std() | Returns the column-wise sample or population standard deviation value of the ColumnExpression (also known as teradataml DataFrameColumn). The standard deviation is the square root of the variance, which is the second central moment of a distribution. The computation is more conservative for the population standard deviation to minimize the effect of outliers on the computed value. |
| 22 | sum() | Returns the column-wise sum of the ColumnExpression (also known as teradataml DataFrameColumn). |
| 23 | var() | Returns the column-wise sample or population variance of the ColumnExpression (also known as teradataml DataFrameColumn). |
| teradataml special aggregate functions | | |
| 24 | csum() | Returns the cumulative sum of the column for the rows in a partition. |
| 25 | msum() | Computes the moving sum for the current row and the preceding "width"-1 rows in a partition, by sorting the rows according to "sort_columns". |
| 26 | mavg() | Computes the moving average for the current row and the preceding "width"-1 rows in a partition, by sorting the rows according to "sort_columns". |
| 27 | mdiff() | Computes the moving difference for the current row and the preceding "width" rows in a partition, by sorting the rows according to "sort_columns". |
| 28 | mlinreg() | Computes the moving linear regression for the current row and the preceding "width"-1 rows in a partition, by sorting the rows according to "sort_columns". |
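
To make the window descriptions of the special aggregate functions more concrete, the following pure-Python sketch mimics csum(), msum(), mavg() and mdiff() on a plain list. It does not call teradataml. The list is assumed to hold the values of one partition, already ordered by "sort_columns". The helper names are hypothetical, and the handling of the first rows (shorter windows for the moving sum and average, None for the moving difference) is an assumption for illustration; consult the Function Reference for the exact behavior.

```python
from typing import List, Optional


def cumulative_sum(values: List[float]) -> List[float]:
    """Running total of the values seen so far (csum-like)."""
    total = 0
    result = []
    for value in values:
        total += value
        result.append(total)
    return result


def moving_sum(values: List[float], width: int) -> List[float]:
    """Sum of the current row and the preceding width-1 rows (msum-like).

    Assumption: near the start, the window simply contains fewer rows.
    """
    return [sum(values[max(0, i - width + 1): i + 1]) for i in range(len(values))]


def moving_avg(values: List[float], width: int) -> List[float]:
    """Average of the current row and the preceding width-1 rows (mavg-like)."""
    result = []
    for i in range(len(values)):
        window = values[max(0, i - width + 1): i + 1]
        result.append(sum(window) / len(window))
    return result


def moving_diff(values: List[float], width: int) -> List[Optional[float]]:
    """Current row minus the row `width` positions earlier (mdiff-like).

    Assumption: rows without such a predecessor yield None.
    """
    return [
        values[i] - values[i - width] if i >= width else None
        for i in range(len(values))
    ]


data = [1, 2, 3, 4, 5]
print(cumulative_sum(data))   # [1, 3, 6, 10, 15]
print(moving_sum(data, 3))    # [1, 3, 6, 9, 12]
print(moving_avg(data, 2))    # [1.0, 1.5, 2.5, 3.5, 4.5]
print(moving_diff(data, 2))   # [None, None, 2, 2, 2]
```

Note the difference in window size: the moving sum and average span "width" rows including the current one, while the moving difference looks back a full "width" rows from the current row.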