Teradata Package for Python Function Reference | 17.10 - covar_samp
Teradata® Package for Python Function Reference
- Product
- Teradata Package for Python
- Release Number
- 17.10
- Published
- April 2022
- Language
- English (United States)
- Last Update
- 2022-08-19
- lifecycle
- previous
- Product Category
- Teradata Vantage
- teradataml.dataframe.window.covar_samp = covar_samp(expression)
- DESCRIPTION:
The function returns the sample covariance of its arguments for all
non-null data point pairs over the specified window.
Covariance measures whether two random variables vary in the same way.
Sample covariance is the sum of the products of the deviations from the
means for each non-null data point pair, divided by one less than the
number of pairs. The function treats the ColumnExpression as one variable
and "expression" as the other variable when calculating the sample
covariance.
Notes:
1. When there are no non-null data point pairs in the data used for
the computation, the function returns None.
2. High covariance does not imply a causal relationship between
the variables.
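The definition above can be sketched in plain Python. This illustrates the arithmetic only and is not teradataml code; the helper name covar_samp and the sample values are made up for the sketch. Pairs where either value is None are dropped, and the sum of the products of deviations is divided by n - 1. The sketch also returns None when only one pair remains, because n - 1 is then zero; the NaN rows in the examples are consistent with that, but treat it as an assumption about the database behavior.

```python
from typing import Optional, Sequence


def covar_samp(xs: Sequence[Optional[float]],
               ys: Sequence[Optional[float]]) -> Optional[float]:
    """Sample covariance over the non-null (x, y) pairs (illustration only)."""
    # Keep only the pairs where both values are present.
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    # No pairs (or a single pair) leaves the n - 1 denominator at zero,
    # so this sketch returns None (an assumption, see the lead-in).
    if n < 2:
        return None
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Sum the products of deviations from the means, then divide by n - 1.
    return sum((x - mean_x) * (y - mean_y) for x, y in pairs) / (n - 1)


# Illustrative values; the last two rows each contain a null and are skipped.
gpa = [3.46, 3.00, 4.00, None, 3.44]
admitted = [0, 0, 1, 1, None]
print(round(covar_samp(gpa, admitted), 6))  # 0.256667
```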
PARAMETERS:
expression:
Required Argument.
Specifies a ColumnExpression of a numeric column, the name of a column,
or a numeric literal to pair with the other variable when computing
their sample covariance.
Types: ColumnExpression OR int OR float OR str
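Passing a numeric literal pairs every row with the same constant value. A constant's deviation from its own mean is always zero, so in exact arithmetic its sample covariance with any column is 0. A minimal pure-Python sketch of that arithmetic follows; the helper and values are illustrative and not part of teradataml.

```python
from typing import Sequence


def covar_samp(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Sample covariance of two equal-length sequences with no nulls."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


gpa = [3.46, 3.00, 4.00, 2.65]
# A numeric literal behaves like a column holding the same value in every row.
literal = [5] * len(gpa)
print(covar_samp(gpa, literal))  # 0.0
```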
RETURNS:
* teradataml DataFrame - When the aggregate is executed on a window created
on a teradataml DataFrame.
* ColumnExpression, also known as teradataml DataFrameColumn - When the aggregate
is executed on a window created on a ColumnExpression.
RAISES:
RuntimeError - If the column does not support the aggregate operation.
EXAMPLES:
# Load the data to run the example.
>>> load_example_data("dataframe", "admissions_train")
>>>
# Create a DataFrame on 'admissions_train' table.
>>> admissions_train = DataFrame("admissions_train")
>>> admissions_train
masters gpa stats programming admitted
id
22 yes 3.46 Novice Beginner 0
36 no 3.00 Advanced Novice 0
15 yes 4.00 Advanced Advanced 1
38 yes 2.65 Advanced Beginner 1
5 no 3.44 Novice Novice 0
17 no 3.83 Advanced Advanced 1
34 yes 3.85 Advanced Beginner 0
13 no 4.00 Advanced Novice 1
26 yes 3.57 Advanced Advanced 1
19 yes 1.98 Advanced Advanced 0
>>>
# Note:
# In these examples, a ColumnExpression is passed as input. Users can
# pass the column name instead of the ColumnExpression.
# Example 1: Calculate the sample covariance for 'gpa' and 'admitted'
# in a Rolling window, partitioned over 'programming'.
# Create a Rolling window on 'gpa'.
>>> window = admissions_train.gpa.window(partition_columns="programming",
... window_start_point=-2,
... window_end_point=0)
>>>
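A rolling frame with window_start_point=-2 and window_end_point=0 covers the current row and up to two preceding rows of the same partition. The sketch below mimics that frame in plain Python to show what each row aggregates; it is not how teradataml computes it. The function names and input rows are illustrative, and because this example passes no order_columns, the sketch simply assumes the rows are already in window order. The first row of a partition has a single pair, so its sample covariance is undefined, consistent with the NaN shown for id 1 in the output below.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Pair = Tuple[float, float]


def covar_samp(pairs: List[Pair]) -> Optional[float]:
    """Sample covariance of (x, y) pairs; None when fewer than two pairs."""
    n = len(pairs)
    if n < 2:
        return None  # the n - 1 denominator would be zero
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in pairs) / (n - 1)


def rolling_covar_samp(rows: List[Tuple[str, float, float]],
                       start: int = -2,
                       end: int = 0) -> Dict[str, List[Optional[float]]]:
    """rows: (partition, gpa, admitted) tuples, already in window order.

    Row i of a partition sees rows i + start .. i + end of that partition,
    clamped at the partition boundaries.
    """
    partitions: Dict[str, List[Pair]] = defaultdict(list)
    for part, x, y in rows:
        partitions[part].append((x, y))
    result: Dict[str, List[Optional[float]]] = {}
    for part, pairs in partitions.items():
        result[part] = [covar_samp(pairs[max(0, i + start):i + end + 1])
                        for i in range(len(pairs))]
    return result


rows = [("Advanced", 4.00, 1), ("Advanced", 3.13, 1),
        ("Advanced", 1.98, 0), ("Beginner", 3.95, 0)]
print(rolling_covar_samp(rows))
```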
# Execute covar_samp() on the Rolling window and attach it to the teradataml DataFrame.
# Note: DataFrame.assign() allows combining multiple window aggregate operations
# in a single call. In this example, covar_samp() is executed along with the
# max() window aggregate operation.
>>> df = admissions_train.assign(covar_samp_gpa=window.covar_samp(admissions_train.admitted),
... max_gpa=window.max())
>>> df
masters gpa stats programming admitted covar_samp_gpa max_gpa
id
15 yes 4.00 Advanced Advanced 1 0.000000 4.00
16 no 3.70 Advanced Advanced 1 0.000000 4.00
11 no 3.13 Advanced Advanced 1 0.000000 3.96
9 no 3.82 Advanced Advanced 1 0.000000 3.82
19 yes 1.98 Advanced Advanced 0 0.560000 3.82
27 yes 3.96 Advanced Advanced 0 0.176667 3.96
1 yes 3.95 Beginner Beginner 0 NaN 3.95
34 yes 3.85 Advanced Beginner 0 0.000000 3.95
32 yes 3.46 Advanced Beginner 0 0.000000 3.95
40 yes 3.95 Novice Beginner 0 0.000000 3.95
>>>
# Example 2: Calculate the sample covariance between all valid columns
# and 'admitted' in a teradataml DataFrame, in an Expanding window
# partitioned over 'masters' and ordered by 'id'.
# Create an Expanding window on teradataml DataFrame.
>>> window = admissions_train.window(partition_columns="masters",
... order_columns="id",
... window_start_point=None,
... window_end_point=0)
>>>
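With window_start_point=None and window_end_point=0, each row's frame runs from the first row of its partition up to the current row, in 'id' order. The plain-Python sketch below illustrates that expanding frame; the helper names are hypothetical and this is not teradataml's implementation. Its input is the 'masters', 'id', 'gpa' and 'admitted' values of ids 3, 5 and 8 from the output that follows. Assuming those are the first three 'no' rows by id, which the printed values suggest, it yields None, 0.13 and 0.07, matching the gpa_covar_samp column for those rows.

```python
from typing import Dict, List, Optional, Tuple

Pair = Tuple[float, float]


def covar_samp(pairs: List[Pair]) -> Optional[float]:
    """Sample covariance of (x, y) pairs; None when fewer than two pairs."""
    n = len(pairs)
    if n < 2:
        return None  # the n - 1 denominator would be zero
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in pairs) / (n - 1)


def expanding_covar_samp(
        rows: List[Tuple[str, int, float, float]]) -> Dict[int, Optional[float]]:
    """rows: (partition, order_key, x, y) tuples in any order.

    Each row sees every row of its partition from the first one
    (window_start_point=None) up to itself (window_end_point=0).
    """
    seen: Dict[str, List[Pair]] = {}
    result: Dict[int, Optional[float]] = {}
    # Walk each partition in order_key order, growing the frame by one row.
    for part, key, x, y in sorted(rows, key=lambda r: (r[0], r[1])):
        seen.setdefault(part, []).append((x, y))
        result[key] = covar_samp(seen[part])
    return result


# (masters, id, gpa, admitted) for ids 3, 5 and 8 in the output that follows.
rows = [("no", 8, 3.60, 1), ("no", 3, 3.70, 1), ("no", 5, 3.44, 0)]
print(expanding_covar_samp(rows))
```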
# Execute covar_samp() on Expanding window.
>>> df = window.covar_samp(admissions_train.admitted)
>>> df
masters gpa stats programming admitted admitted_covar_samp gpa_covar_samp id_covar_samp
id
4 yes 3.50 Beginner Novice 1 0.333333 -0.118333 0.833333
7 yes 2.33 Novice Novice 1 0.300000 -0.223500 1.250000
14 yes 3.45 Advanced Advanced 0 0.300000 -0.183000 0.000000
15 yes 4.00 Advanced Advanced 1 0.285714 -0.110714 0.666667
19 yes 1.98 Advanced Advanced 0 0.277778 0.039722 0.277778
20 yes 3.90 Advanced Advanced 1 0.266667 0.059111 0.711111
3 no 3.70 Novice Beginner 1 NaN NaN NaN
5 no 3.44 Novice Novice 0 0.500000 0.130000 -1.000000
8 no 3.60 Beginner Advanced 1 0.333333 0.070000 0.166667
9 no 3.82 Advanced Advanced 1 0.250000 0.066667 0.416667
>>>
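The admitted_covar_samp column pairs 'admitted' with itself, and the sample covariance of a variable with itself is its sample variance. The sketch below checks that identity against Python's standard statistics.variance, which also divides by n - 1. It uses the 'admitted' values of ids 3, 5 and 8 (1, 0 and 1); assuming those are the only rows in id 8's expanding frame, the result, 1/3, agrees with the 0.333333 printed for id 8. The covar_samp helper is illustrative, not teradataml code.

```python
import statistics
from typing import Sequence


def covar_samp(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Sample covariance of two equal-length sequences without nulls."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


admitted = [1, 0, 1]  # 'admitted' for ids 3, 5 and 8 in the output above
print(round(covar_samp(admitted, admitted), 6))  # 0.333333
print(round(statistics.variance(admitted), 6))   # 0.333333
```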
# Example 3: Calculate the sample covariance between all valid columns
# and 'admitted' in a teradataml DataFrame grouped by 'masters',
# 'admitted' and 'gpa', in a Contracting window partitioned over
# 'masters'.
# Perform a groupby() operation on the teradataml DataFrame.
>>> group_by_df = admissions_train.groupby(["masters", "admitted", "gpa"])
# Create a Contracting window on teradataml DataFrameGroupBy object.
>>> window = group_by_df.window(partition_columns="masters",
... window_start_point=-5,
... window_end_point=None)
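With window_start_point=-5 and window_end_point=None, a row's frame starts five rows before it (or at the first row of the partition) and runs to the last row of the partition, so frames shrink from the front as the window moves forward. The sketch below illustrates that contracting frame in plain Python; the helper names and the (gpa, admitted) pairs are illustrative. Because no order_columns are given in this example, neither the row order the database uses nor the order in which results are printed is specified here, so the sketch assumes its input is already in window order.

```python
from typing import List, Optional, Tuple

Pair = Tuple[float, float]


def covar_samp(pairs: List[Pair]) -> Optional[float]:
    """Sample covariance of (x, y) pairs; None when fewer than two pairs."""
    n = len(pairs)
    if n < 2:
        return None  # the n - 1 denominator would be zero
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in pairs) / (n - 1)


def contracting_covar_samp(pairs: List[Pair],
                           start: int = -5) -> List[Optional[float]]:
    """Frame for row i: from `start` rows before it (clamped at the first
    row) through the last row of the partition (window_end_point=None)."""
    return [covar_samp(pairs[max(0, i + start):]) for i in range(len(pairs))]


# Illustrative (gpa, admitted) pairs for one partition, in window order.
partition = [(3.81, 1), (2.65, 1), (3.45, 0), (3.95, 0),
             (3.75, 0), (3.79, 0), (3.65, 1), (3.87, 1)]
values = contracting_covar_samp(partition)
print(values)
```

The first six rows (indexes 0 to 5) all see the whole partition; from index 6 on, each frame drops one more leading row.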
# Execute covar_samp() on Contracting window.
>>> window.covar_samp(admissions_train.admitted)
masters admitted gpa admitted_covar_samp gpa_covar_samp
0 yes 1 3.81 0.285714 -0.212857
1 yes 1 2.65 0.266667 -0.192444
2 yes 0 3.45 0.254545 -0.170727
3 yes 0 3.95 0.242424 -0.108485
4 yes 0 3.75 0.247253 -0.079286
5 yes 0 3.79 0.257143 -0.050571
6 no 1 3.65 0.000000 0.000000
7 no 1 3.87 0.000000 0.000000
8 no 1 3.71 0.125000 0.048929
9 no 1 3.93 0.194444 0.118889
>>>