Teradata® Package for Python Function Reference - 20.00 - covar_pop
- Deployment: VantageCloud, VantageCore
- Edition: Enterprise, IntelliFlex, VMware
- Product: Teradata Package for Python
- Release Number: 20.00.00.03
- Published: December 2024
- Last Edition: 2024-12-19
- Product Category: Teradata Vantage
teradataml.dataframe.window.covar_pop = covar_pop(expression)
DESCRIPTION:
Returns the population covariance of its arguments for all
non-null data point pairs over the specified window.
Covariance measures how two random variables vary together. It is
the average of the products of the deviations of each variable from
its mean, taken over all non-null data point pairs. The function
treats the ColumnExpression as one variable and "expression" as the
other when calculating the population covariance.
Notes:
1. When there are no non-null data point pairs in the data used for
the computation, the function returns None.
2. High covariance does not imply a causal relationship between
the variables.
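The definition above can be sketched in plain Python. This is a minimal illustration, not the Vantage implementation; the function name population_covariance and the sample values are assumptions made for the example. It drops any pair in which either value is None, returns None when no pairs remain (as in Note 1), and otherwise averages the products of the deviations from the two means.

```python
def population_covariance(xs, ys):
    """Population covariance of paired values, ignoring pairs that contain None."""
    # Keep only the pairs in which both values are present (non-null).
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    if not pairs:
        # No non-null pairs: mirror the documented behaviour and return None.
        return None
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Average of the products of deviations (divide by n, not n - 1).
    return sum((x - mean_x) * (y - mean_y) for x, y in pairs) / n


# Hypothetical sample values; the third pair is skipped because gpa is None.
gpa = [3.70, 3.44, None, 3.60]
admitted = [1, 0, 1, 1]
result = population_covariance(gpa, admitted)
no_pairs = population_covariance([None, 2.0], [1, None])
print(result, no_pairs)
```

Dividing by n rather than n - 1 is what makes this the population covariance and not the sample covariance.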
PARAMETERS:
expression:
Required Argument.
Specifies a ColumnExpression of a numeric column, the name of a numeric
column, or a numeric literal to be paired with the other variable to
determine their population covariance.
Types: ColumnExpression OR int OR float OR str
RETURNS:
* teradataml DataFrame - When the aggregate is executed using a window
created on a teradataml DataFrame.
* ColumnExpression (also known as teradataml DataFrameColumn) - When the
aggregate is executed using a window created on a ColumnExpression.
RAISES:
RuntimeError - If the column does not support the aggregate operation.
EXAMPLES:
# Load the data to run the example.
>>> load_example_data("dataframe", "admissions_train")
>>>
# Create a DataFrame on 'admissions_train' table.
>>> admissions_train = DataFrame("admissions_train")
>>> admissions_train
    masters   gpa     stats  programming  admitted
id
22      yes  3.46    Novice     Beginner         0
36       no  3.00  Advanced       Novice         0
15      yes  4.00  Advanced     Advanced         1
38      yes  2.65  Advanced     Beginner         1
5        no  3.44    Novice       Novice         0
17       no  3.83  Advanced     Advanced         1
34      yes  3.85  Advanced     Beginner         0
13       no  4.00  Advanced       Novice         1
26      yes  3.57  Advanced     Advanced         1
19      yes  1.98  Advanced     Advanced         0
>>>
# Note:
# In the examples here, a ColumnExpression is passed as input. You can
# pass the column name instead of the ColumnExpression.
# Example 1: Calculate the population covariance for 'gpa' and 'admitted'
# in a Rolling window, partitioned over 'programming'.
# Create a Rolling window on 'gpa'.
>>> window = admissions_train.gpa.window(partition_columns=admissions_train.programming,
...                                      window_start_point=-2,
...                                      window_end_point=0)
>>>
# Execute covar_pop() on the Rolling window and attach it to the teradataml DataFrame.
# Note: DataFrame.assign() allows combining multiple window aggregate operations
# in one single call. In this example, we are executing covar_pop() along with
# max() window aggregate operations.
>>> df = admissions_train.assign(covar_pop_gpa=window.covar_pop(admissions_train.admitted),
...                              max_gpa=window.max())
>>> df
    masters   gpa     stats  programming  admitted  covar_pop_gpa  max_gpa
id
15      yes  4.00  Advanced     Advanced         1       0.000000     4.00
16       no  3.70  Advanced     Advanced         1       0.000000     4.00
11       no  3.13  Advanced     Advanced         1       0.000000     3.96
9        no  3.82  Advanced     Advanced         1       0.000000     3.82
19      yes  1.98  Advanced     Advanced         0       0.373333     3.82
27      yes  3.96  Advanced     Advanced         0       0.117778     3.96
1       yes  3.95  Beginner     Beginner         0       0.000000     3.95
34      yes  3.85  Advanced     Beginner         0       0.000000     3.95
32      yes  3.46  Advanced     Beginner         0       0.000000     3.95
40      yes  3.95    Novice     Beginner         0       0.000000     3.95
>>>
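As a rough illustration of what a Rolling window with window_start_point=-2 and window_end_point=0 computes, the sketch below evaluates the population covariance over the current row and the two rows before it. It is plain Python, not teradataml; the helper names and the input rows are hypothetical. The inputs were chosen so that the last two results reproduce the 0.373333 and 0.117778 seen above, but the actual row order inside a partition in Vantage is not visible in that output, so treat the inputs as an assumption.

```python
def covar_pop_pairs(pairs):
    """Population covariance of (x, y) pairs; None when there are no pairs."""
    if not pairs:
        return None
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in pairs) / n


def rolling_covar_pop(xs, ys, start=-2, end=0):
    """For each row i, aggregate rows i+start .. i+end, clipped to the partition."""
    n = len(xs)
    out = []
    for i in range(n):
        first = max(0, i + start)
        last = min(n - 1, i + end)
        # Collect the non-null pairs that fall inside the frame of row i.
        frame = [(xs[j], ys[j]) for j in range(first, last + 1)
                 if xs[j] is not None and ys[j] is not None]
        out.append(covar_pop_pairs(frame))
    return out


# Hypothetical rows of a single partition, already in window order.
gpa = [3.82, 3.50, 1.98, 3.96]
admitted = [1, 1, 0, 0]
rolling = rolling_covar_pop(gpa, admitted)
print([round(v, 6) for v in rolling])
```

The first rows of a partition have fewer than three rows in their frame, which is why early values are often 0 when the paired column does not vary yet.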
# Example 2: Calculate the population covariance between 'admitted' and all the
# valid columns in teradataml DataFrame, in an Expanding window,
# partitioned over 'masters', and ordered by 'id'.
# Create an Expanding window on teradataml DataFrame.
>>> window = admissions_train.window(partition_columns=admissions_train.masters,
...                                  order_columns=admissions_train.id,
...                                  window_start_point=None,
...                                  window_end_point=0)
>>>
# Execute covar_pop() on Expanding window.
>>> df = window.covar_pop(admissions_train.admitted)
>>> df
    masters   gpa     stats  programming  admitted  admitted_covar_pop  gpa_covar_pop  id_covar_pop
id
4       yes  3.50  Beginner       Novice         1            0.222222      -0.078889      0.555556
7       yes  2.33    Novice       Novice         1            0.240000      -0.178800      1.000000
14      yes  3.45  Advanced     Advanced         0            0.250000      -0.152500      0.000000
15      yes  4.00  Advanced     Advanced         1            0.244898      -0.094898      0.571429
19      yes  1.98  Advanced     Advanced         0            0.246914       0.035309      0.246914
20      yes  3.90  Advanced     Advanced         1            0.240000       0.053200      0.640000
3        no  3.70    Novice     Beginner         1            0.000000       0.000000      0.000000
5        no  3.44    Novice       Novice         0            0.250000       0.065000     -0.500000
8        no  3.60  Beginner     Advanced         1            0.222222       0.046667      0.111111
9        no  3.82  Advanced     Advanced         1            0.187500       0.050000      0.312500
>>>
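An Expanding window (window_start_point=None, window_end_point=0) aggregates every row from the start of the partition up to the current row. The sketch below is a plain-Python illustration with a hypothetical function name, using running sums and the identity cov = mean(x*y) - mean(x)*mean(y). The inputs are the 'gpa' and 'admitted' values of ids 3, 5, 8 and 9 from the output above; the results agree with the gpa_covar_pop column if those are the first four rows of the 'no' partition when ordered by 'id', which is an assumption.

```python
def expanding_covar_pop(xs, ys):
    """Population covariance over rows 0..i for each row i, using running sums."""
    n = 0
    sum_x = sum_y = sum_xy = 0.0
    out = []
    for x, y in zip(xs, ys):
        # Only non-null pairs contribute to the running sums.
        if x is not None and y is not None:
            n += 1
            sum_x += x
            sum_y += y
            sum_xy += x * y
        if n == 0:
            out.append(None)
        else:
            # cov = E[xy] - E[x] * E[y]
            out.append(sum_xy / n - (sum_x / n) * (sum_y / n))
    return out


gpa = [3.70, 3.44, 3.60, 3.82]   # ids 3, 5, 8, 9 (masters == 'no')
admitted = [1, 0, 1, 1]
expanding = expanding_covar_pop(gpa, admitted)
print([round(v, 6) for v in expanding])
```

The running-sum form keeps the example short; it can lose precision on large or poorly scaled data, where a two-pass or incremental-mean formulation is usually preferred.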
# Example 3: Calculate the population covariance between 'admitted' and all the
# valid columns in a teradataml DataFrame, which are grouped by
# 'masters', 'admitted' and 'gpa' in a Contracting window,
# partitioned over 'masters'.
# Perform groupby() operation on teradataml DataFrame.
>>> group_by_df = admissions_train.groupby(["masters", "admitted", "gpa"])
# Create a Contracting window on teradataml DataFrameGroupBy object.
>>> window = group_by_df.window(partition_columns=group_by_df.masters,
...                             window_start_point=-5,
...                             window_end_point=None)
# Execute covar_pop() on Contracting window.
>>> window.covar_pop(admissions_train.admitted)
   masters  admitted   gpa  admitted_covar_pop  gpa_covar_pop
0      yes         1  3.50            0.250000       0.021875
1      yes         1  3.59            0.250000      -0.063500
2      yes         1  4.00            0.247934      -0.074628
3      yes         1  2.65            0.243056      -0.071250
4      yes         0  3.95            0.244898      -0.052449
5      yes         1  2.33            0.248889      -0.036178
6       no         1  3.65            0.000000       0.000000
7       no         1  3.87            0.000000       0.000000
8       no         1  3.71            0.000000       0.000000
9       no         1  3.93            0.000000       0.000000
>>>
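The three window types used in these examples differ only in which rows fall inside the frame of the current row. The sketch below shows one way to read the window_start_point and window_end_point values; it is plain Python with a hypothetical helper name, and it assumes that None means "unbounded" on that side and that integer offsets are relative to the current row.

```python
def frame_bounds(i, n, start, end):
    """Return (first, last), inclusive row positions of the frame for row i.

    Assumption: start=None means the frame begins at the first row of the
    partition, end=None means it runs to the last row, and integers are
    offsets relative to row i (negative = preceding rows).
    """
    first = 0 if start is None else max(0, i + start)
    last = n - 1 if end is None else min(n - 1, i + end)
    return first, last


n = 10  # hypothetical partition size
rolling = frame_bounds(4, n, -2, 0)          # current row and 2 preceding rows
expanding = frame_bounds(4, n, None, 0)      # partition start to current row
contracting = frame_bounds(7, n, -5, None)   # 5 rows back to partition end
print(rolling, expanding, contracting)
```

Under this reading, the Contracting frame covers fewer rows as the current row moves toward the end of the partition, while the Expanding frame covers more.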