Teradata Package for Python Function Reference | 17.10 - covar_pop

Teradata® Package for Python Function Reference

Product: Teradata Package for Python
Release Number: 17.10
Published: April 2022
Language: English (United States)
Last Update: 2022-08-19

Product Category: Teradata Vantage

teradataml.dataframe.window.covar_pop = covar_pop(expression)
DESCRIPTION:
    Function returns the population covariance of its arguments for all
    non-null data point pairs over the specified window.
    Covariance measures whether or not two random variables vary in the
    same way. Population covariance is the average, over the n non-null
    data point pairs, of the products of each variable's deviation from
    its mean, that is, SUM((x - AVG(x)) * (y - AVG(y))) / n.
    The function treats the ColumnExpression as one variable and
    "expression" as the other variable when calculating the population
    covariance.

    Notes:
        1. When there are no non-null data point pairs in the data used
           for the computation, the function returns None.
        2. High covariance does not imply a causal relationship between
           the variables.

PARAMETERS:
    expression:
        Required Argument.
        Specifies a ColumnExpression of a numeric column, the name of a
        column, or a numeric literal to be paired with the other variable
        to determine their population covariance.
        Types: ColumnExpression OR int OR float OR str

RETURNS:
    * teradataml DataFrame - When the aggregate is executed using a window
      created on a teradataml DataFrame.
    * ColumnExpression (also known as teradataml DataFrameColumn) - When
      the aggregate is executed using a window created on a
      ColumnExpression.

RAISES:
    RuntimeError - If the column does not support the aggregate operation.

EXAMPLES:
    # Load the data to run the example.
    >>> load_example_data("dataframe", "admissions_train")
    >>>
    # Create a DataFrame on 'admissions_train' table.
    >>> admissions_train = DataFrame("admissions_train")
    >>> admissions_train
        masters   gpa     stats  programming  admitted
    id
    22      yes  3.46    Novice     Beginner         0
    36       no  3.00  Advanced       Novice         0
    15      yes  4.00  Advanced     Advanced         1
    38      yes  2.65  Advanced     Beginner         1
    5        no  3.44    Novice       Novice         0
    17       no  3.83  Advanced     Advanced         1
    34      yes  3.85  Advanced     Beginner         0
    13       no  4.00  Advanced       Novice         1
    26      yes  3.57  Advanced     Advanced         1
    19      yes  1.98  Advanced     Advanced         0
    >>>

    # Note:
    #     In the examples here, a ColumnExpression is passed as input.
    #     Users can pass a column name instead of the ColumnExpression.
    # Example 1: Calculate the population covariance of 'gpa' and 'admitted'
    #            in a Rolling window, partitioned over 'programming'.
    # Create a Rolling window on 'gpa'.
    >>> window = admissions_train.gpa.window(partition_columns="programming",
    ...                                      window_start_point=-2,
    ...                                      window_end_point=0)
    >>>
    # Execute covar_pop() on the Rolling window and attach it to the
    # teradataml DataFrame.
    # Note: DataFrame.assign() allows combining multiple window aggregate
    #       operations in a single call. In this example, covar_pop() is
    #       executed along with the max() window aggregate operation.
    >>> df = admissions_train.assign(covar_pop_gpa=window.covar_pop(admissions_train.admitted),
    ...                              max_gpa=window.max())
    >>> df
        masters   gpa     stats  programming  admitted  covar_pop_gpa  max_gpa
    id
    15      yes  4.00  Advanced     Advanced         1       0.000000     4.00
    16       no  3.70  Advanced     Advanced         1       0.000000     4.00
    11       no  3.13  Advanced     Advanced         1       0.000000     3.96
    9        no  3.82  Advanced     Advanced         1       0.000000     3.82
    19      yes  1.98  Advanced     Advanced         0       0.373333     3.82
    27      yes  3.96  Advanced     Advanced         0       0.117778     3.96
    1       yes  3.95  Beginner     Beginner         0       0.000000     3.95
    34      yes  3.85  Advanced     Beginner         0       0.000000     3.95
    32      yes  3.46  Advanced     Beginner         0       0.000000     3.95
    40      yes  3.95    Novice     Beginner         0       0.000000     3.95
    >>>

    # Example 2: Calculate the population covariance between 'admitted' and
    #            all the valid columns in a teradataml DataFrame, in an
    #            Expanding window, partitioned over 'masters' and ordered
    #            by 'id'.
    # Create an Expanding window on the teradataml DataFrame.
    >>> window = admissions_train.window(partition_columns="masters",
    ...                                  order_columns="id",
    ...                                  window_start_point=None,
    ...                                  window_end_point=0)
    >>>
    # Execute covar_pop() on the Expanding window.
    >>> df = window.covar_pop(admissions_train.admitted)
    >>> df
        masters   gpa     stats  programming  admitted  admitted_covar_pop  gpa_covar_pop  id_covar_pop
    id
    4       yes  3.50  Beginner       Novice         1            0.222222      -0.078889      0.555556
    7       yes  2.33    Novice       Novice         1            0.240000      -0.178800      1.000000
    14      yes  3.45  Advanced     Advanced         0            0.250000      -0.152500      0.000000
    15      yes  4.00  Advanced     Advanced         1            0.244898      -0.094898      0.571429
    19      yes  1.98  Advanced     Advanced         0            0.246914       0.035309      0.246914
    20      yes  3.90  Advanced     Advanced         1            0.240000       0.053200      0.640000
    3        no  3.70    Novice     Beginner         1            0.000000       0.000000      0.000000
    5        no  3.44    Novice       Novice         0            0.250000       0.065000     -0.500000
    8        no  3.60  Beginner     Advanced         1            0.222222       0.046667      0.111111
    9        no  3.82  Advanced     Advanced         1            0.187500       0.050000      0.312500
    >>>

    # Example 3: Calculate the population covariance between 'admitted' and
    #            all the valid columns in a teradataml DataFrame, grouped by
    #            'masters', 'admitted' and 'gpa', in a Contracting window,
    #            partitioned over 'masters'.
    # Perform a groupby() operation on the teradataml DataFrame.
    >>> group_by_df = admissions_train.groupby(["masters", "admitted", "gpa"])
    >>>
    # Create a Contracting window on the teradataml DataFrameGroupBy object.
    >>> window = group_by_df.window(partition_columns="masters",
    ...                             window_start_point=-5,
    ...                             window_end_point=None)
    >>>
    # Execute covar_pop() on the Contracting window.
    >>> window.covar_pop(admissions_train.admitted)
       masters  admitted   gpa  admitted_covar_pop  gpa_covar_pop
    0      yes         1  3.50            0.250000       0.021875
    1      yes         1  3.59            0.250000      -0.063500
    2      yes         1  4.00            0.247934      -0.074628
    3      yes         1  2.65            0.243056      -0.071250
    4      yes         0  3.95            0.244898      -0.052449
    5      yes         1  2.33            0.248889      -0.036178
    6       no         1  3.65            0.000000       0.000000
    7       no         1  3.87            0.000000       0.000000
    8       no         1  3.71            0.000000       0.000000
    9       no         1  3.93            0.000000       0.000000
    >>>
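The windowed computation in the examples above can be approximated outside Vantage with the pure-Python sketch below. It is not the teradataml implementation (the aggregate itself is executed in Vantage); it only illustrates the population covariance formula from DESCRIPTION and how a frame of rows is chosen for each row. The helper names population_covariance, frame_rows and window_covar_pop are hypothetical, the partition is assumed to be already ordered, and the frame rule (a negative n means n rows preceding, a positive n means n rows following, 0 means the current row, None means unbounded) is an assumption about how window_start_point and window_end_point map to rows.

```python
from typing import List, Optional, Sequence


def population_covariance(xs: Sequence[Optional[float]],
                          ys: Sequence[Optional[float]]) -> Optional[float]:
    """Population covariance over the pairs where both values are non-null.

    Returns None when no non-null pair exists, mirroring Note 1.
    """
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    if not pairs:
        return None
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Divide by n (population covariance), not n - 1 (sample covariance).
    return sum((x - mean_x) * (y - mean_y) for x, y in pairs) / n


def frame_rows(i: int, size: int,
               start: Optional[int], end: Optional[int]) -> range:
    """Indices of the rows in the frame of row i (assumed frame rule).

    Negative n: n rows preceding; positive n: n rows following;
    0: the current row; None: unbounded (preceding for start,
    following for end). Indices are clipped to the partition.
    """
    lo = 0 if start is None else max(0, i + start)
    hi = size - 1 if end is None else min(size - 1, i + end)
    return range(lo, hi + 1)


def window_covar_pop(xs: Sequence[Optional[float]],
                     ys: Sequence[Optional[float]],
                     start: Optional[int],
                     end: Optional[int]) -> List[Optional[float]]:
    """Population covariance of each row's frame in one ordered partition."""
    size = len(xs)
    out: List[Optional[float]] = []
    for i in range(size):
        rows = frame_rows(i, size, start, end)
        out.append(population_covariance([xs[j] for j in rows],
                                         [ys[j] for j in rows]))
    return out


if __name__ == "__main__":
    # Hypothetical data for a single, already-ordered partition.
    x = [1.0, 2.0, 3.0, 4.0]
    y = [2.0, 4.0, 6.0, 8.0]
    # Rolling: 2 rows preceding through the current row (as in Example 1).
    print(window_covar_pop(x, y, -2, 0))
    # Expanding: unbounded preceding through the current row (as in Example 2).
    print(window_covar_pop(x, y, None, 0))
    # Contracting: 1 row preceding through unbounded following
    # (same shape as Example 3, which uses 5 rows preceding).
    print(window_covar_pop(x, y, -1, None))
    # No non-null pair, so the result is None.
    print(population_covariance([None, 1.0], [2.0, None]))
```

With this hypothetical data, the Rolling call yields 0.0, 0.5, 4/3 and 4/3, and the last row of the Expanding call uses all four pairs, giving 2.5. A single-row frame always yields 0.0, which is consistent with the 0.000000 values that appear for frames whose 'admitted' values are all identical.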