| |
- covar_pop(value_expression1, value_expression2)
- DESCRIPTION:
Returns the population covariance of its arguments over all non-null
data point pairs.
Covariance measures how two random variables vary together. Population
covariance is the average, over all non-null data point pairs, of the
products of each value's deviation from its mean.
Note that high covariance does not imply a causal relationship between
the variables.
When there are no non-null data point pairs in the data used for the
computation, the function returns NULL.
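The definition above can be illustrated with a minimal pure-Python sketch.
It is not how the database computes the result; the helper name
covar_pop_sketch is hypothetical, and None is assumed to stand in for NULL.

```python
from typing import Optional, Sequence


def covar_pop_sketch(
    xs: Sequence[Optional[float]],
    ys: Sequence[Optional[float]],
) -> Optional[float]:
    """Population covariance over pairs in which both values are non-null."""
    # Keep only the pairs where neither value is None (standing in for NULL).
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    if not pairs:
        # No usable pairs: mirror the documented NULL result.
        return None
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Average of the products of deviations, dividing by n (not n - 1).
    return sum((x - mean_x) * (y - mean_y) for x, y in pairs) / n


# The last pair is skipped because its first value is None.
result = covar_pop_sketch([1.0, 2.0, 3.0, None], [2.0, 4.0, 6.0, 8.0])
empty = covar_pop_sketch([None, 1.0], [1.0, None])
```

Dividing by n rather than n - 1 is what makes this the population form
rather than the sample form of covariance.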
PARAMETERS:
value_expression1:
Required Argument.
Specifies a ColumnExpression of a numeric column or a numeric literal
to be paired with value_expression2 to determine their population covariance.
Format for the argument: '<dataframe>.<dataframe_column>.expression'.
value_expression2:
Required Argument.
Specifies a ColumnExpression of a numeric column or a numeric literal
to be paired with value_expression1 to determine their population covariance.
Format for the argument: '<dataframe>.<dataframe_column>.expression'.
NOTE:
The function accepts positional arguments only.
EXAMPLES:
# Load the data to run the example.
>>> load_example_data("dataframe", "admissions_train")
>>>
# Create a DataFrame on 'admissions_train' table.
>>> admissions_train = DataFrame("admissions_train")
>>> admissions_train
masters gpa stats programming admitted
id
22 yes 3.46 Novice Beginner 0
36 no 3.00 Advanced Novice 0
15 yes 4.00 Advanced Advanced 1
38 yes 2.65 Advanced Beginner 1
5 no 3.44 Novice Novice 0
17 no 3.83 Advanced Advanced 1
34 yes 3.85 Advanced Beginner 0
13 no 4.00 Advanced Novice 1
26 yes 3.57 Advanced Advanced 1
19 yes 1.98 Advanced Advanced 0
>>>
# Example 1: Calculate the population covariance between "gpa" and "admitted" columns.
# Import func from sqlalchemy to execute covar_pop function.
>>> from sqlalchemy import func
# Create a sqlalchemy Function object.
>>> covar_pop_func_ = func.covar_pop(admissions_train.gpa.expression, admissions_train.admitted.expression)
>>>
# Pass the Function object as input to DataFrame.assign().
# Setting "drop_columns" to True returns only the newly assigned column.
>>> df = admissions_train.assign(drop_columns=True, covar_pop_gpa_admitted_=covar_pop_func_)
>>> print(df)
covar_pop_gpa_admitted_
0 -0.005388
>>>
# Example 2: Calculate the population covariance between "gpa" and "admitted" columns
# for each level of programming.
# Note:
# When assign() is run after DataFrame.groupby(), the function ignores
# the "drop_columns" argument.
>>> admissions_train.groupby("programming").assign(covar_pop_gpa_admitted_=covar_pop_func_)
programming covar_pop_gpa_admitted_
0 Beginner -0.069231
1 Advanced 0.091055
2 Novice -0.031488
>>>
|