| |
- corr(value_expression1, value_expression2)
- DESCRIPTION:
Function returns the sample Pearson product-moment correlation coefficient
of its arguments for all non-null data point pairs.
The sample Pearson product-moment correlation coefficient is a measure of
the linear association between variables. The computed coefficient
ranges from -1.00 to +1.00.
Note that high correlation does not imply a causal relationship between
the variables.
The correlation coefficient between two variables has the following four notable values:
1. -1.00 : Association between the variables is perfectly linear, but inverse.
As the value for 'value_expression1' varies, the value for
'value_expression2' varies proportionally in the opposite direction.
2. 0 : No linear association exists between the variables and they are
considered to be uncorrelated.
3. +1.00 : Association between the variables is perfectly linear.
As the value for 'value_expression1' varies, the value for
'value_expression2' varies proportionally in the same direction.
4. NULL : Association between the variables cannot be measured because there
are no non-null data point pairs in the data used for the computation.
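The four values above can be reproduced with a minimal pure-Python sketch of the sample Pearson coefficient. It is an illustration only, not the database implementation: the name `sample_pearson_corr` is hypothetical, `None` stands in for NULL, and returning `None` when either variable is constant is this sketch's own choice rather than documented behavior.

```python
import math


def sample_pearson_corr(xs, ys):
    """Sample Pearson correlation over pairs where both values are non-null.

    Returns None (standing in for NULL) when no non-null pairs remain.
    """
    # Keep only the data point pairs in which both values are present.
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n == 0:
        return None

    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n

    # Sums of centered cross-products and centered squares.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)

    if sxx == 0 or syy == 0:
        # Assumption of this sketch: the coefficient is undefined when
        # either variable is constant, so report None.
        return None
    return sxy / math.sqrt(sxx * syy)


perfect_direct = sample_pearson_corr([1, 2, 3, 4], [2, 4, 6, 8])
perfect_inverse = sample_pearson_corr([1, 2, 3, 4], [8, 6, 4, 2])
uncorrelated = sample_pearson_corr([1, 2, 3, 4], [1, -1, -1, 1])
no_pairs = sample_pearson_corr([None, 1], [2, None])
```

The `uncorrelated` case uses values that follow a symmetric, non-linear pattern: the coefficient is 0 even though the variables are clearly related, because only linear association is measured.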
PARAMETERS:
value_expression1:
Required Argument.
Specifies a ColumnExpression of a numeric column or a numeric literal
to be correlated with value_expression2.
Format for the argument: '<dataframe>.<dataframe_column>.expression'.
value_expression2:
Required Argument.
Specifies a ColumnExpression of a numeric column or a numeric literal
to be correlated with value_expression1.
Format for the argument: '<dataframe>.<dataframe_column>.expression'.
NOTE:
Function accepts positional arguments only.
EXAMPLES:
# Load the data to run the example.
>>> load_example_data("dataframe", "admissions_train")
>>>
# Create a DataFrame on 'admissions_train' table.
>>> admissions_train = DataFrame("admissions_train")
>>> admissions_train
masters gpa stats programming admitted
id
22 yes 3.46 Novice Beginner 0
36 no 3.00 Advanced Novice 0
15 yes 4.00 Advanced Advanced 1
38 yes 2.65 Advanced Beginner 1
5 no 3.44 Novice Novice 0
17 no 3.83 Advanced Advanced 1
34 yes 3.85 Advanced Beginner 0
13 no 4.00 Advanced Novice 1
26 yes 3.57 Advanced Advanced 1
19 yes 1.98 Advanced Advanced 0
>>>
# Example 1: Calculate the correlation between "gpa" and "admitted" columns.
# Import func from sqlalchemy to execute corr function.
>>> from sqlalchemy import func
# Create a sqlalchemy Function object.
>>> corr_func_ = func.corr(admissions_train.gpa.expression, admissions_train.admitted.expression)
>>>
# Pass the Function object as input to DataFrame.assign().
>>> df = admissions_train.assign(True, corr_gpa_admitted_=corr_func_)
>>> print(df)
corr_gpa_admitted_
0 -0.022265
>>>
# Example 2: Calculate the correlation between "gpa" and "admitted" columns
# for each level of programming.
# Note:
# When assign() is run after DataFrame.groupby(), the function ignores
# the "drop_columns" argument.
>>> admissions_train.groupby("programming").assign(corr_gpa_admitted_=corr_func_)
programming corr_gpa_admitted_
0 Beginner -0.417565
1 Advanced 0.487737
2 Novice -0.114656
>>>
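As a rough picture of what Example 2 computes, the sketch below partitions rows by a grouping key and calculates one coefficient per partition over its non-null pairs. It is plain Python, not teradataml; the names `pearson` and `corr_by_group` and the rows in `rows` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from math import sqrt


def pearson(pairs):
    """Sample Pearson correlation of (x, y) pairs; None if it is undefined."""
    n = len(pairs)
    if n == 0:
        return None
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        # Assumption of this sketch: a constant variable yields None.
        return None
    return sxy / sqrt(sxx * syy)


def corr_by_group(rows):
    """Map each grouping key to the correlation of its non-null (x, y) pairs."""
    groups = defaultdict(list)
    for key, x, y in rows:
        bucket = groups[key]  # register the key even if the pair is dropped
        if x is not None and y is not None:
            bucket.append((x, y))
    return {key: pearson(pairs) for key, pairs in groups.items()}


# Hypothetical rows: (group key, first value, second value).
rows = [
    ("A", 1.0, 0), ("A", 2.0, 1), ("A", 3.0, 1),
    ("B", 1.0, 1), ("B", 2.0, 0), ("B", None, 1),
    ("C", None, 0),
]
result = corr_by_group(rows)
```

Group "C" has no non-null pairs, so its entry is `None`, mirroring the NULL case described above.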
|