| |
- regr_avgx(dependent_variable_expression, independent_variable_expression)
- DESCRIPTION:
The function returns the mean of independent_variable_expression, computed
over all data pairs in which both the dependent and the independent variable
values are non-null. When there are fewer than two non-null data point pairs
in the data used for the computation, the function returns NULL.
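The rule above can be modeled in plain Python for illustration. This is a minimal sketch that does not call teradataml or Vantage. The helper name `regr_avgx` is a hypothetical local function, `None` stands in for SQL NULL, and the "fewer than two pairs" check simply mirrors the description above.

```python
from typing import Optional, Sequence


def regr_avgx(y: Sequence[Optional[float]], x: Sequence[Optional[float]]) -> Optional[float]:
    """Hypothetical pure-Python model: mean of x over pairs where y and x are both non-null."""
    # Keep x only from pairs in which both the dependent (y) and independent (x) values are present.
    xs = [xi for yi, xi in zip(y, x) if yi is not None and xi is not None]
    # Mirror the documented rule: fewer than two non-null pairs yields NULL (None here).
    if len(xs) < 2:
        return None
    return sum(xs) / len(xs)


# The pair (1, None) is dropped, so only 3.0 and 4.0 are averaged.
print(regr_avgx([0, 1, 1], [3.0, None, 4.0]))  # 3.5
```

Note that a null in the dependent variable also removes the pair, even though only the independent variable is averaged.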
PARAMETERS:
dependent_variable_expression:
Required Argument.
Specifies a ColumnExpression of a column or a literal representing a
dependent variable for the regression.
A dependent variable is something that is measured in response to a treatment.
Format for the argument: '<dataframe>.<dataframe_column>.expression'.
independent_variable_expression:
Required Argument.
Specifies a ColumnExpression of a column or a literal representing an
independent variable for the regression.
An independent variable is a treatment: something that is varied under
your control to test the behavior of another variable.
Format for the argument: '<dataframe>.<dataframe_column>.expression'.
NOTE:
Function accepts positional arguments only.
EXAMPLES:
# Load the data to run the example.
>>> load_example_data("dataframe", "admissions_train")
>>>
# Create a DataFrame on 'admissions_train' table.
>>> admissions_train = DataFrame("admissions_train")
>>> admissions_train
masters gpa stats programming admitted
id
22 yes 3.46 Novice Beginner 0
36 no 3.00 Advanced Novice 0
15 yes 4.00 Advanced Advanced 1
38 yes 2.65 Advanced Beginner 1
5 no 3.44 Novice Novice 0
17 no 3.83 Advanced Advanced 1
34 yes 3.85 Advanced Beginner 0
13 no 4.00 Advanced Novice 1
26 yes 3.57 Advanced Advanced 1
19 yes 1.98 Advanced Advanced 0
>>>
# Example 1: Calculate the mean of the "gpa" column (independent variable) over
# the rows in which both "gpa" and the "admitted" column (dependent variable)
# are non-null.
# Import func from sqlalchemy to call the regr_avgx function.
>>> from sqlalchemy import func
# Create a SQLAlchemy Function object.
>>> regr_avgx_func_ = func.regr_avgx(admissions_train.admitted.expression, admissions_train.gpa.expression)
>>>
# Pass the Function object as input to DataFrame.assign().
>>> df = admissions_train.assign(True, regr_avgx_=regr_avgx_func_)
>>> print(df)
regr_avgx_
0 3.54175
>>>
# Example 2: Calculate the mean of the "gpa" column (independent variable) over
# the rows in which both "gpa" and the "admitted" column (dependent variable)
# are non-null, for each level of the "programming" column.
# Note:
#     When assign() is run after DataFrame.groupby(), assign() ignores
#     the "drop_columns" argument.
>>> admissions_train.groupby("programming").assign(regr_avgx_=regr_avgx_func_)
programming regr_avgx_
0 Beginner 3.660000
1 Advanced 3.615625
2 Novice 3.294545
>>>
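The per-group computation in Example 2 can be sketched in plain Python as well. This is an illustrative model only, not how teradataml evaluates the query. It uses just the ten rows displayed earlier (the full "admissions_train" table has more rows), so its results will not match the Example 2 output. The names `ROWS` and `grouped_regr_avgx` are hypothetical, and the "fewer than two pairs" rule again mirrors the description above.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical sample: the ten displayed rows as (programming, admitted, gpa).
ROWS: List[Tuple[str, Optional[int], Optional[float]]] = [
    ("Beginner", 0, 3.46),
    ("Novice", 0, 3.00),
    ("Advanced", 1, 4.00),
    ("Beginner", 1, 2.65),
    ("Novice", 0, 3.44),
    ("Advanced", 1, 3.83),
    ("Beginner", 0, 3.85),
    ("Novice", 1, 4.00),
    ("Advanced", 1, 3.57),
    ("Advanced", 0, 1.98),
]


def grouped_regr_avgx(rows) -> Dict[str, Optional[float]]:
    """Per-group mean of x over non-null (y, x) pairs; None when a group has fewer than two pairs."""
    pairs: Dict[str, List[float]] = {}
    for key, y, x in rows:
        # Register every group, even one whose pairs are all null.
        xs = pairs.setdefault(key, [])
        if y is not None and x is not None:
            xs.append(x)
    return {key: (sum(xs) / len(xs) if len(xs) >= 2 else None) for key, xs in pairs.items()}


for level, mean_gpa in sorted(grouped_regr_avgx(ROWS).items()):
    print(level, mean_gpa)
```

Each group is filtered independently, so a level whose rows are all null still appears in the result with a value of None, analogous to a NULL in the grouped output.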
|