| |
- skew(value_expression)
- DESCRIPTION:
Returns the skewness of the distribution of value_expression.
Skewness is the third standardized moment of a distribution. It measures
the asymmetry of the distribution about its mean, relative to the normal
(or Gaussian) distribution.
* The normal distribution has a skewness of 0.
* Positive skewness indicates a distribution having an asymmetric tail
extending toward more positive values.
* Negative skewness indicates an asymmetric tail extending toward more
negative values.
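
The sign convention above can be illustrated locally with a minimal sketch.
It assumes the adjusted Fisher-Pearson sample skewness formula; the exact
formula used by the database is not stated in this section, and the helper
name sample_skewness and the sample lists are hypothetical.

```python
import math

def sample_skewness(values):
    """Adjusted Fisher-Pearson sample skewness (assumed formula, for illustration only)."""
    n = len(values)
    mean = sum(values) / n
    # Sample standard deviation (n - 1 in the denominator).
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    # Sum of cubed standardized deviations, scaled by the small-sample factor.
    cubed = sum(((v - mean) / std) ** 3 for v in values)
    return n / ((n - 1) * (n - 2)) * cubed

# Hypothetical data: one long tail to the right, its mirror image, and a symmetric set.
right_tailed = [1, 2, 2, 3, 3, 3, 10]
left_tailed = [-v for v in right_tailed]
symmetric = [1, 2, 3, 4, 5]

print(sample_skewness(right_tailed) > 0)         # tail toward more positive values
print(sample_skewness(left_tailed) < 0)          # tail toward more negative values
print(abs(sample_skewness(symmetric)) < 1e-12)   # symmetric data: skewness near 0
```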
PARAMETERS:
value_expression:
Required Argument.
Specifies the ColumnExpression of a numeric column whose skewness is to
be computed.
Format for the argument: '<dataframe>.<dataframe_column>.expression'.
Notes:
1. Nulls are not included in the result computation.
2. The following conditions produce a null result:
a. Fewer than three non-null data points in the data used for the
computation.
b. Standard deviation for a column is equal to 0.
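
The null-handling rules in these notes can be mimicked in plain Python as a
rough sketch. It is not the database implementation: the helper name
skew_or_none is hypothetical, None stands in for SQL NULL, and the adjusted
Fisher-Pearson sample formula is an assumption.

```python
import math

def skew_or_none(values):
    """Illustrative only: applies the documented null rules to a Python list."""
    # Note 1: nulls are not included in the computation.
    data = [v for v in values if v is not None]
    n = len(data)
    # Note 2a: fewer than three non-null data points gives a null result.
    if n < 3:
        return None
    mean = sum(data) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in data) / (n - 1))
    # Note 2b: a standard deviation of 0 gives a null result.
    if std == 0:
        return None
    # Assumed formula: adjusted Fisher-Pearson sample skewness.
    return n / ((n - 1) * (n - 2)) * sum(((v - mean) / std) ** 3 for v in data)

print(skew_or_none([1.0, None, 2.0]))          # only two non-null points
print(skew_or_none([3.0, 3.0, 3.0, None]))     # constant column, std is 0
print(skew_or_none([1.0, 2.0, None, 10.0]) == skew_or_none([1.0, 2.0, 10.0]))
```

The last line shows that a null entry does not change the result, since it
is dropped before any statistic is computed.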
NOTE:
Function accepts positional arguments only.
EXAMPLES:
# Load the data to run the example.
>>> load_example_data("dataframe", "admissions_train")
>>>
# Create a DataFrame on 'admissions_train' table.
>>> admissions_train = DataFrame("admissions_train")
>>> admissions_train
    masters   gpa     stats  programming  admitted
id
22      yes  3.46    Novice     Beginner         0
36       no  3.00  Advanced       Novice         0
15      yes  4.00  Advanced     Advanced         1
38      yes  2.65  Advanced     Beginner         1
5        no  3.44    Novice       Novice         0
17       no  3.83  Advanced     Advanced         1
34      yes  3.85  Advanced     Beginner         0
13       no  4.00  Advanced       Novice         1
26      yes  3.57  Advanced     Advanced         1
19      yes  1.98  Advanced     Advanced         0
>>>
# Example 1: Calculate the skewness of the distribution of the "gpa" column.
# Import func from sqlalchemy to execute the skew function.
>>> from sqlalchemy import func
# Create a sqlalchemy Function object.
>>> skew_func_ = func.skew(admissions_train.gpa.expression)
>>>
# Pass the Function object as input to DataFrame.assign().
>>> df = admissions_train.assign(True, skew_gpa_=skew_func_)
>>> print(df)
   skew_gpa_
0  -2.058969
>>>
# Example 2: Calculate the skewness of the distribution of the "gpa" column
# for each level of programming.
# Note:
# When assign() is run after DataFrame.groupby(), the function ignores
# the "drop_columns" argument.
>>> admissions_train.groupby("programming").assign(skew_gpa_=func.skew(admissions_train.gpa.expression))
   programming  skew_gpa_
0     Beginner  -2.084085
1     Advanced  -2.703078
2       Novice  -1.459620
>>>
|