Teradata® Package for Python Function Reference
- Product: Teradata Package for Python
- Release Number: 17.10
- Published: April 2022
- Language: English (United States)
- Last Update: 2022-08-19
- Lifecycle: previous
- Product Category: Teradata Vantage
- teradataml.dataframe.dataframe.DataFrame.skew = skew(self, distinct=False)
- DESCRIPTION:
Returns the column-wise skewness of the distribution of the dataframe.
Skewness is the third standardized moment of a distribution. It measures the asymmetry of the
distribution about its mean, compared with the normal (Gaussian) distribution.
* The normal distribution has a skewness of 0.
* Positive skewness indicates a distribution having an asymmetric tail
extending toward more positive values.
* Negative skewness indicates an asymmetric tail extending toward more negative values.
Notes:
1. This function is valid only on columns with numeric types.
2. Nulls are not included in the result computation.
3. The following conditions produce a null result:
a. Fewer than three non-null data points in the data used for the computation.
b. Standard deviation for a column is equal to 0.
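The definition and the null conditions above can be illustrated with a small pure-Python sketch. It assumes the adjusted sample-skewness estimator, n / ((n - 1)(n - 2)) * sum((x - mean)^3) / s^3, where s is the sample standard deviation. That choice is an assumption consistent with the notes (the expression is undefined for fewer than three points or a zero standard deviation), not a statement about the engine's internals. The helper name sample_skew is hypothetical and is not part of teradataml.

```python
import math


def sample_skew(values):
    """Adjusted sample skewness; returns None where the notes say the result is null.

    Hypothetical helper for illustration only; not part of teradataml.
    """
    # Note 2: nulls (None) are excluded from the computation.
    data = [float(v) for v in values if v is not None]
    n = len(data)
    # Note 3a: fewer than three non-null data points gives a null result.
    if n < 3:
        return None
    mean = sum(data) / n
    # Sample standard deviation (divides by n - 1).
    std = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    # Note 3b: a standard deviation of 0 gives a null result.
    if std == 0:
        return None
    third = sum((x - mean) ** 3 for x in data)
    return n / ((n - 1) * (n - 2)) * third / std ** 3


symmetric = sample_skew([1, 2, 3, 4, 5])      # symmetric data
right_tail = sample_skew([1, 2, 3, 10])       # tail toward larger values
left_tail = sample_skew([-10, -3, -2, -1])    # tail toward smaller values
too_few = sample_skew([1, None, 2])           # only two non-null points
constant = sample_skew([5, 5, 5, 5])          # zero standard deviation

print(symmetric, right_tail, left_tail, too_few, constant)
```

The symmetric input yields 0, the two tailed inputs yield values of opposite sign, and the last two inputs yield None, mirroring the null conditions in note 3.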
PARAMETERS:
distinct:
Optional Argument.
Specifies whether to exclude duplicate values while calculating the skewness of the distribution.
Default Value: False
Types: bool
RETURNS:
teradataml DataFrame object with skew()
operation performed.
RAISES:
TeradataMlException
1. EXECUTION_FAILED - If the skew() operation fails to
generate the column-wise skew value of the dataframe.
Possible error message:
Failed to perform 'skew'. (Followed by error message)
2. TDMLDF_AGGREGATE_COMBINED_ERR - If none of the columns in the
dataframe is supported by the skew() operation.
Possible error message:
No results. Below is/are the error message(s):
All selected columns [(col2 - PERIOD_TIME), (col3 -
BLOB)] is/are unsupported for 'skew' operation.
EXAMPLES:
# Load the data to run the example.
>>> load_example_data("dataframe", ["admissions_train"])
# Create teradataml dataframe.
>>> df1 = DataFrame("admissions_train")
>>> print(df1.sort('id'))
    masters   gpa     stats  programming  admitted
id
1       yes  3.95  Beginner     Beginner         0
2       yes  3.76  Beginner     Beginner         0
3        no  3.70    Novice     Beginner         1
4       yes  3.50  Beginner       Novice         1
5        no  3.44    Novice       Novice         0
6       yes  3.50  Beginner     Advanced         1
7       yes  2.33    Novice       Novice         1
8        no  3.60  Beginner     Advanced         1
9        no  3.82  Advanced     Advanced         1
10       no  3.71  Advanced     Advanced         1
>>>
# Print the skew value of each column (with supported data types).
>>> df1.skew()
   skew_id   skew_gpa  skew_admitted
0      0.0  -2.058969      -0.653746
>>>
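As a rough cross-check of the sign conventions, the sketch below applies the adjusted sample-skewness estimator to only the ten rows printed above. Both the estimator and the helper name skew are assumptions made for illustration; this is plain Python, not teradataml. The evenly spaced id values are symmetric, so their skewness is 0, while the single low gpa of 2.33 pulls the tail toward smaller values and makes the gpa skewness negative. Because df1.skew() aggregates over the whole table rather than just the printed rows, the gpa magnitude computed here need not match the output above.

```python
import math


def skew(values):
    # Adjusted sample skewness (assumed estimator; illustrative helper only).
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return n / ((n - 1) * (n - 2)) * sum((x - mean) ** 3 for x in values) / std ** 3


# Values copied from the ten rows printed by df1.sort('id') above.
ids = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
gpa = [3.95, 3.76, 3.70, 3.50, 3.44, 3.50, 2.33, 3.60, 3.82, 3.71]

skew_id = skew(ids)    # symmetric values
skew_gpa = skew(gpa)   # one low outlier (2.33)

print(skew_id, skew_gpa)
```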
#
# Using skew() as Time Series Aggregate.
#
>>> # Load the example datasets.
... load_example_data("dataframe", ["ocean_buoys"])
>>>
>>> # Create the required DataFrames.
... # DataFrame on non-sequenced PTI table
... ocean_buoys = DataFrame("ocean_buoys")
>>> # Check the DataFrame columns and peek at the data.
... ocean_buoys.columns
['buoyid', 'TD_TIMECODE', 'temperature', 'salinity']
>>> ocean_buoys.head()
                       TD_TIMECODE  temperature  salinity
buoyid
0       2014-01-06 08:10:00.000000        100.0        55
0       2014-01-06 08:08:59.999999          NaN        55
1       2014-01-06 09:01:25.122200         77.0        55
1       2014-01-06 09:03:25.122200         79.0        55
1       2014-01-06 09:01:25.122200         70.0        55
1       2014-01-06 09:02:25.122200         71.0        55
1       2014-01-06 09:03:25.122200         72.0        55
0       2014-01-06 08:09:59.999999         99.0        55
0       2014-01-06 08:00:00.000000         10.0        55
0       2014-01-06 08:10:00.000000         10.0        55
#
# Time Series Aggregate Example 1: Executing skew() function on DataFrame created on
# non-sequenced PTI table. We will consider all rows for the
# columns while calculating the skew values.
#
# To use skew() as a Time Series Aggregate, run groupby_time() first, followed by skew().
>>> ocean_buoys_grpby1 = ocean_buoys.groupby_time(timebucket_duration="2cy",
... value_expression="buoyid", fill="NULLS")
>>> ocean_buoys_grpby1.skew().sort(["TIMECODE_RANGE", "buoyid"])
                                      TIMECODE_RANGE  GROUP BY TIME(CAL_YEARS(2))  buoyid  skew_salinity  skew_temperature
0  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       0           None          0.000324
1  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       1           None          0.000000
2  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       2           None          0.000000
3  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2      44           None          0.246084
>>>
#
# Time Series Aggregate Example 2: Executing skew() function on DataFrame created on
# non-sequenced PTI table. We will consider DISTINCT values for the
# columns while calculating the skew value.
#
# To use skew() as a Time Series Aggregate, run groupby_time() first, followed by skew().
>>> ocean_buoys_grpby1 = ocean_buoys.groupby_time(timebucket_duration="2cy",
... value_expression="buoyid", fill="NULLS")
>>> ocean_buoys_grpby1.skew(distinct=True).sort(["TIMECODE_RANGE", "buoyid"])
                                      TIMECODE_RANGE  GROUP BY TIME(CAL_YEARS(2))  buoyid  skew_salinity  skew_temperature
0  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       0           None         -1.731321
1  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       1           None          0.000000
2  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       2           None          0.000000
3  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2      44           None         -1.987828
>>>
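The effect of distinct can be checked by hand for buoyid 0, whose rows in the head() output above carry the temperature readings 100.0, NaN, 99.0, 10.0 and 10.0. The sketch below is plain Python, not teradataml. It assumes the adjusted sample-skewness estimator and that these five rows are all of the rows for buoyid 0; the helper name skew is hypothetical. Under those assumptions, the two results agree with the skew_temperature values reported for buoyid 0 in Examples 1 and 2 when rounded to six decimals.

```python
import math


def skew(values, distinct=False):
    # Adjusted sample skewness (assumed estimator; illustrative helper only).
    data = [v for v in values if v is not None]   # nulls are ignored
    if distinct:
        data = sorted(set(data))                  # drop duplicate values
    n = len(data)
    if n < 3:
        return None
    mean = sum(data) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    if std == 0:
        return None
    return n / ((n - 1) * (n - 2)) * sum((x - mean) ** 3 for x in data) / std ** 3


# Temperature readings for buoyid 0, copied from the head() output (NaN as None).
buoy0_temperature = [100.0, None, 99.0, 10.0, 10.0]

all_rows = skew(buoy0_temperature)                       # 10.0 counted twice
distinct_rows = skew(buoy0_temperature, distinct=True)   # 10.0 counted once

print(round(all_rows, 6), round(distinct_rows, 6))
```

With both 10.0 readings included, the two low and two high values nearly balance, so the skewness is close to 0. Dropping the duplicate 10.0 leaves three values with a single low point, which is why the distinct result is strongly negative.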