Teradata Package for Python Function Reference | 17.10 - kurtosis

Teradata® Package for Python Function Reference

Product: Teradata Package for Python
Release Number: 17.10
Published: April 2022
Language: English (United States)
Last Update: 2022-08-19


Product Category: Teradata Vantage

teradataml.dataframe.dataframe.DataFrame.kurtosis = kurtosis(self, distinct=False)

DESCRIPTION:
    Returns the column-wise kurtosis value of the DataFrame.
    Kurtosis is based on the fourth moment of the distribution of the
    standardized (z) values and is reported relative to the normal (Gaussian)
    distribution (excess kurtosis). It measures the outlier (rare, extreme
    observation) character of the distribution as compared with the normal
    distribution:
        * The normal distribution has a kurtosis of 0.
        * Positive kurtosis indicates that the distribution is more
          outlier-prone than the normal distribution.
        * Negative kurtosis indicates that the distribution is less
          outlier-prone than the normal distribution.

    Notes:
        1. This function is valid only on columns with numeric types.
        2. Null values are not included in the computation.
        3. The following conditions produce a null result:
            a. Fewer than four non-null data points in the data used for
               the computation.
            b. The standard deviation of the column is 0.

PARAMETERS:
    distinct:
        Optional Argument.
        Specifies whether to exclude duplicate values while calculating the
        kurtosis value.
        Default Value: False
        Types: bool

RETURNS:
    teradataml DataFrame object with the kurtosis() operation performed.

RAISES:
    TeradataMLException
        1. EXECUTION_FAILED - If the kurtosis() operation fails to generate
           the column-wise kurtosis value of the DataFrame.

           Possible error message:
           Failed to perform 'kurtosis'. (Followed by error message)

        2. TDMLDF_AGGREGATE_COMBINED_ERR - If the kurtosis() operation does
           not support all the columns in the DataFrame.

           Possible error message:
           No results. Below is/are the error message(s):
           All selected columns [(col2 - PERIOD_TIME), (col3 - BLOB)] is/are unsupported for 'kurtosis' operation.

EXAMPLES:
    # Load the data to run the example.
    >>> from teradataml.data.load_example_data import load_example_data
    >>> load_example_data("dataframe", ["admissions_train"])

    # Create a teradataml DataFrame.
    >>> from teradataml import DataFrame
    >>> df1 = DataFrame("admissions_train")
    >>> print(df1.sort("id"))
        masters   gpa     stats  programming  admitted
    id
    1       yes  3.95  Beginner     Beginner         0
    2       yes  3.76  Beginner     Beginner         0
    3        no  3.70    Novice     Beginner         1
    4       yes  3.50  Beginner       Novice         1
    5        no  3.44    Novice       Novice         0
    6       yes  3.50  Beginner     Advanced         1
    7       yes  2.33    Novice       Novice         1
    8        no  3.60  Beginner     Advanced         1
    9        no  3.82  Advanced     Advanced         1
    10       no  3.71  Advanced     Advanced         1

    # Print the kurtosis value of each numeric column.
    >>> df1.kurtosis()
       kurtosis_id  kurtosis_gpa  kurtosis_admitted
    0         -1.2      4.052659            -1.6582

    #
    # Using kurtosis() as a Time Series Aggregate.
    #
    # Load the example dataset.
    >>> load_example_data("dataframe", ["ocean_buoys"])

    # Create the required DataFrame on a non-sequenced PTI table.
    >>> ocean_buoys = DataFrame("ocean_buoys")

    # Check the DataFrame columns and peek at the data.
    >>> ocean_buoys.columns
    ['buoyid', 'TD_TIMECODE', 'temperature', 'salinity']
    >>> ocean_buoys.head()
                               TD_TIMECODE  temperature  salinity
    buoyid
    0       2014-01-06 08:10:00.000000        100.0        55
    0       2014-01-06 08:08:59.999999          NaN        55
    1       2014-01-06 09:01:25.122200         77.0        55
    1       2014-01-06 09:03:25.122200         79.0        55
    1       2014-01-06 09:01:25.122200         70.0        55
    1       2014-01-06 09:02:25.122200         71.0        55
    1       2014-01-06 09:03:25.122200         72.0        55
    0       2014-01-06 08:09:59.999999         99.0        55
    0       2014-01-06 08:00:00.000000         10.0        55
    0       2014-01-06 08:10:00.000000         10.0        55

    #
    # Time Series Aggregate Example 1: Execute kurtosis() on a DataFrame created
    # on a non-sequenced PTI table, considering all rows of the columns while
    # calculating the kurtosis values.
    #
    # To use kurtosis() as a Time Series Aggregate, run groupby_time() first,
    # followed by kurtosis().
    >>> ocean_buoys_grpby1 = ocean_buoys.groupby_time(timebucket_duration="2cy",
    ...                                               value_expression="buoyid", fill="NULLS")
    >>> ocean_buoys_grpby1.kurtosis().sort(["TIMECODE_RANGE", "buoyid"])
                                              TIMECODE_RANGE  GROUP BY TIME(CAL_YEARS(2))  buoyid  kurtosis_salinity  kurtosis_temperature
    0  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       0               None             -5.998128
    1  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       1               None             -2.758377
    2  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       2               None                   NaN
    3  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2      44               None             -2.195395

    #
    # Time Series Aggregate Example 2: Execute kurtosis() on a DataFrame created
    # on a non-sequenced PTI table, considering only DISTINCT values of the
    # columns while calculating the kurtosis values.
    #
    # To use kurtosis() as a Time Series Aggregate, run groupby_time() first,
    # followed by kurtosis().
    >>> ocean_buoys_grpby1 = ocean_buoys.groupby_time(timebucket_duration="2cy",
    ...                                               value_expression="buoyid", fill="NULLS")
    >>> ocean_buoys_grpby1.kurtosis(distinct=True).sort(["TIMECODE_RANGE", "buoyid"])
                                              TIMECODE_RANGE  GROUP BY TIME(CAL_YEARS(2))  buoyid  kurtosis_salinity  kurtosis_temperature
    0  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       0               None                   NaN
    1  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       1               None             -2.758377
    2  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       2               None                   NaN
    3  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2      44               None              4.128426
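To see how the definition, the null rules, and the distinct argument interact, the computation can be sketched locally in pure Python. This sketch assumes that Vantage computes the bias-corrected sample excess kurtosis (often written G2, the same definition pandas `Series.kurt()` uses); `excess_kurtosis` is a hypothetical helper for illustration, not part of teradataml. Under that assumption it returns -5.998128 for the buoyid 0 temperatures shown in the `ocean_buoys.head()` output, and a null result once `distinct=True` leaves only three values, matching the buoyid 0 rows of both Time Series Aggregate examples.

```python
import math
from typing import Iterable, Optional


def excess_kurtosis(values: Iterable[Optional[float]],
                    distinct: bool = False) -> Optional[float]:
    """Hypothetical helper, not a teradataml API.

    Assumes Vantage's kurtosis is the bias-corrected sample excess
    kurtosis (G2). Returns None wherever the result would be null.
    """
    # Nulls (None or NaN) are excluded from the computation.
    data = [float(v) for v in values if v is not None and not math.isnan(v)]
    if distinct:
        # distinct=True: duplicate values are counted only once.
        data = list(set(data))
    n = len(data)
    if n < 4:
        # The correction terms divide by (n - 2) and (n - 3).
        return None
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data)  # sum of squared deviations
    if m2 == 0:
        # Standard deviation of 0: the z-values are undefined.
        return None
    m4 = sum((x - mean) ** 4 for x in data)  # sum of fourth-power deviations
    var_samp = m2 / (n - 1)                  # sample variance
    scaled = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * m4 / var_samp ** 2
    return scaled - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))


# Temperatures shown for buoyid 0 in ocean_buoys.head(); None stands in for NaN.
buoy0 = [100.0, None, 99.0, 10.0, 10.0]
print(round(excess_kurtosis(buoy0), 6))         # -5.998128
print(excess_kurtosis(buoy0, distinct=True))    # None (only three distinct values)
print(round(excess_kurtosis(range(1, 41)), 6))  # -1.2
```

Because the correction divides by (n - 3), at least four non-null values (or four distinct values with distinct=True) are needed. The same formula gives exactly -1.2 for any run of four or more consecutive integers, which is consistent with kurtosis_id in the first example if id is a gap-free sequence.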