The aggregate function ts.kurtosis() measures the tailedness of the probability distribution of a column in each group.
Kurtosis is the fourth moment of the distribution of the standardized (z) values. It measures how prone the distribution is to outliers (rare, extreme observations) compared with the normal (Gaussian) distribution.
- The normal distribution has a kurtosis of 0 on this scale (that is, the function reports excess kurtosis).
- Positive kurtosis indicates that the distribution is more prone to outliers (extreme deviations from the mean) than the normal distribution.
- Negative kurtosis indicates that the distribution is less prone to outliers (extreme deviations from the mean) than the normal distribution.
- This function is valid only on columns with numeric types.
- NULL values are not included in the computation.
- The following conditions produce a NULL result:
  - Fewer than three non-NULL data points in the data used for the computation.
  - The standard deviation of the column is equal to 0.
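As a rough illustration of the notes above, the following base R sketch computes a bias-corrected sample excess kurtosis on a plain numeric vector. It is not the tdplyr or database implementation: the helper name `sample_excess_kurtosis` is hypothetical, and the formula is an assumption (a widely used sample estimator), since this section does not state the exact formula the database applies. Because that estimator divides by (n - 3), the sketch returns NA for fewer than four points, which is stricter than the documented "fewer than three" condition.

```r
# Hypothetical helper, not part of tdplyr: sample excess kurtosis in base R.
# Assumes the common bias-corrected estimator; the database formula may differ.
sample_excess_kurtosis <- function(x) {
  x <- x[!is.na(x)]                      # missing values are ignored
  n <- length(x)
  if (n < 4) return(NA_real_)            # too few points (formula divides by n - 3)
  s <- sd(x)                             # sample standard deviation
  if (s == 0) return(NA_real_)           # no spread: z values are undefined
  z <- (x - mean(x)) / s                 # standardized (z) values
  n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(z^4) -
    3 * (n - 1)^2 / ((n - 2) * (n - 3))
}

sample_excess_kurtosis(c(10, 10, 20, 20))   # approximately -6
sample_excess_kurtosis(c(5, 5, 5, 5))       # NA: standard deviation is 0
sample_excess_kurtosis(c(1, NA, 2))         # NA: too few non-missing points
```

Under this assumed formula, a group of four points made up of two pairs of equal values gives -6, the smallest value the estimator can return for n = 4.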
Arguments:
- value.expression: Specifies the column for which kurtosis is to be computed.
Use ts.kurtosis(distinct(column_name)) to exclude duplicate values of the column when calculating kurtosis.
Example 1: Calculate the kurtosis of the 'temperature' column of a sequenced PTI table
- Calculate the kurtosis.
> df_seq_kurtosis <- df_seq_grp %>% summarise(kurtosis_temp = ts.kurtosis(temperature))
- Print the results.
> df_seq_kurtosis %>% arrange(TIMECODE_RANGE, buoyid, kurtosis_temp)
# Source:     lazy query [?? x 4]
# Database:   [Teradata 16.20.50.01] [Teradata Native Driver 17.0.0.2]
#   [TDAPUSER@<hostname>/TDAPUSERDB]
# Ordered by: TIMECODE_RANGE, buoyid, kurtosis_temp
  TIMECODE_RANGE                                 `GROUP BY TIME(MINUTES~ buoyid kurtosis_temp
  <chr>                                                          <int64>  <int>         <dbl>
1 2014-01-06 08:00:00.000000+00:00,2014-01-06 0~                   35345      0         -6.00
2 2014-01-06 09:00:00.000000+00:00,2014-01-06 0~                   35347      1         -2.76
3 2014-01-06 10:00:00.000000+00:00,2014-01-06 1~                   35349     44         -2.31
4 2014-01-06 10:30:00.000000+00:00,2014-01-06 1~                   35350     22         NA
5 2014-01-06 10:30:00.000000+00:00,2014-01-06 1~                   35350     44         NA
6 2014-01-06 21:00:00.000000+00:00,2014-01-06 2~                   35371      2         NA
Example 2: Calculate the kurtosis of the 'temperature' column of a non-PTI table
- Calculate the kurtosis.
> df_nonpti_kurtosis <- df_nonpti %>% group_by_time(timebucket.duration = "10m", timecode.column = "TIMECODE") %>% summarise(kurtosis_temp = ts.kurtosis(temperature))
- Print the results.
> df_nonpti_kurtosis %>% arrange(TIMECODE_RANGE, kurtosis_temp)
# Source:     lazy query [?? x 3]
# Database:   [Teradata 16.20.50.01] [Teradata Native Driver 17.0.0.2]
#   [TDAPUSER@<hostname>/TDAPUSERDB]
# Ordered by: TIMECODE_RANGE, kurtosis_temp
  TIMECODE_RANGE                                       `GROUP BY TIME(MINUTES(~ kurtosis_temp
  <chr>                                                                 <int64>         <dbl>
1 2014-01-06 08:00:00.000000+00:00,2014-01-06 08:10:0~                  2314993         NA
2 2014-01-06 08:10:00.000000+00:00,2014-01-06 08:20:0~                  2314994         NA
3 2014-01-06 09:00:00.000000+00:00,2014-01-06 09:10:0~                  2314999         -2.76
4 2014-01-06 10:00:00.000000+00:00,2014-01-06 10:10:0~                  2315005         -2.18
5 2014-01-06 10:10:00.000000+00:00,2014-01-06 10:20:0~                  2315006         NA
6 2014-01-06 10:30:00.000000+00:00,2014-01-06 10:40:0~                  2315008         NA
7 2014-01-06 10:50:00.000000+00:00,2014-01-06 11:00:0~                  2315010         NA
8 2014-01-06 21:00:00.000000+00:00,2014-01-06 21:10:0~                  2315071         NA