ts.skew() | Teradata R Package - 17.00

Teradata® R Package User Guide

prodname: Teradata R Package
vrm_release: 17.00
created_date: November 2020
category: User Guide
featnum: B700-4005-090K

The aggregate function ts.skew() measures the skewness of the distribution of a column.

Skewness is the third standardized moment of a distribution. It measures the asymmetry of the distribution about its mean, relative to the symmetric normal (Gaussian) distribution.
  • The normal distribution has a skewness of 0.
  • Positive skewness indicates a distribution with an asymmetric tail extending toward more positive values.
  • Negative skewness indicates a distribution with an asymmetric tail extending toward more negative values.
  • This function is valid only on columns with numeric types.
  • Nulls are not included in the result computation.
  • The following conditions produce a NULL result:
    • Fewer than three non-NULL data points in the data used for the computation.
    • Standard deviation for a column is equal to 0.
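
The rules above can be sketched locally in base R. This is only an illustration, not the tdplyr implementation: the helper name local_skew is hypothetical, and the formula is an assumption (the adjusted Fisher-Pearson sample skewness commonly used by SQL SKEW aggregates), since this page does not state the exact equation. Under that assumption, the factor (n - 2) and the division by the standard deviation are what make the two NULL conditions necessary.

```r
# Local sketch of sample skewness; NOT the tdplyr/Teradata implementation.
# Assumed formula (adjusted Fisher-Pearson):
#   n / ((n - 1) * (n - 2)) * sum(((x - mean(x)) / sd(x))^3)
local_skew <- function(x) {
  x <- x[!is.na(x)]             # nulls are not included in the computation
  n <- length(x)
  if (n < 3) return(NA_real_)   # fewer than three non-NULL data points
  s <- sd(x)                    # sample standard deviation
  if (s == 0) return(NA_real_)  # standard deviation equal to 0
  n / ((n - 1) * (n - 2)) * sum(((x - mean(x)) / s)^3)
}

local_skew(c(10, 11, 12, 13, 30))   # positive: tail toward larger values
local_skew(c(-30, 10, 11, 12, 13))  # negative: tail toward smaller values
local_skew(c(1, 2, 3))              # 0: symmetric about the mean
local_skew(c(5, NA, 7))             # NA: only two non-NULL values
local_skew(c(4, 4, 4, 4))           # NA: standard deviation is 0
```

The NA returned here plays the role of the NULL result described above; in the example output later on this page, NULL skewness values are likewise displayed as NA.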
Arguments:
  • value.expression: Specifies the column for which skewness is to be computed.

Use ts.skew(distinct(column_name)) to exclude duplicate values of the column when calculating skewness.
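
To see why excluding duplicates can change the result, the following base R sketch compares skewness with and without repeated values. It runs locally and does not call tdplyr: unique() stands in for distinct(), the helper skew_sketch is hypothetical, and the formula is the same assumed adjusted Fisher-Pearson sample skewness, which this page does not confirm.

```r
# Local illustration only; unique() stands in for distinct() in ts.skew().
# Assumed formula: n / ((n - 1) * (n - 2)) * sum(((x - mean(x)) / sd(x))^3)
skew_sketch <- function(x) {
  n <- length(x)
  n / ((n - 1) * (n - 2)) * sum(((x - mean(x)) / sd(x))^3)
}

temps <- c(10, 10, 10, 20, 30)  # hypothetical readings with repeated values

skew_all      <- skew_sketch(temps)          # duplicates included: positive
skew_distinct <- skew_sketch(unique(temps))  # 10, 20, 30 only: 0 (symmetric)

print(skew_all)
print(skew_distinct)
```

With the duplicates included, the repeated 10s pull the mean down and leave a longer tail toward higher values; once they are removed, the remaining values are symmetric about their mean.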

Example 1: Calculate the skewness of the 'temperature' column of a sequenced PTI table

  • Calculate the skewness.
    > df_seq_skew <- df_seq_grp %>% summarise(skew_temp = ts.skew(temperature))
  • Print the results.
    > df_seq_skew %>% arrange(TIMECODE_RANGE, buoyid, skew_temp)
    # Source:     lazy query [?? x 4]
    # Database:   [Teradata 16.20.50.01] [Teradata Native Driver 17.0.0.2]
    #   [TDAPUSER@<hostname>/TDAPUSERDB]
    # Ordered by: TIMECODE_RANGE, buoyid, skew_temp
      TIMECODE_RANGE                                    `GROUP BY TIME(MINUTES(~ buoyid skew_temp
      <chr>                                             <int64>                   <int>     <dbl>
    1 2014-01-06 08:00:00.000000+00:00,2014-01-06 08:3~ 35345                         0  0.000324
    2 2014-01-06 09:00:00.000000+00:00,2014-01-06 09:3~ 35347                         1  0      
    3 2014-01-06 10:00:00.000000+00:00,2014-01-06 10:3~ 35349                        44 -0.127  
    4 2014-01-06 10:30:00.000000+00:00,2014-01-06 11:0~ 35350                        22 NA      
    5 2014-01-06 10:30:00.000000+00:00,2014-01-06 11:0~ 35350                        44 NA      
    6 2014-01-06 21:00:00.000000+00:00,2014-01-06 21:3~ 35371                         2  0      

Example 2: Calculate the skewness of the 'temperature' column of a non-PTI table

  • Calculate the skewness.
    > df_nonpti_skew <- df_nonpti_grp %>% group_by_time(timebucket.duration = "10m", timecode.column = "TIMECODE") %>% summarise(skew_temp = ts.skew(temperature))
  • Print the results.
    > df_nonpti_skew %>% arrange(TIMECODE_RANGE, skew_temp)
    # Source:     lazy query [?? x 3]
    # Database:   [Teradata 16.20.50.01] [Teradata Native Driver 17.0.0.2]
    #   [TDAPUSER@<hostname>/TDAPUSERDB]
    # Ordered by: TIMECODE_RANGE, skew_temp
      TIMECODE_RANGE                                          `GROUP BY TIME(MINUTES(1~ skew_temp
      <chr>                                                   <int64>                       <dbl>
    1 2014-01-06 08:00:00.000000+00:00,2014-01-06 08:10:00.0~ 2314993                      NA   
    2 2014-01-06 08:10:00.000000+00:00,2014-01-06 08:20:00.0~ 2314994                      NA   
    3 2014-01-06 09:00:00.000000+00:00,2014-01-06 09:10:00.0~ 2314999                       0   
    4 2014-01-06 10:00:00.000000+00:00,2014-01-06 10:10:00.0~ 2315005                      -0.384
    5 2014-01-06 10:10:00.000000+00:00,2014-01-06 10:20:00.0~ 2315006                      NA   
    6 2014-01-06 10:30:00.000000+00:00,2014-01-06 10:40:00.0~ 2315008                      NA   
    7 2014-01-06 10:50:00.000000+00:00,2014-01-06 11:00:00.0~ 2315010                      NA   
    8 2014-01-06 21:00:00.000000+00:00,2014-01-06 21:10:00.0~ 2315071                       0