7.00.02.01 - Performing the Analysis - Aster R

Teradata Aster® R User GuideUpdate 3

prodname
Aster R
vrm_release
7.00.02.01
created_date
December 2017
category
Programming Reference
User Guide
featnum
B700-1033-700K
Users create a custom function to perform the analysis. The ExtractOutliers function separates the three identifier columns ('Product', 'Year', and 'Month') and then applies the function lof to any other columns present in the input table.
  1. Create a virtual data frame.
    tadf <- ta.data.frame("products_input")
    
  2. Create the ExtractOutliers function.
    # Using MARGIN argument
    ExtractOutliers <- function(data) {
      library(Rlof)
      data <-data[, c("sales","profit")]
      outliers  <- apply(data, MARGIN = 2, FUN = lof, k = 5)
      return (outliers)
    }
    
  3. Use the R function aa.apply with the ExtractOutliers function.
    outliers_ex1 <- aa.apply(tadf,  FUN = ExtractOutliers, 
                             partition.column = "product", 
                             out.format=list(columns=c("product","year","month",
                                                       "sales", "sales_outlier", "profit","profit_outlier")))
    

    An excerpt of the output is shown here.

    > outliers_ex1
    
    
        product year month sales sales_outlier profit profit_outlier
    1         2 2014     1   166     1.2681085     85      0.9705555
    2         2 2014     2   211     0.9760954     30      1.2148717
    3         2 2014     3   192     1.0154636      4      0.9820451
    4         2 2014     4   136     1.0491496    143      1.2730188
    5         2 2014     5   246     1.3086677     35      1.0929000
    6         2 2014     6   151     1.0067277     45      0.9301366
    7         2 2014     7   128     0.9979419     48      1.0133217
    8         2 2014     8   142     1.1050097     77      1.3678927
    9         2 2014     9   196     0.9718389     94      1.1562808
    
  4. Instead of using the R function aa.apply, write another function ExtractOutliers2 using a for-loop.
    # Using for loop as alternative to MARGIN argument
    ExtractOutliers2 <- function(data) {
      library(Rlof)
      data <-data[, c("sales","profit")]
      outliers <- data.frame()
      for ( i in names(data)) {
        if (ncol(outliers) == 0)
          outliers <-data.frame(lof(data[i], k=5))
        else
          outliers[,ncol(outliers)+1] <- lof(data[i], k=5)
        
      }
      return (outliers)
    }
    
    This function produces exactly the same output as in Step 3.
  5. Save the results as a table in the database.
    ta.dropTable("outliers_output")
    outliers_ex1 <- aa.apply(tadf,  FUN = ExtractOutliers,
                             partition.column = "product",
                             out.format=list(columns=c("product","year","month","sales","sales_outlier","profit","profit_outlier"),
                             columnTypes=c("integer","integer","integer","integer","numeric","integer","numeric"),
                             tableType = "fact", table = "outliers_output",
                             partitionKey="product"))