Performing the Analysis - Aster R

Teradata Aster® R User GuideUpdate 3

Product
Aster R
Release Number
7.00.02.01
Published
December 2017
Language
English (United States)
Last Update
2018-04-13
dita:mapPath
fop1497542774450.ditamap
dita:ditavalPath
Generic_no_ie_no_tempfilter.ditaval
dita:id
fbp1477004286096
lifecycle
previous
Product Category
Software
Users create a custom function to perform the analysis. The ExtractOutliers function separates the three identifier columns ('Product', 'Year', and 'Month') and then applies the function lof to any other columns present in the input table.
  1. Create a virtual data frame.
    tadf <- ta.data.frame("products_input")
    
  2. Create the ExtractOutliers function.
    # Using MARGIN argument
    ExtractOutliers <- function(data) {
      library(Rlof)
      data <-data[, c("sales","profit")]
      outliers  <- apply(data, MARGIN = 2, FUN = lof, k = 5)
      return (outliers)
    }
    
  3. Use the R function aa.apply with the ExtractOutliers function.
    outliers_ex1 <- aa.apply(tadf,  FUN = ExtractOutliers, 
                             partition.column = "product", 
                             out.format=list(columns=c("product","year","month",
                                                       "sales", "sales_outlier", "profit","profit_outlier")))
    

    An excerpt of the output is shown here.

    > outliers_ex1
    
    
        product year month sales sales_outlier profit profit_outlier
    1         2 2014     1   166     1.2681085     85      0.9705555
    2         2 2014     2   211     0.9760954     30      1.2148717
    3         2 2014     3   192     1.0154636      4      0.9820451
    4         2 2014     4   136     1.0491496    143      1.2730188
    5         2 2014     5   246     1.3086677     35      1.0929000
    6         2 2014     6   151     1.0067277     45      0.9301366
    7         2 2014     7   128     0.9979419     48      1.0133217
    8         2 2014     8   142     1.1050097     77      1.3678927
    9         2 2014     9   196     0.9718389     94      1.1562808
    
  4. Instead of using the R function aa.apply, write another function ExtractOutliers2 using a for-loop.
    # Using for loop as alternative to MARGIN argument
    ExtractOutliers2 <- function(data) {
      library(Rlof)
      data <-data[, c("sales","profit")]
      outliers <- data.frame()
      for ( i in names(data)) {
        if (ncol(outliers) == 0)
          outliers <-data.frame(lof(data[i], k=5))
        else
          outliers[,ncol(outliers)+1] <- lof(data[i], k=5)
        
      }
      return (outliers)
    }
    
    This function produces exactly the same output as in Step 3.
  5. Save the results as a table in the database.
    ta.dropTable("outliers_output")
    outliers_ex1 <- aa.apply(tadf,  FUN = ExtractOutliers,
                             partition.column = "product",
                             out.format=list(columns=c("product","year","month","sales","sales_outlier","profit","profit_outlier"),
                             columnTypes=c("integer","integer","integer","integer","numeric","integer","numeric"),
                             tableType = "fact", table = "outliers_output",
                             partitionKey="product"))