Users create a custom function to perform the analysis. The ExtractOutliers function separates the three identifier columns ('Product', 'Year', and 'Month') and then applies the function lof to any other columns present in the input table.
-
Create a virtual data frame.
tadf <- ta.data.frame("products_input")
-
Create the ExtractOutliers function.
# Using MARGIN argument ExtractOutliers <- function(data) { library(Rlof) data <-data[, c("sales","profit")] outliers <- apply(data, MARGIN = 2, FUN = lof, k = 5) return (outliers) }
-
Use the R function aa.apply with the ExtractOutliers function.
outliers_ex1 <- aa.apply(tadf, FUN = ExtractOutliers, partition.column = "product", out.format=list(columns=c("product","year","month", "sales", "sales_outlier", "profit","profit_outlier")))
An excerpt of the output is shown here.
> outliers_ex1 product year month sales sales_outlier profit profit_outlier 1 2 2014 1 166 1.2681085 85 0.9705555 2 2 2014 2 211 0.9760954 30 1.2148717 3 2 2014 3 192 1.0154636 4 0.9820451 4 2 2014 4 136 1.0491496 143 1.2730188 5 2 2014 5 246 1.3086677 35 1.0929000 6 2 2014 6 151 1.0067277 45 0.9301366 7 2 2014 7 128 0.9979419 48 1.0133217 8 2 2014 8 142 1.1050097 77 1.3678927 9 2 2014 9 196 0.9718389 94 1.1562808
-
Instead of using the R function aa.apply, write another function ExtractOutliers2 using a for-loop.
# Using for loop as alternative to MARGIN argument ExtractOutliers2 <- function(data) { library(Rlof) data <-data[, c("sales","profit")] outliers <- data.frame() for ( i in names(data)) { if (ncol(outliers) == 0) outliers <-data.frame(lof(data[i], k=5)) else outliers[,ncol(outliers)+1] <- lof(data[i], k=5) } return (outliers) }
This function produces exactly the same output as in Step 3. -
Save the results as a table in the database.
ta.dropTable("outliers_output") outliers_ex1 <- aa.apply(tadf, FUN = ExtractOutliers, partition.column = "product", out.format=list(columns=c("product","year","month","sales","sales_outlier","profit","profit_outlier"), columnTypes=c("integer","integer","integer","integer","numeric","integer","numeric"), tableType = "fact", table = "outliers_output", partitionKey="product"))