This example uses the Aster cFilter function to analyze a dataset of grocery store transactions and identify items that are often bought together. It also shows how R graphics functions can be applied to the output of Aster R functions.
The input data is shown here.
trans_id | date | store_id | region | item | sku | category |
---|---|---|---|---|---|---|
1 | 20100715 | 1 | west | milk | 1 | dairy |
1 | 20100715 | 1 | west | butter | 2 | dairy |
1 | 20100715 | 1 | west | eggs | 3 | dairy |
1 | 19990715 | 1 | west | flour | 4 | baking |
2 | 20100715 | 1 | west | milk | 1 | dairy |
2 | 20100715 | 1 | west | butter | 2 | dairy |
2 | 20100715 | 1 | west | eggs | 3 | dairy |
3 | 20100715 | 1 | west | milk | 1 | dairy |
3 | 20100715 | 1 | west | eggs | 3 | dairy |
3 | 19990715 | 1 | west | flour | 4 | baking |
4 | 20100715 | 1 | west | milk | 1 | dairy |
4 | 20100715 | 1 | west | butter | 2 | dairy |
5 | 20100715 | 2 | west | butter | 2 | dairy |
5 | 20100715 | 2 | west | eggs | 3 | dairy |
5 | 19990715 | 2 | west | flour | 4 | baking |
6 | 20100715 | 2 | west | milk | 1 | dairy |
6 | 20100715 | 2 | west | eggs | 3 | dairy |
7 | 20100715 | 2 | west | eggs | 3 | dairy |
7 | 19990715 | 2 | west | flour | 4 | baking |
8 | 20100715 | 3 | west | butter | 2 | dairy |
8 | 20100715 | 3 | west | eggs | 3 | dairy |
8 | 19990715 | 3 | west | flour | 4 | baking |
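The ta.create() call in the next step expects a local R data frame named shopping that holds this table. The original example does not show how shopping was built, so the following is a minimal sketch under stated assumptions: dates are kept as the yyyymmdd integers printed in the table, and the numeric columns are stored as R integers.

```r
# Sketch (assumption): rebuild the input table as a local data frame `shopping`.
item <- c("milk", "butter", "eggs", "flour",
          "milk", "butter", "eggs",
          "milk", "eggs", "flour",
          "milk", "butter",
          "butter", "eggs", "flour",
          "milk", "eggs",
          "eggs", "flour",
          "butter", "eggs", "flour")
shopping <- data.frame(
  # Transactions 1..8 contain 4, 3, 3, 2, 3, 2, 2 and 3 items respectively.
  trans_id = rep(1:8, times = c(4, 3, 3, 2, 3, 2, 2, 3)),
  # Kept as yyyymmdd integers; in this table every flour row carries
  # 19990715 and every other row 20100715.
  date = ifelse(item == "flour", 19990715L, 20100715L),
  # Transactions 1-4 (12 rows) are store 1, 5-7 (7 rows) store 2, 8 (3 rows) store 3.
  store_id = rep(1:3, times = c(12, 7, 3)),
  region = "west",
  item = item,
  # sku and category are fixed per item in this table.
  sku = unname(c(milk = 1L, butter = 2L, eggs = 3L, flour = 4L)[item]),
  category = ifelse(item == "flour", "baking", "dairy"),
  stringsAsFactors = FALSE
)
print(shopping)
```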
Copy the local data frame shopping into the Aster table shopping_tbl and create a virtual data frame, shopping.tadf, that refers to it.
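    # Drop any earlier copy of the table, then copy the local data frame
    # `shopping` into a new fact table partitioned on the region column.
    ta.dropTable("shopping_tbl", schemaName = "public")
    shopping.tadf <- ta.create(shopping, table = "shopping_tbl",
                               schemaName = "public", tableType = "fact",
                               partitionKey = "region", row.names = TRUE,
                               colTypes = c(trans_id = 'int', date = 'date',
                                            store_id = 'int', region = 'text',
                                            item = 'text', sku = 'int',
                                            category = 'text'))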
Call the cFilter function.
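    # Pair up items that occur in the same transaction (trans_id)
    # and add the region column to the output.
    cf_out <- aa.cfilter(shopping.tadf,
                         input.columns = "item",
                         join.columns = "trans_id",
                         add.columns = "region")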
The output is shown here.
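To make the cFilter output easier to interpret, the following base-R sketch computes pairwise co-occurrence counts for the same eight transactions locally, without an Aster connection. It only approximates what cFilter reports. The column names cntb, cnt1 and cnt2 and the score formula cntb^2 / (cnt1 * cnt2) are assumptions modeled on the Aster cFilter documentation; check them against the columns of cf_out$output.table in your Aster version.

```r
# Hypothetical local sketch of pairwise co-occurrence counting; it does not
# call Aster and only approximates what cFilter reports.
tx <- data.frame(
  trans_id = rep(1:8, times = c(4, 3, 3, 2, 3, 2, 2, 3)),
  item = c("milk", "butter", "eggs", "flour",
           "milk", "butter", "eggs",
           "milk", "eggs", "flour",
           "milk", "butter",
           "butter", "eggs", "flour",
           "milk", "eggs",
           "eggs", "flour",
           "butter", "eggs", "flour"),
  stringsAsFactors = FALSE
)

# One sorted, de-duplicated basket of items per transaction.
baskets <- lapply(split(tx$item, tx$trans_id), function(b) sort(unique(b)))

# cnt1/cnt2 (assumed names): number of transactions containing each item.
item_cnt <- table(unlist(baskets))

# Every unordered item pair inside each basket; single-item baskets add nothing.
pairs <- do.call(rbind, lapply(baskets, function(b) {
  if (length(b) < 2) return(NULL)
  t(combn(b, 2))
}))

# cntb (assumed name): number of transactions containing both items of a pair.
cntb <- table(paste(pairs[, 1], pairs[, 2], sep = "|"))
parts <- strsplit(names(cntb), "|", fixed = TRUE)
cf_local <- data.frame(
  item1 = vapply(parts, `[`, character(1), 1),
  item2 = vapply(parts, `[`, character(1), 2),
  cntb = as.integer(cntb),
  stringsAsFactors = FALSE
)
cf_local$cnt1 <- as.integer(item_cnt[cf_local$item1])
cf_local$cnt2 <- as.integer(item_cnt[cf_local$item2])

# Assumed score: product of the two conditional probabilities,
# (cntb / cnt1) * (cntb / cnt2).
cf_local$score <- cf_local$cntb^2 / (cf_local$cnt1 * cf_local$cnt2)
cf_local <- cf_local[order(-cf_local$score), ]
rownames(cf_local) <- NULL
print(cf_local)
```

On this data the sketch ranks eggs and flour first: they appear together in five of the eight transactions.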
Extract the item-pair columns and the score from the function output, and use the R package circlize to display them as a chord diagram.
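    library(circlize)
    # Pull the cFilter output table into a local R data frame.
    output_table <- as.data.frame(cf_out$output.table)
    # One link per item pair; link width follows the score column.
    chordDiagramFromDataFrame(output_table[, c("col1_item1", "col1_item2", "score")])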
The resulting diagram is shown here.