ISSS608 2016-17 T3 Group8 Arules Project Proposal

From Visual Analytics and Applications
Revision as of 00:19, 7 August 2017 by Yuhui.zhou.2016 (talk | contribs)

A Visual Application for Better Business Decision Making

Current Packages

There are two core packages used in our application, both of which are under the “arules” family.

arules:

“arules” is the foundation on which we built this application. It enables users to apply association rule mining algorithms to transaction data, or to any other data that can be converted into transaction form. It is powerful at manipulating and transforming data, pruning redundant rules, and filtering the generated association rules. Users can filter rules by setting thresholds for support, confidence, and lift, or by restricting the antecedent and consequent, and can sort rules by support, confidence, or lift.
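As a rough illustration of the filtering and sorting described above, the following R sketch mines rules from a handful of made-up baskets. The basket contents, the threshold values, and the choice of "milk" as the consequent are assumptions made for illustration only, not settings from our application.

```r
library(arules)

# Toy transaction data (hypothetical baskets, for illustration only)
baskets <- list(
  c("bread", "milk"),
  c("bread", "butter", "milk"),
  c("bread", "butter"),
  c("milk", "butter"),
  c("bread", "milk", "butter", "jam")
)
trans <- as(baskets, "transactions")

# Mine rules with minimum support and confidence thresholds
rules <- apriori(trans,
                 parameter = list(support = 0.2, confidence = 0.5, minlen = 2),
                 control = list(verbose = FALSE))

# Prune redundant rules
rules <- rules[!is.redundant(rules)]

# Keep rules whose consequent is "milk" and whose lift is at least 1,
# then sort them by confidence
milk_rules <- subset(rules, subset = rhs %in% "milk" & lift >= 1)
milk_rules <- sort(milk_rules, by = "confidence")

inspect(milk_rules)
```

Every change to a threshold or to the consequent means editing and re-running this code.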

arulesViz:

“arulesViz” is an R package that provides various visualizations of association rules. Users can visualize their rules with a scatter plot, matrix-based visualization, grouped matrix-based visualization, graph-based visualization, parallel coordinates plot, double-decker plot, and more. This diversity makes it one of the most popular R packages for visualizing association rules. One drawback is that these visualizations are all static graphs, which lack interactivity.
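A minimal sketch of a few of these plot types is shown below. It assumes the Groceries example dataset bundled with arules; the thresholds and the choice of the ten highest-lift rules are illustrative assumptions.

```r
library(arules)
library(arulesViz)

# Example transaction data shipped with arules
data("Groceries")

rules <- apriori(Groceries,
                 parameter = list(support = 0.01, confidence = 0.3),
                 control = list(verbose = FALSE))

# Scatter plot of all rules: support vs. confidence, shaded by lift
plot(rules, method = "scatterplot",
     measure = c("support", "confidence"), shading = "lift")

# Graph-based and parallel coordinates plots are easier to read on few rules
top_rules <- head(sort(rules, by = "lift"), 10)
plot(top_rules, method = "graph")
plot(top_rules, method = "paracoord")
```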

Motivation

Association Rule Mining is Powerful
Although association rule mining is usually applied in market basket analysis to mine the relationships between products, it is a powerful technique that can be applied to any dataset to discover associations and correlations between variables, which can in turn suggest possible causal relationships. Given a target variable of interest, we can find the relevant association rules within a dataset even if it is not transaction data.


However, not all datasets are ready for this kind of data mining in R. For example, continuous variables are not easy to handle, which is a barrier for users stepping into the world of data analytics. We intend to develop an association rule mining application that non-statistician users can easily understand, play with, and draw insights from, and that brings them into data analytics through a fundamental yet powerful concept: probability.

Room for Improvement of Current Packages

Current R packages available for association rule mining are helpful for data analysts, but their limitations also make the analysis results difficult to interpret:

1) Static Visualizations
Visualizations provided in current ARM packages are mostly static, so users cannot see how different settings change the rules without editing the code. For example, it is hard to see how many rules are dropped when the support threshold is increased by 0.005.

2) Lack of Interactivity
There is little interactivity in the current ARM packages. Users are unable to select or zoom in on the visualizations.

3) Manual Calibration
Users are only able to manually calibrate the generated rules by changing thresholds for support, confidence, lift, antecedent or consequent.
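To make the threshold example above concrete, the sketch below counts the rules produced at two support thresholds that differ by 0.005. It assumes the Groceries example dataset bundled with arules; the helper `count_rules`, the two support values, and the confidence value are hypothetical choices for illustration.

```r
library(arules)

# Example transaction data shipped with arules
data("Groceries")

# Hypothetical helper: number of rules mined at a given support threshold
count_rules <- function(min_support, min_confidence = 0.3) {
  rules <- apriori(Groceries,
                   parameter = list(support = min_support,
                                    confidence = min_confidence),
                   control = list(verbose = FALSE))
  length(rules)
}

# Two support thresholds that differ by 0.005
supports <- c(0.010, 0.015)
n_rules <- sapply(supports, count_rules)

# Rules remaining at each threshold, and how many were dropped by the increase
data.frame(support = supports,
           rules   = n_rules,
           dropped = c(NA, -diff(n_rules)))
```

Each additional threshold to compare requires another full run of the mining step, which is the manual calibration an interactive application could replace.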

Our Goal

To build a generic application for association rule mining that:

1. Allows use of different datasets
We want to make association rule mining possible for any dataset, not only transaction data. We plan to split the application into two use cases. The first is generic market basket analysis, where users load transaction data and find association rules among the items. The second is targeted association rule mining, where users upload any kind of dataset with a target variable of interest and find associations between that variable and the others. This kind of exploratory analysis should give users insights and lay a solid foundation for further predictive analysis on the same dataset.
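The targeted use case could look roughly like the sketch below, which is an assumption about one possible approach rather than our final implementation. It uses R's built-in iris dataset as a stand-in for a non-transaction dataset, treats Species as the hypothetical target variable, and bins the continuous columns into three equal-frequency intervals before mining; the thresholds are illustrative.

```r
library(arules)

# Built-in non-transaction dataset: four continuous columns and one factor
data(iris)

# Discretize continuous columns into 3 equal-frequency bins
# (the factor column Species is left unchanged)
iris_disc <- discretizeDF(iris,
                          default = list(method = "frequency", breaks = 3))

# Each column=value pair becomes an item, e.g. "Species=setosa"
trans <- as(iris_disc, "transactions")

# Restrict the consequent to the target variable; everything else goes left
target_items <- paste0("Species=", levels(iris$Species))
rules <- apriori(trans,
                 parameter = list(support = 0.1, confidence = 0.8, minlen = 2),
                 appearance = list(rhs = target_items, default = "lhs"),
                 control = list(verbose = FALSE))

# Show the strongest rules pointing at the target variable
inspect(head(sort(rules, by = "lift"), 5))
```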

Incorporate: R Shiny app for interactive association rule mining
Improve: improved stats explorer, improved network diagram