ISSS608 2016-17 T3 Group8 Arules Project Proposal

From Visual Analytics and Applications
Revision as of 00:34, 7 August 2017 by Yfguan.2016

A Visual Application for Better Business Decision Making


Abstract

Association rule mining is a rule-based machine learning method for detecting frequent patterns, correlations, associations, or causal structures in data sets held in various kinds of databases, such as relational databases, transactional databases, and other data repositories. arules is a robust association rule mining package for R. The richness of its functions is comparable to, if not better than, that of expensive commercial off-the-shelf analytical toolkits such as SAS Enterprise Miner and IBM SPSS Modeler. However, use of the arules package tends to be confined to academic research, because using it effectively requires intermediate R programming skills, which are not common in the business analyst community.

In view of this limitation, our project seeks to provide a user interface to the arules package using the R Shiny framework. The user-friendly interface design allows casual users to manage, explore, calibrate and visualise complex frequent itemset mining and association rule mining models without having to type a single line of code. In addition, our application incorporates an interactive graph visualisation method to make the outputs of frequent itemset mining and association rule mining algorithms easier to interpret.

This presentation consists of five main sections. First, we discuss the motivation and objectives of the project. Second, we discuss in detail the principles and concepts of association rule mining and the R packages used to perform it, the arules family of packages. Third, we discuss the application and visualization design, with a focus on the improvements made over the arules visualization packages. Fourth, we demonstrate the flexibility of our application with two different use cases. Finally, we conclude by sharing insights gained while working on the project and potential application areas for our application.

Packages Used

There are two core packages used in our application, both of which are under the “arules” family.

arules:

arules is the foundation on which we built this application. It enables users to apply association rule mining algorithms to transaction data or any other data that meets certain requirements. It is powerful at manipulating and transforming data, pruning redundant rules, and filtering the generated association rules. Users can filter the rules by setting thresholds for support, confidence, and lift, or by antecedent and consequent, and can sort the rules by support, confidence, and lift.
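
To make the three measures concrete, the following is a minimal Python sketch of how support, confidence, and lift can be computed, filtered, and sorted. It is plain Python for illustration only, not the arules API; the toy transactions, the function names, and the thresholds are all assumptions.

```python
from itertools import permutations

# Toy transactions (illustrative data, not from the project).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def rule_metrics(lhs, rhs, transactions):
    """Support, confidence, and lift of the rule lhs => rhs."""
    supp = support(set(lhs) | set(rhs), transactions)
    conf = supp / support(lhs, transactions)   # P(rhs | lhs)
    lift = conf / support(rhs, transactions)   # confidence relative to P(rhs)
    return {"lhs": lhs, "rhs": rhs,
            "support": supp, "confidence": conf, "lift": lift}

# Enumerate every single-item => single-item rule.
# Every item occurs at least once, so no division by zero can happen here.
items = sorted(set().union(*transactions))
rules = [rule_metrics((a,), (b,), transactions)
         for a, b in permutations(items, 2)]

# Keep rules that meet hypothetical thresholds, then sort by lift
# (ties broken by confidence), highest first.
kept = [r for r in rules if r["support"] >= 0.5 and r["confidence"] >= 0.6]
kept.sort(key=lambda r: (r["lift"], r["confidence"]), reverse=True)

for r in kept:
    print(r["lhs"], "=>", r["rhs"],
          f"supp={r['support']:.2f} conf={r['confidence']:.2f} lift={r['lift']:.2f}")
```

A lift above 1 suggests the antecedent and consequent occur together more often than they would if they were independent; a lift below 1 suggests the opposite.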

arulesViz:

arulesViz is an R package that provides various visualizations of association rules. Users can choose to visualize their association rules using a scatter plot, matrix-based visualization, grouped matrix-based visualization, graph-based visualization, parallel coordinates plot, double-decker plot, and so on. This diversity of visualizations makes it a widely used R package for visualizing association rules. One drawback of the package is that these visualizations are all static graphs, which lack interactivity.

Motivation

Association Rule Mining is Powerful
Although association rule mining is usually applied in market basket analysis to mine the relationships between different products, it is a powerful technique that can be applied to any dataset to discover associations and correlations between variables. Given a target variable of interest, we can find the relevant association rules within the dataset even if it is not transaction data.


However, not all datasets are ready for this kind of data mining in R. For example, continuous variables are not easy to handle, which is a barrier for users stepping into the world of data analytics. We intend to develop an association rule mining application that non-statistician users can easily understand, play with, and draw insights from, and that brings them into data analytics through a fundamental yet powerful concept: probability.
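
One common way to handle a continuous variable is to bin it into a few labelled ranges, so that each row of an ordinary table becomes a set of "column=value" items. The sketch below illustrates this with equal-width bins in plain Python. The rows, column names, and bin labels are hypothetical, and the binning scheme is an assumption rather than the method used in our application.

```python
def make_binner(values, labels):
    """Return a function that maps a number to one of `labels`,
    using equal-width bins over the range of `values`."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / len(labels)

    def to_label(x):
        if width == 0:            # all values identical: single bin
            return labels[0]
        # Clamp the index so the maximum value falls in the last bin.
        i = min(int((x - lo) / width), len(labels) - 1)
        return labels[i]

    return to_label

# Hypothetical tabular data with one continuous column ("age").
rows = [
    {"age": 23, "plan": "basic", "churn": "yes"},
    {"age": 41, "plan": "premium", "churn": "no"},
    {"age": 67, "plan": "basic", "churn": "no"},
]

age_bin = make_binner([r["age"] for r in rows], ["young", "middle", "senior"])

def row_to_items(row):
    """Turn one table row into a set of 'column=value' items."""
    items = set()
    for col, val in row.items():
        if col == "age":          # discretize the continuous column
            val = age_bin(val)
        items.add(f"{col}={val}")
    return items

transactions = [row_to_items(r) for r in rows]
for t in transactions:
    print(sorted(t))
```

Once every row is a set of items, the data can be treated like transaction data, and a target such as "churn=yes" can be used as the consequent of interest.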

Room for Improvement of Current Packages

Current R packages available for association rule mining are helpful for data analysts, but their limitations also make the analysis results difficult to interpret:

1) Static Visualizations
Visualizations provided in current association rule mining (ARM) packages are mostly static, so users cannot see how different settings affect the rules without editing code. For example, it is hard to see how many rules are dropped if the support threshold is increased by 0.005.

2) Lack of Interactivity
There is little interactivity in the current ARM packages. Users are unable to select or zoom in on the visualizations.

3) Manual Calibration
Users can calibrate the generated rules only manually, by changing the thresholds for support, confidence, and lift, or the antecedent or consequent.
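
The threshold question raised under the first limitation (how many rules drop out when the support threshold rises by 0.005) can be illustrated with a minimal Python sketch. The rules and their support values are hypothetical stand-ins for mined rules, and the function names are assumptions.

```python
# Hypothetical (rule, support) pairs standing in for mined rules.
rules = [
    ("{milk} => {bread}", 0.012),
    ("{butter} => {bread}", 0.009),
    ("{jam} => {bread}", 0.006),
    ("{tea} => {milk}", 0.004),
]

def surviving(rules, min_support):
    """Names of rules whose support meets the threshold."""
    return [name for name, supp in rules if supp >= min_support]

def dropped_by_increase(rules, old, new):
    """Rules that pass the old threshold but fail the new one."""
    before = set(surviving(rules, old))
    after = set(surviving(rules, new))
    return sorted(before - after)

base = 0.005   # current support threshold
step = 0.005   # increase being explored
dropped = dropped_by_increase(rules, base, base + step)

print(f"{len(surviving(rules, base))} rules at support >= {base}")
print(f"{len(surviving(rules, base + step))} rules at support >= {base + step}")
print("dropped:", dropped)
```

In an interactive application, a comparison like this could be recomputed each time the user moves a threshold slider, so the effect of a change is visible without editing code.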

Our Goal

To build a generic application for association rule mining that:

1. Allows use of different datasets
We want to make association rule mining possible for any dataset, not only transaction data. We would like to split our application into two modes: one for generic market basket analysis, where users load transaction data and find association rules among the items, and the other for targeted association rule mining, where users can upload any kind of dataset with a target variable they are interested in and find associations between this variable and the others. This kind of exploratory analysis should give them insights and lay a solid foundation for further predictive analysis on the same dataset.

2. Improves the functions of current R packages
As mentioned above, current R packages for association rule mining have several limitations. By building this R Shiny app, we aim to overcome these limitations and bring the application and visualization of association rule mining to a new level. We will specifically improve on:
1) Stats explorer
2) Network diagram
3) Interactivity of visualization

Our goal is that business users will find this app interesting and useful for association rule mining on their own datasets, and that a clearer interpretation of the analysis results will help them gain more insights.