Difference between revisions of "ISSS608 2016-17 T3 Group8 Arules Final Projrct"


Revision as of 23:31, 6 August 2017

A Visual Application for Better Business Decision Making



Background on Association Rules

What is association rule mining?

  • An association rule is a pattern stating that when one event occurs, another event occurs with a certain probability.
  • Association rules are if/then statements that help uncover relationships between seemingly unrelated items, e.g. relationships between objects that are frequently bought together.
  • Association rule mining first finds all sets of items (itemsets) whose support is greater than the minimum support. It then uses these large itemsets to generate the desired rules, i.e. those whose confidence is greater than the minimum confidence.
  • A typical and widely used application of association rules is market basket analysis. A rule has an LHS and an RHS part and can be represented as itemset X => itemset Y. This means that the items on the right were frequently purchased along with the items on the left.


Key indicators of Association Rules:

Description Illustration
Support
  • Popularity of {X, Y}
  • Proportion of transactions in which an itemset appears
  • E.g. {apple, beer, rice}
  • Appearance: 2 out of 8 transactions
  • Support: 25%
VRshiny report grp8 files image001.png
Confidence
  • Likelihood of {X -> Y}
  • Proportion of transactions containing X in which Y also appears
  • E.g. {apple -> beer}
  • 3 out of 4 transactions containing apple also contain beer
  • Confidence: 75%
VRshiny report grp8 files image003.png
Lift
  • Usefulness of {X -> Y}
  • How much more often Y appears together with X than would be expected if X and Y were independent
  • A lift of 2 on the {apple -> beer} rule means that buying apple doubles the chance of buying beer
VRshiny report grp8 files image005.png
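
The three measures can be checked by hand on a small example. The sketch below uses base R rather than the arules package, on eight made-up baskets chosen only so that the counts match the figures quoted above. The function names support, confidence and lift are our own illustration, not an arules API.

```r
# Hypothetical toy data: 8 baskets chosen to reproduce the counts quoted above
transactions <- list(
  c("apple", "beer", "rice"),
  c("apple", "beer", "rice"),
  c("apple", "beer"),
  c("apple", "milk"),
  c("milk", "beer"),
  c("milk", "rice"),
  c("milk"),
  c("beer", "rice")
)

# Support: share of transactions that contain every item of the itemset
support <- function(itemset, trans) {
  mean(vapply(trans, function(t) all(itemset %in% t), logical(1)))
}

# Confidence of {lhs -> rhs}: support of lhs and rhs together, divided by support of lhs
confidence <- function(lhs, rhs, trans) {
  support(c(lhs, rhs), trans) / support(lhs, trans)
}

# Lift of {lhs -> rhs}: confidence divided by the support of rhs on its own
lift <- function(lhs, rhs, trans) {
  confidence(lhs, rhs, trans) / support(rhs, trans)
}

supp_abr <- support(c("apple", "beer", "rice"), transactions)
conf_ab  <- confidence("apple", "beer", transactions)
lift_ab  <- lift("apple", "beer", transactions)
```

On these baskets the sketch yields a support of 0.25 for {apple, beer, rice}, and a confidence of 0.75 and a lift of 1.2 for {apple -> beer}. A lift above 1 indicates that beer appears with apple more often than its overall popularity alone would suggest.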


Generic vs. Targeted Association Rule Mining
Although association rule mining (ARM) is most commonly used for market basket analysis, it can be used in other contexts as well. Instead of studying the rules generically, we can define a target of study and use ARM to find out which combinations of factors are more likely to lead to the occurrence of the target variable. In this case, the target of interest is always kept on the RHS, as the consequent. A good example of targeted ARM is the Titanic data, where association rule mining was used to see which groups of passengers were likely to survive the Titanic: the combinations of passenger attributes show that women and children were the ones who survived.
VRshiny report grp8 files image007.png
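
A minimal sketch of targeted ARM on the Titanic data is given below. It expands R's built-in Titanic contingency table into one row per passenger, then restricts the consequent through the appearance argument of apriori() in arules. The thresholds (supp = 0.005, conf = 0.8) and the choice of "Survived=Yes" as the only target are assumptions for illustration, not the settings used in our application. The mining step is skipped if arules is not installed.

```r
# Expand the built-in 4-way Titanic contingency table into one row per passenger
titanic_df  <- as.data.frame(Titanic)
titanic_raw <- titanic_df[rep(seq_len(nrow(titanic_df)), titanic_df$Freq),
                          c("Class", "Sex", "Age", "Survived")]
rownames(titanic_raw) <- NULL

if (requireNamespace("arules", quietly = TRUE)) {
  library(arules)
  # Keep the target on the RHS; every other item may only appear on the LHS
  rules <- apriori(
    titanic_raw,
    parameter  = list(supp = 0.005, conf = 0.8, minlen = 2),  # assumed thresholds
    appearance = list(rhs = "Survived=Yes", default = "lhs")
  )
  # Show the strongest rules leading to the target
  inspect(head(sort(rules, by = "lift"), 5))
}
```

Fixing the RHS this way means every rule returned has the target as its consequent, so the rules can be read directly as "which passenger profiles lead to survival".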

Choice of Visualizations and Critiques

This section discusses the visualizations chosen for our application with respect to their usefulness. It also critiques the default visualizations provided in the arulesViz package, to identify areas for improvement in our own visualization designs.

Discussion Visualization
Benefits of scatterplots
  • Good for multivariate comparison of support and confidence, coloured by lift
  • Good as a stats explorer for general MBA: since all the itemsets in the transactions are important to the user, it helps to have an overview of the statistics before users decide which rules they wish to investigate further or act upon.


The scatterplot on the left is generated by the arulesViz package. Its limitations are:

  • Too cluttered, with overlapping dots
  • Loses the information on associations
  • Manual calibration of the 3 interestingness statistics
VRshiny report grp8 files image009.png
Benefits of network diagrams
  • Clearly shows the direction of the relationship between the LHS items and the RHS items (from and to)
  • Differentiates rules from itemsets, as rules and items are represented separately
  • Shows the interaction of rules, itemsets and the 3 statistics, which allows us to visualize which rules are more important than others and which items are more or less popular


The network graph on the left is generated by the arulesViz package. Its limitations are:

  • Confusing
  • Little room for user interaction
  • Loses information on the 3 statistics (only one can be seen at a time, not all three together)
  • Manual configuration
VRshiny report grp8 files image011.png

Application Design at a Glance

Design Concepts Dashboard
1. Load data dashboard

This dashboard allows users to load their data into our app.
Note that the arules package accepts only categorical variables for association rule mining. We therefore provide functionality to transform numeric variables and binary numeric variables into categorical variables at the data import stage.


The screen displays the original uploaded data and the transformed data (if the data contain numeric variables).


The transformation process will be discussed in the next section on the detailed application designs.

VRshiny report grp8 files image013.png
2. Dashboard for generic market basket analysis

This dashboard is designed for generic MBA, where we allow users to get an overview of the individual rules before they choose one particular area to investigate.


The zoom-in scatter plot, the network visualization and the data table will be filtered based on the selected box in the first overview scatterplot.

VRshiny report grp8 files image015.png
3. Dashboard for Targeted ARM

This dashboard is designed for targeted ARM. Since targeted ARM already has a target item of interest, we skipped the stats explorer part and instead added more interactive features that let users calibrate the model and investigate the items of interest.

For example, the user can choose to view only the association network of the rules leading to a specific target variable.


A datatable indicating the three interestingness measures of each rule is included at the bottom of the network graph visualization.

VRshiny report grp8 files image017.png

Application Design in Detail

1. Load data

1) Choose File to Upload
VRshiny report grp8 files image019.png

Users can upload any dataset they want, as long as it is in one of the following formats:

1. Market Basket Analysis

  • Single format: Col1 = Transaction ID, Col2 = Item Name (single)
  • Basket format: Col1 = Transaction ID, Col2 = Item Names (multiple)


2. Targeted ARM: any dataset that contains a target variable that the user is interested in.

Users can specify whether the dataset contains a header and which separator the file uses.
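
To illustrate the difference between the two market basket layouts, the sketch below converts a small single-format table into basket-format lines using base R. The column names transaction_id and item and the data themselves are hypothetical. The app itself may read the two formats differently.

```r
# Hypothetical single-format data: one row per (transaction, item) pair
single <- data.frame(
  transaction_id = c(1, 1, 2, 3, 3, 3),
  item = c("apple", "beer", "rice", "apple", "beer", "rice"),
  stringsAsFactors = FALSE
)

# Group the items by transaction ID to obtain one basket per transaction
baskets <- split(single$item, single$transaction_id)

# Basket format: one line per transaction, items separated by commas
basket_lines <- vapply(baskets, paste, character(1), collapse = ",")
```

Here the six single-format rows collapse into three baskets, one per transaction ID.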

2. Import Data
VRshiny report grp8 files image021.png
Once the data file is uploaded and users are ready to run association rule mining on the dataset, they can click the “Import data” button. The data are then imported into our server and saved as a data frame, which is used for all subsequent data transformation, analyses and visualizations.


This is done using the “eventReactive” function in R Shiny. To save a data frame that depends on the user’s uploaded file, we create a reactive data frame that is stored only when a given event happens, which in this case is a click on the “Import data” button.
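
The pattern described above can be sketched as follows. This is an illustration under assumptions rather than our app's actual code: the input IDs ("file", "header", "sep", "import"), the output ID "preview" and the helper make_ui are hypothetical names, and read.csv stands in for whatever reader the app uses.

```r
# Sketch only: all input and output IDs below are hypothetical
make_ui <- function() {
  shiny::fluidPage(
    shiny::fileInput("file", "Choose File to Upload"),
    shiny::checkboxInput("header", "Header", TRUE),
    shiny::radioButtons("sep", "Separator",
                        choices = c(Comma = ",", Semicolon = ";", Tab = "\t")),
    shiny::actionButton("import", "Import data"),
    shiny::tableOutput("preview")
  )
}

server <- function(input, output, session) {
  # Re-evaluated only when the "Import data" button is clicked,
  # not every time a new file or option is selected
  imported <- shiny::eventReactive(input$import, {
    shiny::req(input$file)  # stop quietly if no file has been uploaded yet
    read.csv(input$file$datapath, header = input$header, sep = input$sep)
  })

  # Downstream outputs read the stored data frame through imported()
  output$preview <- shiny::renderTable(head(imported()))
}

# To launch (requires the shiny package): shiny::shinyApp(make_ui(), server)
```

Because the file, header and separator inputs are read only inside the eventReactive expression, changing them does not trigger a re-import until the button is clicked again.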

3. Variable Transformation
  • Check binary:


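# sapply() tests each column on its own; apply() would first coerce the whole data frame to a character matrix
binary_check=sapply(HRdata,function(x) { all(na.omit(x) %in% 0:1) })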

  • Transform binary:


change_to_logical=as.data.frame(lapply(HRdata[binary_check],function(x) as.factor(as.logical(x))))

  • Check numeric:

numeric_check=sapply(df1,is.numeric)

  • Transform numeric:

# ntile() is provided by the dplyr package
change_to_factor=as.data.frame(lapply(df1[numeric_check],function(x) factor(ntile(x, 3),levels = c(1,2,3),labels = c("low","mid","high"))))

a) Transforming binary columns: columns containing only the numbers 1 or 0 are considered binary columns (NA is allowed). Binary variables are recoded to “TRUE” or “FALSE”.

b) From the remaining columns, numeric columns are transformed into 3 bins (“low”, “mid”, “high”) based on quantiles.

  • This step can be improved by allowing the user to choose the number of bins and the naming convention for each bin.

c) Categorical columns: unchanged. The user guide advises users not to use numbers to represent categorical information; otherwise those columns will be transformed in step b).

d) Combining the transformed columns and the original categorical columns to form the new data frame for association rule mining.
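
Steps a) to d) can be put together in one short sketch. The data frame raw is made-up example data, not our HR dataset. The helper bin3 is a base-R, rank-based stand-in for dplyr's ntile() so that the example runs without extra packages, and it may place boundary rows in different bins than ntile() does.

```r
# Hypothetical example data (not the project's dataset)
raw <- data.frame(
  left       = c(1, 0, NA, 1, 0, 0),
  salary     = c(3000, 5200, 4100, 8000, 2500, 6100),
  department = c("sales", "hr", "it", "sales", "it", "hr"),
  stringsAsFactors = TRUE
)

# a) Binary columns: only 0/1 values (NA allowed), recoded to TRUE/FALSE factors
is_binary   <- sapply(raw, function(x) all(na.omit(x) %in% 0:1))
binary_part <- as.data.frame(lapply(raw[is_binary], function(x) factor(as.logical(x))))

# b) Remaining numeric columns: three equal-sized rank-based bins
#    (stand-in for dplyr::ntile(x, 3); an assumption, not the app's code)
bin3 <- function(x) {
  r <- rank(x, ties.method = "first", na.last = "keep")
  n <- sum(!is.na(x))
  factor(floor(3 * (r - 1) / n) + 1, levels = 1:3, labels = c("low", "mid", "high"))
}
rest         <- raw[!is_binary]
is_numeric   <- sapply(rest, is.numeric)
numeric_part <- as.data.frame(lapply(rest[is_numeric], bin3))

# c) Categorical columns are left unchanged
categorical_part <- rest[!is_numeric]

# d) Recombine everything into one all-categorical data frame
transformed <- cbind(binary_part, numeric_part, categorical_part)
```

Every column of transformed is a factor, which is the form expected when the data frame is handed over for association rule mining.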