Group01 proposal

From ISSS608-Visual Analytics and Applications
Revision as of 09:14, 4 March 2020 by Sfxu.2019



Abstract

Market Basket Analysis (MBA) is one of the key techniques used by large retailers to uncover associations between products bought by consumers. In our case, we apply a sequential version of MBA, called “sequential itemset mining” or “sequential pattern mining”, to analyze whether buying one item in the past indicates a higher likelihood of buying other items in the future. For instance, we ask whether a purchase of peanut butter implies a purchase of bread in the near future.

Data Source

The data comes from 'The Instacart Online Grocery Shopping Dataset 2017'. [1]

About Data Set

In total, there are 3,421,083 orders from 206,209 customers, covering 49,688 products in 21 major categories.

Background of Association Rule Mining

What is Association Rule Mining?

Association Rule Mining is used to find associations between different objects in a set, or to find frequent patterns in a transaction database, a relational database, or any other information repository.


What are Association Rules?

Association Rule Mining can tell you which items customers frequently buy together by generating a set of rules, called Association Rules, of the form "if this, then that".
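As a rough illustration of how such "if this, then that" rules can be generated, the sketch below enumerates rules between pairs of items and keeps those that pass a minimum support and a minimum confidence. It is plain Python, not the arules package; the function name `mine_pair_rules`, the default thresholds, and the sample baskets are assumptions made for the example, and restricting rules to single-item pairs is a simplification.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample baskets (one set of items per order).
sample_baskets = [
    {"peanut butter", "bread"},
    {"peanut butter", "bread", "milk"},
    {"bread", "milk"},
    {"milk"},
    {"peanut butter", "bread"},
]


def mine_pair_rules(baskets, min_support=0.3, min_confidence=0.6):
    """Return rules (lhs, rhs, support, confidence) between single items.

    support    = share of baskets containing both items
    confidence = share of baskets containing lhs that also contain rhs
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # the pair is not frequent enough
        # Each frequent pair yields two candidate rules: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))

    # Strongest rules first; ties are broken alphabetically.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


if __name__ == "__main__":
    for lhs, rhs, support, confidence in mine_pair_rules(sample_baskets):
        print(f"if {lhs} then {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

With the sample baskets, the rule "if peanut butter then bread" ranks first because every basket containing peanut butter also contains bread.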

How do we apply Association Rule Mining?

Association Rule Mining is applied in marketing, basket data analysis (or Market Basket Analysis) in retailing, clustering, and classification. In our project, we apply it specifically through Market Basket Analysis.

Principal indicators of Association Rules:

Key indicator.jpg
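The content of the figure is not recoverable here. Assuming it refers to the three indicators most commonly reported for association rules (support, confidence, and lift), the following Python sketch shows how they can be computed for a single rule. The function name `rule_indicators` and the sample baskets are hypothetical.

```python
# Hypothetical sample baskets (one set of items per order).
indicator_baskets = [
    {"peanut butter", "bread"},
    {"peanut butter", "bread", "milk"},
    {"bread", "milk"},
    {"milk"},
    {"peanut butter", "bread"},
]


def rule_indicators(baskets, lhs, rhs):
    """Compute support, confidence and lift for the rule lhs -> rhs.

    support    = P(lhs and rhs)
    confidence = P(rhs | lhs)
    lift       = confidence / P(rhs); values above 1 suggest a positive association
    """
    lhs, rhs = set(lhs), set(rhs)
    n = len(baskets)
    n_lhs = sum(1 for basket in baskets if lhs <= set(basket))
    n_rhs = sum(1 for basket in baskets if rhs <= set(basket))
    n_both = sum(1 for basket in baskets if (lhs | rhs) <= set(basket))
    if n == 0 or n_lhs == 0 or n_rhs == 0:
        raise ValueError("lhs and rhs must each occur in at least one basket")

    support = n_both / n
    confidence = n_both / n_lhs
    lift = confidence / (n_rhs / n)
    return {"support": support, "confidence": confidence, "lift": lift}


if __name__ == "__main__":
    print(rule_indicators(indicator_baskets, {"peanut butter"}, {"bread"}))
```

For the sample baskets, the rule from peanut butter to bread has support 0.6, confidence 1.0, and lift 1.25, because bread appears in 80% of all baskets but in 100% of the baskets that contain peanut butter.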


Application UI

Data loading and preprocessing

Generic market basket analysis by selecting a threshold

Rule network on categories or items of interest

Network.jpg

Time series analysis of selected category or item (line chart or bar chart)

Show the top-ranked rules over time and display the rule between two items or categories over time

Barchart.jpg
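The "rule over time" views listed above could be backed by a calculation along these lines. This is a minimal Python sketch, not the application's actual implementation: the function name `confidence_over_time`, the `(period, items)` layout, and the sample orders are hypothetical, and the period label stands for whatever time bucket the application ends up using.

```python
from collections import defaultdict

# Hypothetical sample data: (period, items in one order).
period_orders = [
    (0, {"peanut butter", "bread"}),
    (0, {"peanut butter"}),
    (1, {"peanut butter", "bread"}),
    (1, {"milk"}),
    (2, {"milk"}),
]


def confidence_over_time(orders, lhs, rhs):
    """Confidence of the rule lhs -> rhs, computed separately per period.

    Returns a dict mapping each period to the confidence, or to None when
    lhs was not bought at all in that period.
    """
    lhs_counts = defaultdict(int)
    both_counts = defaultdict(int)
    periods = set()
    for period, items in orders:
        periods.add(period)
        if lhs in items:
            lhs_counts[period] += 1
            if rhs in items:
                both_counts[period] += 1

    series = {}
    for period in sorted(periods):
        if lhs_counts[period]:
            series[period] = both_counts[period] / lhs_counts[period]
        else:
            series[period] = None  # the rule cannot be evaluated in this period
    return series


if __name__ == "__main__":
    print(confidence_over_time(period_orders, "peanut butter", "bread"))
```

The resulting series can be plotted directly as the line chart or bar chart described above, with periods that return None shown as gaps.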


Objectives

1. Display an overview of the relationships between categories using a network diagram
2. Show product bundles that are expected to be consumed together, along with the key indicators of each
3. Interactively visualize how these bundles evolve over time

Packages Used

We hypothesize that greater economic development increases health risks. There is sufficient evidence that the development of the Chinese economy brings great pressure, which increases the rate of unhealthy behaviours such as smoking and alcohol abuse and leads to notably high occurrences of chronic diseases such as hypertension, heart disease, and diabetes. Therefore, nationals face higher health risks.

Model

The Linear Probability Model (LPM) is a popular model in social sciences research, and we use it to examine our hypothesis: whether there is a relationship between health risks and economic development in China. The LPM is a regression model in which the target (dependent) variable is binary and the independent variables can be binary or continuous. In our project, we use an equation similar to the following one to test the hypothesis.
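The project's own equation is not reproduced on this page. As a generic illustration only, the Python sketch below fits a linear probability model with a single regressor by ordinary least squares. The function name `fit_lpm`, the variables, and the sample data are hypothetical and do not come from the project.

```python
def fit_lpm(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    y is a binary (0/1) outcome, so the fitted value is read as the
    estimated probability that y equals 1.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: a continuous regressor and a binary outcome.
lpm_x = [1, 2, 3, 4]
lpm_y = [0, 0, 1, 1]

if __name__ == "__main__":
    intercept, slope = fit_lpm(lpm_x, lpm_y)
    for xi in lpm_x:
        print(xi, intercept + slope * xi)
```

With this sample, the fitted line has intercept -0.5 and slope 0.4, so the predicted value at x = 1 is -0.1. This shows a known limitation of the LPM: fitted probabilities can fall outside the interval from 0 to 1.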

Sensitivity Test

After applying the LPM to our hypothesis, we run robustness checks to test the stability of the model. We use two approaches. First, we replace our original economic index with an alternative one and check whether the results differ. Second, we use a household-based index rather than an individual-based index and compare the results.

Critique of Existing Works

Our project is an extension of the research published by Bakkeli in 2016. The table below shows a summary of this work.
Bakkeli, N. (2016). Income inequality and health in China: A panel data analysis. Social Science & Medicine, 157, 39–47.

Package Description
arules Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).
arulesViz Extends package 'arules' with various visualization techniques for association rules and item-sets. The package also includes several interactive visualizations for rule exploration.
tidyverse The tidyverse is an opinionated collection of R packages designed for data science.
readr Read rectangular text data (such as CSV and TSV files) in R.
dplyr A grammar of data manipulation for filtering, summarising, and joining data.
ggplot2 Create graphics and charts.
visNetwork Create interactive network visualizations.

Team Members

References