Difference between revisions of "Group06 Proposal"
Revision as of 21:17, 13 June 2018
Group 6
Proposal
Project Motivation
Google and Temasek Holdings have forecast that Singapore’s e-commerce market will grow to S$7.5bn within the next 10 years. As the market grows, the volume of data available for analysis will grow rapidly as well. This presents an opportunity to explore how retailers can maximize the value of their data.
Objective
Our objective for this project is to push the boundaries of market and customer analytics. We want to explore how retailers' transaction data can be visualized and how various machine learning techniques can be applied to generate insights from it.
Data Source and Preparation
We will use an e-commerce dataset from the UCI Machine Learning Repository that contains transactional data between 2010 and 2011. The company mainly sells unique all-occasion gifts and most of their customers are wholesalers. Below is a data dictionary of the available fields within the dataset:
- InvoiceNo: Invoice number. Nominal, a 6-digit integral number uniquely assigned to each transaction. If this code starts with the letter 'c', it indicates a cancellation.
- StockCode: Product (item) code. Nominal, a 5-digit integral number uniquely assigned to each distinct product.
- Description: Product (item) name. Nominal.
- Quantity: The quantities of each product (item) per transaction. Numeric.
- InvoiceDate: Invoice date and time. Numeric, the day and time when each transaction was generated.
- UnitPrice: Unit price. Numeric, product price per unit in sterling.
- CustomerID: Customer number. Nominal, a 5-digit integral number uniquely assigned to each customer.
- Country: Country name. Nominal, the name of the country where each customer resides.
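As a minimal sketch of how these fields might be read and cancellations flagged, the Python snippet below parses a few made-up rows with the standard library only. The sample values, the `parse_row` helper, and the timestamp format are assumptions for illustration and are not taken from the dataset; the application itself is planned in R, so this is only a language-neutral outline of the preparation step.

```python
import csv
import io
from datetime import datetime

# Made-up sample rows that follow the data dictionary; not real records.
SAMPLE = """InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
100001,20001,SAMPLE GIFT BOX,6,2010-12-01 08:26,2.55,10001,United Kingdom
C100002,20001,SAMPLE GIFT BOX,-6,2010-12-02 09:00,2.55,10001,United Kingdom
100003,20002,SAMPLE MUG,12,2010-12-03 10:15,1.25,10002,France
"""


def parse_row(row):
    """Convert one CSV row (a dict of strings) into typed values."""
    return {
        "invoice_no": row["InvoiceNo"],
        "stock_code": row["StockCode"],
        "description": row["Description"],
        "quantity": int(row["Quantity"]),
        # Assumed timestamp format; adjust to match the actual file.
        "invoice_date": datetime.strptime(row["InvoiceDate"], "%Y-%m-%d %H:%M"),
        "unit_price": float(row["UnitPrice"]),
        "customer_id": row["CustomerID"],
        "country": row["Country"],
        # An invoice code starting with 'c' (either case) marks a cancellation.
        "is_cancellation": row["InvoiceNo"].lower().startswith("c"),
    }


records = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE))]

# Keep only completed sales for downstream analysis.
sales = [r for r in records if not r["is_cancellation"]]
print(len(records), len(sales))
```

Identifiers such as InvoiceNo and CustomerID are kept as strings here because they are nominal codes rather than quantities.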
Approach
The application will be built primarily based on the R Shiny framework. The key focus areas will be:
- Visualization of bipartite networks (i.e. between customers and products)
- Visualization of clustering/segmentation of customers
- Visualization of popular products through text analytics
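For the text analytics focus area, one simple starting point could be a quantity-weighted word count over product descriptions. The sketch below assumes hypothetical `(description, quantity)` pairs and a plain lowercase word tokenizer; it is an illustration of the idea in Python, not the method the application will necessarily use.

```python
import re
from collections import Counter

# Hypothetical (description, quantity) pairs; not real data.
items = [
    ("SAMPLE GIFT BOX", 6),
    ("SAMPLE MUG", 12),
    ("RED GIFT BAG", 3),
]

# Weight each word by units sold so that popular products dominate the counts.
word_counts = Counter()
for description, quantity in items:
    for word in re.findall(r"[a-z]+", description.lower()):
        word_counts[word] += quantity

# The two most frequent words, e.g. as input for a word cloud or bar chart.
top_words = word_counts.most_common(2)
print(top_words)
```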
Some examples of visualizations are as follows:
Force-Directed Graphs for Bipartite Networks
[Image]
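To make the customer-product network concrete, the sketch below builds the node and link tables that a force-directed layout could consume. The purchase tuples are hypothetical, and using total quantity as the edge weight is an assumption; other weights (such as revenue or number of invoices) would work the same way. It is written in Python for illustration only.

```python
from collections import defaultdict

# Hypothetical (customer_id, stock_code, quantity) purchases; not real data.
purchases = [
    ("10001", "20001", 6),
    ("10001", "20002", 2),
    ("10002", "20002", 12),
    ("10001", "20001", 4),
]

# Edge weight = total quantity a customer bought of a product (assumed choice).
edge_weights = defaultdict(int)
for customer, product, quantity in purchases:
    edge_weights[(customer, product)] += quantity

# The two disjoint node sets of the bipartite graph.
customers = sorted({c for c, _ in edge_weights})
products = sorted({p for _, p in edge_weights})

# Node table: customers first, then products, each tagged with its group.
nodes = [{"name": c, "group": "customer"} for c in customers]
nodes += [{"name": p, "group": "product"} for p in products]

# Link table: links refer to nodes by zero-based position in the node table.
index = {(n["group"], n["name"]): i for i, n in enumerate(nodes)}
links = [
    {"source": index[("customer", c)], "target": index[("product", p)], "value": w}
    for (c, p), w in sorted(edge_weights.items())
]
print(nodes)
print(links)
```

Because every link joins a customer node to a product node and never two nodes of the same group, the resulting graph is bipartite by construction.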
Selection of Tools
Based on our preliminary assessment, we will use the following libraries to develop the R Shiny dashboard: tidyr, dplyr, ggplot2, igraph, htmlwidgets, networkD3, mclust, shiny, shinyTime, shinydashboard, shinythemes, sjmisc, readxl, stringr, data.table, dummies, sjPlot, car, DT, reshape2, sqldf, etc.
References
Webber, R. (2013). The evolution of direct, data and digital marketing. Journal of Direct, Data and Digital Marketing Practice, 14(4), 291–309. https://doi.org/10.1057/dddmp.2013.20
You, Z., Si, Y.-W., Zhang, D., Zeng, X., Leung, S. C. H., & Li, T. (2015). A decision-making framework for precision marketing. Expert Systems with Applications, 42(7), 3357–3367. https://doi.org/10.1016/j.eswa.2014.12.022