Group06 Report

1 Introduction

An e-commerce company in the United Kingdom has provided its transaction data from 1 December 2010 to 9 December 2011. Its customers are mainly wholesalers and, based on the data provided, its client portfolio is mainly based in the United Kingdom. The key variables in the data set are: (i) invoice number; (ii) stock code; (iii) description; (iv) quantity; (v) invoice date; (vi) unit price; (vii) customer identification number; and (viii) country. As the company is primarily sales-driven, its key objective should be to maximize revenue. To achieve this, we will adopt a four-pronged approach: (a) understand the seasonality of the flow of goods over the period; (b) identify cross-sell opportunities through customer similarities; (c) cluster high-value customers to target; and (d) review product descriptions to understand which kinds of products sell well.

2 Motivation and Objective

Various customer intelligence platforms are available on the market, such as “DataSift”, “SAS Customer Intelligence” and “Accenture Insights Platform”. However, none of them offers an integrated bespoke solution for the data on hand. Our motivation is to build an entirely bespoke application that allows the company to fully analyse its data from the outset.

Data Source and Preparation

We will use an e-commerce dataset from the UCI Machine Learning Repository that contains transactional data between 2010 and 2011. The company mainly sells unique all-occasion gifts, and most of its customers are wholesalers. The data dictionary below lists the fields available in the dataset:

Field	Description
InvoiceNo	Invoice number. Nominal, a 6-digit integral number uniquely assigned to each transaction. If this code starts with the letter 'C', it indicates a cancellation.
StockCode	Product (item) code. Nominal, a 5-digit integral number uniquely assigned to each distinct product.
Description	Product (item) name. Nominal.
Quantity	The quantities of each product (item) per transaction. Numeric.
InvoiceDate	Invoice Date and time. Numeric, the day and time when each transaction was generated.
UnitPrice	Unit price. Numeric, Product price per unit in sterling.
CustomerID	Customer number. Nominal, a 5-digit integral number uniquely assigned to each customer.
Country	Country name. Nominal, the name of the country where each customer resides.
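
As an illustration of how these fields might be prepared for analysis, the following is a minimal base R sketch. The rows are invented toy values, and the names `prepare_transactions`, `IsCancelled`, `Month` and `Revenue` are hypothetical. The filtering rule (drop cancellations, rows without a customer number and non-positive quantities) is an assumption about the preparation step, not a confirmed design.

```r
# Toy rows mimicking the data dictionary; all values are invented for illustration.
transactions <- data.frame(
  InvoiceNo   = c("536365", "536365", "C536379", "536381"),
  StockCode   = c("85123", "71053", "22632", "22139"),
  Quantity    = c(6, 6, -1, 23),
  InvoiceDate = c("2010-12-01 08:26:00", "2010-12-01 08:26:00",
                  "2010-12-01 09:41:00", "2010-12-01 09:41:00"),
  UnitPrice   = c(2.55, 3.39, 27.50, 4.25),
  CustomerID  = c("17850", "17850", "14527", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: derives analysis columns and keeps completed sales only.
prepare_transactions <- function(df) {
  # An invoice number starting with the letter C marks a cancellation.
  df$IsCancelled <- grepl("^C", df$InvoiceNo, ignore.case = TRUE)
  # Parse the timestamp and derive a year-month key for seasonality analysis.
  df$InvoiceDate <- as.POSIXct(df$InvoiceDate,
                               format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
  df$Month <- format(df$InvoiceDate, "%Y-%m")
  # Line revenue in sterling.
  df$Revenue <- df$Quantity * df$UnitPrice
  # Assumed rule: keep non-cancelled rows with a known customer and positive quantity.
  df[!df$IsCancelled & !is.na(df$CustomerID) & df$Quantity > 0, ]
}

sales <- prepare_transactions(transactions)

# Monthly revenue, the basis for a seasonality view.
monthly <- aggregate(Revenue ~ Month, data = sales, FUN = sum)
```

The `monthly` table could then feed a time-series chart of revenue per month.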

Approach

The application will be built primarily on the R Shiny framework. The key focus areas will be:

Visualization of bipartite networks (i.e. between customers and products)
Visualization of clustering/segmentation of customers (RFM Model)
Visualization of popular products through text analytics

Some examples of visualizations are as follows:

Visualization of bipartite networks
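
A customer-product bipartite network can be represented as an incidence matrix, from which customer similarities and cross-sell candidates follow. The sketch below uses base R only and invented toy data. The identifiers `purchases`, `jaccard` and `cross_sell` are hypothetical, and Jaccard similarity is one possible similarity measure, chosen here for illustration.

```r
# Toy purchases; customer and product identifiers are invented.
purchases <- data.frame(
  CustomerID = c("cust1", "cust1", "cust2", "cust2", "cust2", "cust3"),
  StockCode  = c("prodA", "prodB", "prodA", "prodB", "prodC", "prodD"),
  stringsAsFactors = FALSE
)

# Incidence matrix of the bipartite graph:
# rows = customers, columns = products, 1 = customer bought the product.
incidence <- (unclass(table(purchases$CustomerID, purchases$StockCode)) > 0) * 1

# One-mode projection onto customers: entry [i, j] counts products bought by both.
shared <- incidence %*% t(incidence)

# Jaccard similarity = shared / (bought by i + bought by j - shared).
n_bought <- diag(shared)
jaccard <- shared / (outer(n_bought, n_bought, "+") - shared)

# Hypothetical helper: products a similar customer bought that the target has not.
cross_sell <- function(target, neighbour) {
  colnames(incidence)[incidence[neighbour, ] == 1 & incidence[target, ] == 0]
}

candidates <- cross_sell("cust1", "cust2")
```

Here `cust1` and `cust2` share two of three distinct products, so `prodC` becomes a cross-sell candidate for `cust1`. The same incidence matrix could be handed to a network package such as igraph or networkD3 for rendering.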

Visualization of clustering/segmentation of customers

We intend to build a feature that allows users to visualize clustering results interactively. This helps users give more meaningful labels to the different clusters and, eventually, yields a more insightful understanding and segmentation of the customers.
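
To make the RFM (recency, frequency, monetary) idea concrete, the sketch below computes the three measures per customer and groups customers with k-means. The data are invented, the snapshot date is an assumption (the day after the last date in the data), and base R `kmeans` stands in for whichever clustering method is finally adopted (for example mclust).

```r
# Toy completed sales; all values are invented for illustration.
sales <- data.frame(
  CustomerID  = c("A", "A", "A", "B", "B", "B", "C", "D"),
  InvoiceNo   = as.character(1:8),
  InvoiceDate = as.Date(c("2011-11-01", "2011-11-20", "2011-12-05",
                          "2011-10-15", "2011-11-25", "2011-12-01",
                          "2011-01-15", "2011-02-10")),
  Revenue     = c(500, 450, 600, 400, 550, 500, 20, 35),
  stringsAsFactors = FALSE
)

# Assumed reference date for recency: the day after the data set ends.
snapshot <- as.Date("2011-12-10")

# One RFM row per customer.
rfm <- do.call(rbind, lapply(split(sales, sales$CustomerID), function(d) {
  data.frame(
    Recency   = as.numeric(snapshot - max(d$InvoiceDate)),  # days since last purchase
    Frequency = length(unique(d$InvoiceNo)),                # number of invoices
    Monetary  = sum(d$Revenue)                              # total spend in sterling
  )
}))

# Standardise so that no single measure dominates the distance, then cluster.
set.seed(42)
fit <- kmeans(scale(rfm), centers = 2, nstart = 10)
segments <- fit$cluster
```

In this toy example the recent, frequent, high-spending customers (A and B) fall into one segment and the lapsed, low-spending customers (C and D) into the other. Cluster numbers themselves are arbitrary, which is why interactive labelling is useful.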

Visualization of product popularity

Retailers need to understand customer preferences for different products, so natural language processing will be applied to the product descriptions to explore product popularity. By understanding which products are the most and least popular, and what properties the popular products share, the retailer will be able to make smarter decisions that reflect customer preferences and thus improve revenue. Visualization techniques such as bar charts and word clouds will be used to make the insights easier to understand.
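
One simple form of such text analytics is to split each product description into words and weight every word by the quantity sold. The sketch below does this in base R. The descriptions, quantities and stop-word list are invented, and the names `items`, `word_qty` and `popularity` are hypothetical.

```r
# Toy line items; descriptions and quantities are invented for illustration.
items <- data.frame(
  Description = c("WHITE HANGING HEART T-LIGHT HOLDER",
                  "RED HANGING HEART T-LIGHT HOLDER",
                  "JUMBO BAG RED RETROSPOT",
                  "REGENCY CAKESTAND 3 TIER"),
  Quantity = c(10, 5, 8, 2),
  stringsAsFactors = FALSE
)

# Assumed stop-word list; a real one would be tuned to the data.
stop_words <- c("OF", "AND", "THE", "WITH")

# Tokenise: upper-case, then split on anything that is not a letter.
tokens <- strsplit(toupper(items$Description), "[^A-Z]+")

# One row per (word, line item), carrying the quantity of that line item.
word_qty <- data.frame(
  word = unlist(tokens),
  qty  = rep(items$Quantity, lengths(tokens)),
  stringsAsFactors = FALSE
)

# Drop single letters, empty strings and stop words.
keep <- nchar(word_qty$word) > 1 & !(word_qty$word %in% stop_words)
word_qty <- word_qty[keep, ]

# Total quantity sold per word, most popular first.
popularity <- sort(
  vapply(split(word_qty$qty, word_qty$word), sum, numeric(1)),
  decreasing = TRUE
)
```

The resulting named vector can be passed directly to a bar chart (for example `barplot(head(popularity, 10))`) or used as the word weights of a word cloud.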

Selection of Tools

Based on our preliminary assessment, we will use the following libraries to develop the R Shiny dashboard: tidyr, dplyr, ggplot2, igraph, htmlwidgets, networkD3, mclust, shiny, shinyTime, shinydashboard, shinythemes, sjmisc, readxl, stringr, data.table, dummies, sjPlot, car, DT, reshape2, sqldf, etc.
