ANLY482 AY2016-17 T2 Group 2 Findings Dashboard Application
Exploration of Analytical Tools
Sankey Diagram
Many factors are involved in designing a Sankey diagram. Two common considerations are the number of levels and the direction of flow.
Cross-sectional analysis here compares two time periods, so it is ideal to have two levels illustrating the ‘before’ and ‘after’ periods. More than two levels are suitable only when showing a flow through multiple transitions. The Sankey diagram is designed to flow from left to right, as this illustrates the ‘before’ and ‘after’ time periods better than a flow from top to bottom.
Figure 5 illustrates a shift in customer purchasing behaviour for ticket sales using a Sankey diagram. The left and right columns represent the two time periods, and each node represents a variable. A variable can represent either a single event or a permutation of multiple events. A Sankey diagram uses the width of each flow and the height of each node to depict the shift in a variable: the wider the flow or the taller the node, the more popular that variable. The diagram therefore shows which variables, represented by nodes, are more popular than others across the two time periods.
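The two-level design described above can be sketched as a small aggregation step. This is a minimal illustration in Python rather than the project's R code; the record layout, the category names and the helper names `sankey_links` and `node_heights` are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical records: (customer_id, category_before, category_after).
purchases = [
    ("c1", "Concert", "Concert"),
    ("c2", "Concert", "Sports"),
    ("c3", "Sports", "Sports"),
    ("c4", "Theatre", "Concert"),
    ("c5", "Concert", "Sports"),
]

def sankey_links(records):
    """Count customers moving from each 'before' node to each 'after' node."""
    flows = Counter((before, after) for _, before, after in records)
    return [
        {"source": f"{before} (before)", "target": f"{after} (after)", "value": n}
        for (before, after), n in sorted(flows.items())
    ]

def node_heights(links, side):
    """Total flow per node; side is 'source' (left column) or 'target' (right)."""
    totals = Counter()
    for link in links:
        totals[link[side]] += link["value"]
    return dict(totals)

links = sankey_links(purchases)
left_heights = node_heights(links, "source")
right_heights = node_heights(links, "target")
```

Each link's `value` corresponds to the width of a flow, and the per-node totals correspond to node heights, so a taller node on the right than on the left indicates a variable that gained customers between the two periods.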
Chord Diagram
Compared to a Sankey diagram, a chord diagram can handle a larger number of variable permutations while still effectively depicting the desired analysis. A chord diagram typically places the variable categories around the circumference and shows the flow of behaviour within the circle. The main concern is that a large number of category permutations may make the results hard to interpret. To overcome this, a filter function was created within the legend so that only the selected categories are visualised. This simplifies the behaviour flow within the chord diagram, since analysis can be performed on a few categories at a time rather than on all of them.
Figure 6 illustrates a shift in customer purchasing behaviour for the types of tickets sold using a chord diagram. As mentioned above, the circumference of the circle represents the permutations of different variables. A variable in a chord diagram can represent either a single event/sale or a permutation of multiple events/sales. The width of a flow within the circle illustrates the shift in sales from one variable to another. The popularity of a variable is illustrated by the length of the arc it spans along the circumference: as seen in the figure, the longer the arc, the more popular that variable/event.
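The data behind a chord diagram, and the effect of the category filter, can be sketched as a square flow matrix. The Python below is only an illustration under assumed data: the ticket categories, the transitions and the helper names `flow_matrix` and `row_totals` are hypothetical, and using row totals as arc lengths is one common convention rather than necessarily the one used in the dashboard.

```python
def flow_matrix(records, categories):
    """Square matrix where entry [i][j] counts moves from category i to category j.

    Records that touch a category outside `categories` are dropped, which is
    how a legend filter narrows the diagram to the selected categories.
    """
    index = {name: i for i, name in enumerate(categories)}
    matrix = [[0] * len(categories) for _ in categories]
    for before, after in records:
        if before in index and after in index:
            matrix[index[before]][index[after]] += 1
    return matrix

def row_totals(matrix):
    """Outgoing flow per category, usable as the arc length on the circumference."""
    return [sum(row) for row in matrix]

# Hypothetical (category_before, category_after) pairs, one per customer.
transitions = [
    ("Adult", "Adult"), ("Adult", "Child"), ("Child", "Family"),
    ("Family", "Adult"), ("Senior", "Adult"), ("Adult", "Child"),
]

full = flow_matrix(transitions, ["Adult", "Child", "Family", "Senior"])
filtered = flow_matrix(transitions, ["Adult", "Child"])
```

Filtering down to two categories shrinks the matrix from 4×4 to 2×2, which is why the filtered diagram is easier to read than the full one.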
The Application
Application Workflow
After careful consideration of the tools needed for our analysis, we decided to use R for its extensive open-source packages. The application relies on an R script that cleans and aggregates the data into multiple .csv files, which are then zipped together into a single folder. Users upload this .zip folder to the Shiny Dashboard, which generates the graphs.
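The hand-off between the preparation script and the dashboard can be sketched as a round trip through a zip archive. The project does this in R; the Python below is only a language-neutral illustration, and the table names, columns and the helpers `build_bundle` and `read_bundle` are assumptions made for the example.

```python
import csv
import io
import zipfile

def build_bundle(tables):
    """Write each table (a list of rows, header first) as a .csv inside one zip."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, rows in tables.items():
            text = io.StringIO()
            csv.writer(text).writerows(rows)
            archive.writestr(f"{name}.csv", text.getvalue())
    return buffer.getvalue()

def read_bundle(data):
    """Read every .csv in an uploaded archive back into lists of dict rows."""
    tables = {}
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for member in archive.namelist():
            if member.endswith(".csv"):
                text = archive.read(member).decode("utf-8")
                tables[member[:-4]] = list(csv.DictReader(io.StringIO(text)))
    return tables

# Hypothetical aggregated outputs of the cleaning step.
aggregated = {
    "flows": [["period", "category", "customers"],
              ["before", "Concert", 3],
              ["after", "Concert", 2]],
    "summary": [["metric", "value"], ["total_customers", 5]],
}

bundle = build_bundle(aggregated)   # what the preparation step would produce
loaded = read_bundle(bundle)        # what the dashboard would read on upload
```

Keeping the aggregation outside the dashboard means the upload step only has to parse small, pre-computed tables; note that values read back from .csv arrive as strings and need converting before plotting.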
Shiny is a free R package that allows users to create interactive web applications for exploring and sharing data and analyses. While Shiny includes some layout functionality for creating enterprise dashboards, it lacks features such as boxes for easy configuration of graphs, side panels and so on. Shiny Dashboard is an R package designed by the RStudio team to fill this gap and allow users to build web-based visualisations with ease. It builds on Shiny, AdminLTE (a free premium admin control panel theme) and Bootstrap 3 (a responsive front-end web development framework), which allows users to create complex enterprise dashboards without needing to learn HTML or JavaScript. Further customisation is still possible with CSS and jQuery. Building dashboards with R, Shiny and Shiny Dashboard is made easier still by the fact that any appropriate R visualisation package, including ggplot2 and plotly, can be used.
The various charts in our dashboard are generated with the help of several libraries, including plotly for tabular charts, box plots and slope graphs, rCharts for Sankey diagrams and bar charts, and ECharts for the chord diagrams.
RCHARTS
rCharts is an R package used for creating customised JavaScript visualisations using a lattice-style plotting interface directly from R. The beauty of rCharts lies in its support for various charting libraries, which makes it highly customisable. We made use of the NVD3 JavaScript charting library for the creation of the Sankey diagrams and bar charts. NVD3 provides reusable chart components built on d3. The plots are created via a call to the nPlot function.
PLOTLY
Plotly is an R package for creating interactive web-based graphs via the open-source JavaScript graphing library plotly.js. Plotly.js is built on top of commonly used visualisation packages such as stack.gl and d3.js. It is a high-level library driven by JSON objects: each aspect of a chart, such as the lines, axes or legend, corresponds to a set of JSON attributes. Even so, configuring plotly charts is easy, as it only requires passing the data and any required settings to its plot_ly function. The library then generates and handles all the JSON objects required for the chart. Furthermore, Plotly graphs in R are rendered locally through the htmlwidgets framework, which handles any JavaScript dependencies required by plotly.js. Plotly can be used in R to create various charts such as box plots and line charts, or even more sophisticated chart types like SVG charts and statistical graphs.
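To make the JSON structure concrete, the sketch below builds by hand the kind of figure description that plotly.js consumes: a list of traces under `data` and chart-wide settings under `layout`. It uses only the Python standard library rather than the R package, and the product names, purchase amounts and the helpers `box_trace` and `figure` are hypothetical.

```python
import json

def box_trace(name, values):
    """One box-plot trace: a chart type, a legend name and the raw y values."""
    return {"type": "box", "name": name, "y": list(values)}

def figure(traces, title, y_label):
    """Combine traces with layout attributes into a single figure description."""
    return {
        "data": traces,
        "layout": {
            "title": {"text": title},
            "yaxis": {"title": {"text": y_label}},
            "showlegend": False,
        },
    }

# Hypothetical purchase amounts per product type.
spec = figure(
    [box_trace("Product A", [20, 35, 35, 50, 80]),
     box_trace("Product B", [10, 15, 25, 40])],
    title="Purchase amount by product",
    y_label="Amount",
)

payload = json.dumps(spec)  # the JSON text a browser-side renderer would receive
```

In R, the package assembles an equivalent structure from the function arguments, so users rarely need to write the JSON themselves.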
ECHARTS
The R package recharts provides an interface to the JavaScript library ECharts for data visualisation. ECharts is a free, powerful charting and visualisation library offering an easy way of adding intuitive, interactive, and highly customisable charts. It is written in pure JavaScript and based on zrender, a lightweight canvas library. The recharts package allows R users to create charts with just a few lines of R code, without having to know HTML or JavaScript. It is built on top of htmlwidgets, which takes care of managing JavaScript dependencies and dealing with different types of output documents such as R Markdown and Shiny. The main usage of ECharts is to pass a JavaScript object to the .setOption() method, and the package provides a low-level echart.list() method to construct such an object simply by using a list in R. Using the recharts package, users can simply create a chart and the package will handle the underlying processes when the chart is rendered in R Markdown, Shiny, or the R console / RStudio.
User Interface Design
A dashboard is a medium of communication, and its overarching objective is to present the most relevant information clearly and effectively. [21] We attempt to follow Stephen Few’s best attributes for information dashboard design by focusing on displaying the most salient variables in the simplest and most concise way. The first step is to be specific about what we want the dashboard design to achieve. From this, we identified the most important variable that we want to present, which is the customer flow data.
The first tab is the customer behavioural shift dashboard, where slope graphs and Sankey diagrams are used to display information effectively. We were careful in the placement of graphs so that users can look at macro-level variables before focusing on the customer flow diagrams. General statistics are placed at the top-left, followed by slope graphs that showcase the proportionate difference of the three products. With that understanding of the performance of each product, users can take an in-depth look at the customer shifts of each product type, starting from the Sankey diagram on the bottom-left and moving to those on the right.
The in-depth customer variables are displayed in the second tab, where decision makers can look at the purchasing differences of customers from the cross-sectional data. We kept to simple comparison charts here to showcase the differences in customer demographics and transaction variables as plainly and effectively as possible. For example, a box plot is used to look at variations in purchase amounts across customers, and bar charts remain a good way to look at demographic differences. Where variable flows can be detected, Sankey diagrams can still be used to visualise the data.
The next three tabs look at variables important to the individual product types. Each product type has a chord diagram that dives into the categories of the product and displays the flow of customers purchasing across the different categories. This chord diagram lets users see at a glance which categories customers were buying previously compared to now. They can also click on a certain category to show a chord chart for that specific ticket type. A chord diagram is interactive, and it is crucial to balance it with simplicity so that users are not overwhelmed by the charts. Thus, on the right side, simple and intuitive methods are used to display other relevant variables that are unique to that product type.
The use of simple comparison tools, together with more expressive comparison methods such as Sankey and chord diagrams, made analysing and exploring cross-sectional data more intuitive and insightful.