ISSS608 2016 17T1 Group1 Report

From Visual Analytics and Applications
Revision as of 02:17, 28 November 2016 by Raymond.goh.2015 (talk | contribs)


Motivation of the Application

A 7-day Ezlink passenger travel record has more than 50 million rows of data and 50+ variables. The raw data was too large for visualisation tools such as Tableau and D3.JS to read and process. Even loading the data into JMP proved to be a challenge: on average it took at least 30 minutes to read in the CSV data and convert it into JMP format. Hence our ability to visualise the raw data was very limited.

The objective of the project was to prepare and reduce the raw data to a manageable size for analysis in visualisation tools such as Tableau and D3.JS.

Review and Critique of Past Works

Data Exploration

The original Ezlink data (City nation ride data Full) had almost 50.7 million rows of single-trip transaction data. The ride data was captured from 15 to 21 Feb 2016, the week after Chinese New Year 2016. We could therefore assume that bus ridership followed its normal trend and was not influenced by the Chinese New Year holiday. The data comprised 8 variables, which captured the bus service number taken and the boarding and alighting date, time and bus stop numbers. Each record represented a single trip, from the bus stop where the commuter boarded, with its date and time, to the bus stop where the commuter alighted, with its date and time. However, bus stop numbers alone would not be useful in our analysis, as planners might not be able to interpret a graph labelled with them. Hence location data was needed to supplement the City nation ride data.

Snapshot of City Ride Data

The next critical file was the location data mapping LTA bus stops to URA planning zones (lta-bus_stop_URA), which was provided by Prof Kam. Merging this data with the ride data gave a location name, instead of a bus stop number, to the start and end point of every ride. It also let us decide the level of detail that we wanted to look into, which significantly reduced the number of rows and made the data manageable on our laptop, without the aid of a server to process this huge dataset.
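The merge described above can be sketched in a few lines of Python. This is only an illustration, not the team's JMP or R workflow: the column names (boarding_stop, alighting_stop, bus_stop, planning_area) and the sample rows are hypothetical, since the real file headers are not shown here. The idea is to look up each stop's planning area and count rides per origin and destination pair, which collapses millions of rows into a small table.

```python
import csv
import io
from collections import Counter

# Hypothetical column names and invented rows; the real files may differ.
RIDES_CSV = """service,boarding_stop,alighting_stop
12,1001,2001
12,1002,2001
7,2001,1001
7,9999,1001
"""

STOPS_CSV = """bus_stop,planning_area
1001,BEDOK
1002,BEDOK
2001,TAMPINES
"""

def load_stop_lookup(stops_file):
    """Map each bus stop number to its planning area name."""
    return {row["bus_stop"]: row["planning_area"]
            for row in csv.DictReader(stops_file)}

def aggregate_flows(rides_file, stop_to_area):
    """Count rides per (origin area, destination area), one row at a time."""
    flows = Counter()
    unmatched = 0
    for row in csv.DictReader(rides_file):
        origin = stop_to_area.get(row["boarding_stop"])
        dest = stop_to_area.get(row["alighting_stop"])
        if origin is None or dest is None:
            unmatched += 1  # stop number missing from the mapping file
            continue
        flows[(origin, dest)] += 1
    return flows, unmatched

lookup = load_stop_lookup(io.StringIO(STOPS_CSV))
flows, unmatched = aggregate_flows(io.StringIO(RIDES_CSV), lookup)
```

Because the rides are read one row at a time and only the aggregated counts are kept in memory, the size of the result depends on the number of planning areas rather than on the number of rides.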

Snapshot of LTA Bus Stop Data Matching URA Planning Zones Data Variables

Sparklines in R

Sparklines are useful for a one-glance visual inspection of multiple data streams over a time period and for detecting unusual patterns. The visual was created using the Shiny library and Data Table in R. Arranging the data in a data table allowed the sparklines to be laid out in a neat table. For our dataset, we would first need to bin each ride by its boarding time, for example into bins of 2 to 4 hours, depending on the number of bins that gives the better visual effect. Likewise, we would need to create an equal number of bins for the alighting time. This would allow us to visualise both the flow out of a planning area and the flow into it. Visualisation using R was initially explored because Bei Jia had a better understanding of R and both of us were new to D3.JS. Hence this was one of the possible methods that we could explore.
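The binning step can be illustrated with a small Python sketch. It assumes times arrive as "HH:MM:SS" strings and that each ride has already been labelled with a planning area; both are assumptions made for illustration, and the sample rides are invented. Each planning area ends up with one list of counts per time bin, which is the kind of series a sparkline would plot.

```python
from collections import defaultdict

def time_bin(time_str, bin_hours=2):
    """Return the zero-based bin index for an 'HH:MM:SS' time string."""
    hour = int(time_str.split(":")[0])
    return hour // bin_hours

def sparkline_series(rides, bin_hours=2):
    """Build one list of ride counts per planning area, one entry per bin."""
    n_bins = -(-24 // bin_hours)  # ceiling division covers the whole day
    series = defaultdict(lambda: [0] * n_bins)
    for area, time_str in rides:
        series[area][time_bin(time_str, bin_hours)] += 1
    return dict(series)

# Invented (planning area, boarding time) pairs for illustration only.
rides = [
    ("BEDOK", "07:15:00"),
    ("BEDOK", "07:50:00"),
    ("BEDOK", "18:05:00"),
    ("TAMPINES", "08:30:00"),
]
outflow = sparkline_series(rides, bin_hours=2)
```

Running the same function on alighting areas and alighting times would give the matching inflow series with an equal number of bins.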

Matthew Leonawicz (2010) Combining Data Tables And Sparklines

Chord Diagram in R

The next visual aid that we explored was the chord diagram using R Shiny. A chord diagram is a useful visual method for showing the relationship between inflow and outflow, so it was a must for our visualisation. This work in R was useful as it explained in detail how to construct chord diagrams and which features could be added.

Zuguang Gu (2016) Visualize Relations by Chord Diagram

Chord Diagram in D3.JS

This chord diagram was constructed in D3.JS and was used to explore the trade relationships between countries. The visual was constructed in a neat layout with a filter function for selecting countries. This was a useful feature for exploring outliers or excluding countries when analysing the visual output.

Steven Hall (2014) Interactive Chord Diagrams in D3

Design Framework

The chord diagram in D3.JS was finally selected as the visual aid, and the Interactive Chord Diagrams in D3 designed by Steven Hall was adopted, as its design suited our needs. The design considerations for the visual aid were as follows: main graph frame, date selection, countries filter and time period selection.

Visual Design Frame in D3

Data Preparation

The key to managing this huge dataset was to break the data down into “big bite” sizes that our laptop could process. It took the laptop (Intel® Core™ i7-5500U CPU @ 2.4GHz with 12.0 GB RAM) a good 10 minutes to read and process the CSV file (City nation ride data Full). The initial step was to filter the data by day into separate files, as the smaller file size helped to speed up processing. The data was then transformed using JMP into the format required for our D3.JS programme. The initial data transformation took our team a whole day to process the files and transform the data into the required format. Each file required at least 5 runs to merge and transform the data, so 7 files (1 file per day for 7 days) required at least 35 runs. This data preparation process was subsequently simplified after Bei Jia wrote an R programme to read in all 7 files and merge them with the bus stop data before transforming them into the required format. The entire transformation from the raw files into the required format for our experimentation was shortened to 1 hour with a single run of the R script on the same laptop. However, a laptop with less RAM and a slower processor might run into out-of-memory errors.
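The split-by-day step can be sketched as a single streaming pass over the file. This is a Python illustration under stated assumptions, not the JMP or R procedure the team used: the boarding_date column name, the output file naming and the sample rows are all hypothetical. Reading row by row avoids holding the full dataset in memory, which is the concern raised above for laptops with less RAM.

```python
import csv
import os
import tempfile

def split_by_day(src_path, out_dir, date_column="boarding_date"):
    """Stream a large CSV once and write one smaller CSV per distinct date."""
    writers = {}  # date -> csv.DictWriter
    handles = {}  # date -> open output file
    counts = {}   # date -> number of rows written
    with open(src_path, newline="") as src:
        reader = csv.DictReader(src)
        try:
            for row in reader:
                day = row[date_column]
                if day not in writers:
                    path = os.path.join(out_dir, f"rides_{day}.csv")
                    handles[day] = open(path, "w", newline="")
                    writers[day] = csv.DictWriter(
                        handles[day], fieldnames=reader.fieldnames)
                    writers[day].writeheader()
                    counts[day] = 0
                writers[day].writerow(row)
                counts[day] += 1
        finally:
            for handle in handles.values():
                handle.close()
    return counts

# Tiny demonstration with invented rows written to a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "rides.csv")
    with open(src, "w", newline="") as f:
        f.write("boarding_date,boarding_stop,alighting_stop\n"
                "2016-02-15,1001,2001\n"
                "2016-02-16,1002,2001\n"
                "2016-02-15,2001,1001\n")
    day_counts = split_by_day(src, tmp)
    day_files = sorted(n for n in os.listdir(tmp) if n.startswith("rides_"))
```

With 7 days of data only 7 output files are open at once, so keeping a writer per day is cheap.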

Main Graph Frame

The chord diagram was the main graph generated. It transitioned interactively based on the options selected. The chordDirective.js and matrixFactory.js scripts enabled the chord diagram to be drawn and to transition between options.
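A chord layout is generally fed a square matrix in which entry (i, j) is the flow from group i to group j. The internals of matrixFactory.js are not described here, so the Python sketch below is not its API. It only illustrates, with invented planning areas and counts, how origin and destination flows could be arranged into such a matrix.

```python
def build_chord_matrix(flows):
    """Turn {(origin, dest): count} into sorted labels and a square matrix."""
    areas = sorted({area for pair in flows for area in pair})
    index = {area: i for i, area in enumerate(areas)}
    matrix = [[0] * len(areas) for _ in areas]
    for (origin, dest), count in flows.items():
        matrix[index[origin]][index[dest]] += count
    return areas, matrix

# Invented flows between two planning areas, for illustration only.
sample_flows = {
    ("BEDOK", "TAMPINES"): 120,
    ("TAMPINES", "BEDOK"): 95,
    ("BEDOK", "BEDOK"): 40,
}
areas, matrix = build_chord_matrix(sample_flows)
```

Row i then holds the outflow from area i and column i its inflow, which is the relationship the chord diagram displays.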

Date Selection

Users could view the data by individual day through the options under Date Selection. The filtering was based on the variable indicated in the picture below.

Filter by Date
Date Selection Radio Design in SG-Bus.html
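The effect of the date selection can be illustrated with a short Python sketch. The actual filtering runs in the D3.JS application on the variable shown in the picture, so the "date" field name and the records below are hypothetical.

```python
def filter_by_date(records, selected_date):
    """Keep only the flow records whose date matches the selected day."""
    return [r for r in records if r["date"] == selected_date]

# Invented flow records; "date" is a hypothetical field name.
records = [
    {"date": "2016-02-15", "origin": "BEDOK", "dest": "TAMPINES", "rides": 120},
    {"date": "2016-02-16", "origin": "BEDOK", "dest": "TAMPINES", "rides": 98},
    {"date": "2016-02-15", "origin": "TAMPINES", "dest": "BEDOK", "rides": 95},
]
selected = filter_by_date(records, "2016-02-15")
```

Only the records for the chosen day are passed on to be drawn, so switching the radio option changes which subset feeds the chord diagram.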

Planning Areas Filter

Time Period Selection

Demonstration

Discussion

Future Work

Divya, Evie, Shreyas, Sonali (2013) BEERVIZ, Discover beers, & say cheers!

Installation guide

No installation is required.

Just download the folders from GitHub (link: https://github.com/BeiJiaKee/MITB_VisualAnalytics_SGBus) and unzip bower-components.zip.

Now you are ready to run!

ScreenShot of GitHub

User Guide

You may use any local web host you wish.

Steps:

  1. Start up your local host (e.g. EasyPHP)
  2. Navigate in the project folders to "MITB_VisualAnalytics_SGBus/demos/SG-Bus.html"
  3. Run SG-Bus.html
  4. On the dashboard produced, users may switch between dates and peak periods to explore the data
Dashboard VA.PNG
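As one possible alternative to a packaged local host, Python's built-in http.server module can serve the project folder. This is a sketch only: it assumes the dashboard works when served as plain static files, which has not been verified here, and the port number and helper names are arbitrary choices.

```python
import functools
import http.server

def make_server(project_root, port=8000):
    """Create (but do not start) a static file server rooted at project_root."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=project_root)
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)

def dashboard_url(port=8000):
    """Address of the dashboard page when the repository root is served."""
    return f"http://127.0.0.1:{port}/demos/SG-Bus.html"
```

Calling make_server("MITB_VisualAnalytics_SGBus").serve_forever() from the folder that contains the download, and then opening the address returned by dashboard_url() in a browser, would correspond to steps 1 to 3 above.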

References