About
Discovering traffic patterns by using network graph visualisations
Abstract
Transportation networks are key lifelines that aid the movement of people, goods, services and resources vital to a nation's productivity. A good visualisation of the corridors along which vehicular traffic moves is key to understanding the patterns of such movement. Using a dataset that captures the timestamps of vehicles passing through a wildlife preserve, we build a network visualisation application on the R Shiny platform. The insights derived can help in understanding metrics such as traffic density along corridors, the directions of traffic flow, and daily and seasonal traffic patterns.
Motivation
Network patterns can reveal valuable insights, but network graphs are difficult to implement with off-the-shelf software tools such as Tableau®. Gephi®, a free and open-source application, is one of the leading tools for visualising network graphs. However, to make our findings easily accessible to everyone without requiring them to install any tools, we propose using the recently introduced ggraph package for R. Besides offering the same kind of flexibility as a commercial tool, it extends the well-regarded ggplot2 package. Built specifically to support relational data structures such as networks, graphs and trees, its API provides a self-contained set of facets and customisations that enhance the quality of visualisations.
Besides providing custom visualisations of the traffic flow between nodes in a predefined geography, the application lets the links connecting the nodes be represented by a measure of the user's choice. In this case, the chosen measure is traffic density. In addition to ggraph, which provides a comprehensive network graph, we add an interactive version using the well-known ggplotly function from the plotly package. It adds interactivity to the nodes and edges, along with hover tooltips that give users a quick visual summary.
Practical use cases
In a country like Singapore, where land is limited, congestion can be a severe problem. Its repercussions include the monetary cost of time lost in traffic jams, increased fuel consumption and the resulting air pollution, stressed and frustrated motorists leading to more road rage incidents, and a higher chance of accidents caused by behaviour such as tailgating.
Singapore has several sources of real-time traffic data, including Junction Eyes, the Green Link Determining System (GLIDE), webcams and the Parking Guidance System (PGS). These data can be fed into our model to produce a traffic network map of Singapore, with the data-collecting sources serving as nodes. By using centrality measures to identify the nodes that experience higher traffic volumes, the Land Transport Authority (LTA) can choose to install additional gantries there, raise existing Electronic Road Pricing (ERP) charges to divert traffic, or widen the road to accommodate heavier traffic and smooth the flow.
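As a rough illustration of the centrality idea, the sketch below computes a weighted degree (the total traffic entering and leaving each node) and ranks nodes by it. It is written in Python purely for illustration, not in the R used by the application. The node names and vehicle counts are hypothetical, and weighted degree is only one of several centrality measures that could be used.

```python
from collections import defaultdict

# Hypothetical directed edges: (source node, destination node) -> vehicle count.
# The sensor names and counts are made up for illustration only.
edge_weights = {
    ("sensor-A", "sensor-B"): 120,
    ("sensor-B", "sensor-C"): 80,
    ("sensor-A", "sensor-C"): 30,
    ("sensor-C", "sensor-B"): 50,
}


def weighted_degree(edges):
    """Sum the weights of all edges touching each node (in plus out)."""
    strength = defaultdict(int)
    for (src, dst), weight in edges.items():
        strength[src] += weight
        strength[dst] += weight
    return dict(strength)


def busiest_nodes(edges, top_n=1):
    """Return the top_n nodes by weighted degree, ties broken by name."""
    strength = weighted_degree(edges)
    return sorted(strength, key=lambda node: (-strength[node], node))[:top_n]


strengths = weighted_degree(edge_weights)
top = busiest_nodes(edge_weights, top_n=1)
print(strengths, top)
```

Nodes at the top of such a ranking would be the candidates for the interventions described above.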
About the Dataset
The data for the interactive application come from the Visual Analytics Science and Technology (VAST) Challenge 2017. The dataset has four attributes: Timestamp, Car-id, Car-type and Gate-name. Each record captures a particular car (Car-id) passing through a checkpoint (Gate-name) at a particular instant (Timestamp), as read from an RFID tag. The Car-type indicates the type of vehicle; for example, Car-type 2 indicates a 2-axle truck. A snippet of the dataset is shown below.
Timestamp | Car-id | Car-type | Gate-name
2015-05-01 00:15:13 | 20151501121513-39 | 2 | entrance4
2015-05-01 01:14:22 | 20155501015525-264 | 1 | ranger-stop2
Devising a network graph visualisation requires defining nodes, edges and a layout. Nodes are the entities to be connected; in this case, the gate names serve as nodes. Edges connect the nodes on a well-defined layout. Using the map image provided with this dataset, the layout is preset to the Cartesian coordinates of each gate. Each edge represents the number of vehicles that travel the path between two gates. Deriving new variables from the timestamp, such as time of day, week and month, helps the user visualise daily and seasonal patterns of traffic movement. The gate names are also aggregated into gate categories such as gates, entrances, etc.
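The preparation steps described above can be sketched roughly as follows. This is a minimal Python illustration, not the application's actual R code. The sample records and car IDs are hypothetical, and two rules are assumptions: that a gate's category is its name without the trailing number, and that an edge is counted whenever a car is recorded at two gates consecutively.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"

# Hypothetical records in the dataset's layout:
# (timestamp, car-id, car-type, gate-name). Values are illustrative only.
records = [
    ("2015-05-01 00:15:13", "car-A", "2", "entrance4"),
    ("2015-05-01 00:40:00", "car-A", "2", "camp1"),
    ("2015-05-01 02:10:00", "car-A", "2", "entrance4"),
    ("2015-05-01 08:05:00", "car-B", "1", "entrance4"),
    ("2015-05-01 08:30:00", "car-B", "1", "camp1"),
    ("2015-05-01 09:45:00", "car-B", "1", "camp2"),
]


def gate_category(gate_name):
    """Assumed rule: drop the trailing number, e.g. 'entrance4' -> 'entrance'."""
    return re.sub(r"\d+$", "", gate_name)


def derive_time_fields(timestamp):
    """Derive hour of day, ISO week number and month from a timestamp string."""
    ts = datetime.strptime(timestamp, TIME_FORMAT)
    return {"hour": ts.hour, "week": ts.isocalendar()[1], "month": ts.month}


def edge_counts(rows):
    """Count trips between consecutive gates, per directed (from, to) pair."""
    visits_by_car = defaultdict(list)
    for timestamp, car_id, _car_type, gate in rows:
        visits_by_car[car_id].append((datetime.strptime(timestamp, TIME_FORMAT), gate))

    counts = Counter()
    for visits in visits_by_car.values():
        visits.sort()  # order each car's sightings chronologically
        for (_, src), (_, dst) in zip(visits, visits[1:]):
            counts[(src, dst)] += 1
    return counts


edges = edge_counts(records)
for (src, dst), n in sorted(edges.items()):
    print(f"{src} -> {dst}: {n} vehicle(s)")
```

The resulting counts per gate pair would serve as the edge weights (traffic density) in the network graph, while the derived time fields and gate categories would act as filters or facets.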
We chose this dataset because it provides a predefined geography with a set of 40 gate names (such as camp1, camp2 and entrance1) that represent locations. The timestamp of each vehicle passing through these locations is recorded, and the vehicle type is also known. The dataset therefore closely resembles a real-world traffic scenario.