Signal Proposal
Contents
- 1 PROBLEM & MOTIVATION
- 2 OBJECTIVES
- 3 SELECTED DATASETS
- 4 DATA PREPARATION
- 5 LITERATURE REVIEW
- 5.1 1. Network-constrained spatio-temporal clustering analysis of traffic collisions in Jianghan District of Wuhan, China
- 5.2 2. Visualizing Traffic Accident Hotspots Based on Spatial-Temporal Network Kernel Density Estimation
- 5.3 3. Analysis of data on the dataset of road accidents in Paris in 2012 and 2013
 
- 6 APPROACH
- 7 STORYBOARD
- 8 TOOLS & TECHNOLOGIES
- 9 APPLICATION OVERVIEW
- 10 INSIGHTS
- 11 KEY CHALLENGES
- 12 TIMELINE
- 13 REFERENCES
PROBLEM & MOTIVATION 
Efforts by the Singapore Traffic Police to educate the public on road safety have reduced the number of fatal accidents in Singapore by 15.7% in 2017 compared to 2016 (Chua, 2018). Despite this improvement, the Singapore Traffic Police highlighted accidents involving motorcyclists and elderly jaywalkers as key concerns in 2017, because motorcycle accidents still accounted for more than half of all traffic accidents that year and the number of road fatalities involving elderly jaywalkers is on the rise. As such, our project aims to uncover the spatial point patterns of traffic accidents, to provide insights into how motorcycle accidents and traffic accidents involving the elderly are affected by their surroundings in space and time. Factors such as weather and time of accident will be introduced to investigate whether such accidents tend to peak in certain districts and at certain times.
To better derive insights from traffic accidents for our project, we will be using relevant datasets, mainly from Leeds City Council and Ordnance Survey. Leeds, a city in Yorkshire with close to 800,000 people, is home to Open Data Institute Leeds, which was created to explore and deliver the potential of open innovation with data at city scale. Although Leeds is a relatively small city in England, it is well known for housing several data-heavy institutions, commercial enterprises and academic bodies, all of which contribute to the rich public datasets that Leeds offers. Not surprisingly, Leeds is now a hub for data activity, with some businesses handling over 30 million data events daily to uncover consumer insights (Turner, 2018). Leeds therefore serves as an appropriate model for Singapore, a city-state aiming to derive people-centric solutions to urban challenges, to emulate.
In our project, we will be incorporating variables such as road junctions and the locations of shops into our analyses before linking our findings to Singapore. After the analysis, we will also recommend appropriate measures that could be put in place by the respective authorities, such as the Singapore Traffic Police and the Land Transport Authority.
OBJECTIVES 
In our project, we would be creating geovisualisations that are able to achieve the following objectives:
- Gain an overview of areas with high traffic accident intensity using Network-Constrained Kernel Density Estimation
- Identify zones which are more prone to accidents by highlighting clusters formed by Network-Constrained Ripley's K-Function
- Recommend additional datasets that could be collected for more in-depth analyses
SELECTED DATASETS 
The following datasets will be used for analysis, as elaborated below:
| Dataset | Format | Data Attribute | Source |
|---|---|---|---|
| Leeds Road Traffic Accidents (2013 - 2017) | CSV | | UK Open Database |
| Leeds Pedestrian Crossing | CSV | | Leeds Council Data Mill North |
| Leeds School | CSV | | UK Consumer Data Research Centre (CDRC) |
| Local Authority Districts (Leeds) | SHP | | UK Consumer Data Research Centre (CDRC) |
| Leeds Road | SHP | | UK Consumer Data Research Centre (CDRC) |
| Leeds Motorway Junction | SHP | | UK Consumer Data Research Centre (CDRC) |
DATA PREPARATION 
Data Collection
- We will be merging the yearly traffic accident records (2013 - 2017) into a single data source for analysis
- Collating additional datasets, such as the road network, will aid our analysis
Data Cleaning and Wrangling
We will be performing the following steps to prepare our dataset for analysis:
- Removing duplicate rows
- Creating new date and time variables using the lubridate package:
  - Year
  - Month
  - Time frame
  - Season
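The two cleaning steps above can be sketched as follows. The proposal plans to do this in R with lubridate; the sketch below uses only the Python standard library to illustrate the logic. The input formats (`dd/mm/YYYY` dates, `HHMM` times), the time-frame boundaries, the meteorological season mapping and all function names are assumptions made for illustration, not properties of the Leeds dataset.

```python
from datetime import datetime


def time_frame(hour):
    """Map an hour (0-23) to a coarse time frame. Boundaries are assumed."""
    if 6 <= hour < 12:
        return "Morning"
    if 12 <= hour < 18:
        return "Afternoon"
    if 18 <= hour < 24:
        return "Evening"
    return "Night"


def season(month):
    """Map a month (1-12) to a meteorological season (assumed convention)."""
    if month in (12, 1, 2):
        return "Winter"
    if month in (3, 4, 5):
        return "Spring"
    if month in (6, 7, 8):
        return "Summer"
    return "Autumn"


def derive_time_variables(date_text, time_text):
    """Parse an assumed 'dd/mm/YYYY' date and 'HHMM' time into new variables."""
    stamp = datetime.strptime(f"{date_text} {time_text}", "%d/%m/%Y %H%M")
    return {
        "year": stamp.year,
        "month": stamp.month,
        "time_frame": time_frame(stamp.hour),
        "season": season(stamp.month),
    }


def drop_duplicate_rows(rows):
    """Keep the first occurrence of each row, preserving the original order."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


# Hypothetical records: (reference id, date, time).
records = [("R1", "14/01/2017", "0830"), ("R1", "14/01/2017", "0830"),
           ("R2", "03/07/2016", "2215")]
for ref, date_text, time_text in drop_duplicate_rows(records):
    print(ref, derive_time_variables(date_text, time_text))
```

Keeping the time-frame and season mappings as small separate functions makes it easy to change the boundaries once the team settles on definitions.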
 
Exploratory Data Analysis
We will be using R packages to perform EDA to better understand our dataset and to aid us in the conceptualization of our application.
LITERATURE REVIEW 
1. Network-constrained spatio-temporal clustering analysis of traffic collisions in Jianghan District of Wuhan, China
(By: Yaxin Fan, Xinyan Zhu, Bing She, Wei Guo, Tao Guo)
Aim of Study: To explore the spatio-temporal clustering patterns of traffic collisions by combining a set of network-constrained methods.
Methodology:
- Weighted network kernel density estimation
  - Provides an intuitive way to incorporate attribute information
- Network cross K-function
  - Shows that there are varying clustering tendencies between traffic collisions and different types of points of interest (POIs)
- Network differential Local Moran’s I and network local indicators of mobility association
  - Provide straightforward and quantitative measures of the hotspot changes
Learning Points:
- Taking into consideration contributory factors that may affect or correlate with the occurrence of traffic collisions (using the network cross K-function to quantify spatial interrelationships between two types of point sets, coupled with Monte Carlo simulation to test the distribution pattern of point events)
- Incorporated semantic information and temporal dimension of traffic collisions for comprehensive understanding of the spatio-temporal clustering patterns
- Considered POIs that might not directly relate to individual collisions but their spatial distribution might correlate with spatial distribution of collisions collectively
 
Areas for Improvement:
- Perhaps key implementation issues can be highlighted to allow readers to be aware of the potential drawbacks when implementing the mentioned algorithms, and to make necessary changes according to their context.
2. Visualizing Traffic Accident Hotspots Based on Spatial-Temporal Network Kernel Density Estimation
(By: Benjamin Romano, Zhe Jiang)
Aim of Study: To analyse traffic accidents in New York using spatial-temporal network kernel density estimation
Methodology:
- Spatio-temporal network kernel density estimation (STNKDE)
- In Network Kernel Density Estimation, edges are split into lixels. STNKDE extends the concept of lixel to include a temporal aspect.
 
Learning Points:
- Use of spatial-temporal methods helps to avoid over-allocation of resources during non-peak hours
Areas for Improvement:
- Explore combining the method with Getis-Ord Gi* statistics to determine whether hotspots are statistically significant
- Perhaps apply the spatial-temporal aspects to other relevant methods such as Weighted Kernel Density Estimation
3. Analysis of data on the dataset of road accidents in Paris in 2012 and 2013
http://www.remyzum.com/shiny/Paris_Accidentologie/ (By: Remy Zumbiehl)
Aim of Study: Exploratory data analysis of road accidents in Paris
Methodology:
- Basic Mapping using Leaflet Package
Learning Points:
- Using different symbols or markers to represent the severity of road accidents
- Adding filters to drill down into the dataset:
  - Types of vehicles
  - Temporal analysis: date range, day of the week and time frame
 
Areas for Improvement:
- Add in geospatial analysis such as Kernel Density Estimation and Network Constrained Analysis
APPROACH 
After reviewing the literature, it is clear that standard point pattern analysis designed for two-dimensional (planar) space is inappropriate here, because traffic accidents can only occur along the road network. Our analyses need to take into account the geometry of the network and, as such, we have to constrain statistical techniques such as the K-Function to the network. We will first uncover which road segments have a higher intensity of road accidents, then use various network-constrained techniques to investigate whether there are statistically significant clusters.
1. Network-Constrained Kernel Density Estimation
Kernel Density Estimation involves estimating the probability density function of a variable or, in geospatial terms, the density of features in a neighbourhood. Investigating the average density of points along the network provides quick insight into which road segments have a higher intensity of traffic accidents.
There are two main types of intensities when estimating kernel densities in networks:
- A homogeneous intensity function is constant across the network and corresponds to complete spatial randomness: all points are independent and uniformly distributed in any given set.
- An inhomogeneous intensity function also has independence between disjoint sets, but points are unevenly distributed according to a spatially varying intensity function.
For an inhomogeneous intensity function, the kernel estimate of intensity has the form:
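A commonly stated form of this estimate is given here as a sketch. The notation is assumed for illustration: x_1, ..., x_n are the observed accident locations on the network L, and k(v,u) is the smoothing kernel discussed next.

```latex
\hat{\lambda}(u) = \sum_{i=1}^{n} k(x_i, u), \qquad u \in L
```

Each observed point contributes its kernel value at location u, so the estimate is largest on road segments that are close, in network terms, to many accidents.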
k(v,u) is the smoothing kernel that spreads a data point at v over locations u on the network, as described in the book by Baddeley et al. (2016). In the book's illustration, a point is smoothed with a Gaussian kernel and the line thickness is drawn proportional to the kernel value. This can be performed using the density.lpp function in R.
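To make the idea concrete, the sketch below sums a Gaussian kernel of shortest-path distance over a toy road network. It is a simplified illustration only: it is not the algorithm implemented by density.lpp, it does not correct for how kernel mass splits at junctions, and it assumes accidents are snapped to network nodes. The network `ROADS`, the function names and the bandwidth are hypothetical.

```python
import heapq
import math

# Hypothetical toy network: node -> {neighbour: segment length in metres}.
ROADS = {
    "A": {"B": 100.0},
    "B": {"A": 100.0, "C": 100.0, "E": 100.0},
    "C": {"B": 100.0, "D": 100.0},
    "D": {"C": 100.0},
    "E": {"B": 100.0},
}


def shortest_path_lengths(graph, source):
    """Dijkstra: network distance from source to every reachable node."""
    dist = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, math.inf):
            continue  # stale queue entry
        for neighbour, length in graph[node].items():
            candidate = d + length
            if candidate < dist.get(neighbour, math.inf):
                dist[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return dist


def gaussian_kernel(distance, sigma):
    """One-dimensional Gaussian density evaluated at a network distance."""
    return math.exp(-0.5 * (distance / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def network_kde(graph, events, sigma):
    """Sum kernel contributions of all events at every node of the network."""
    density = {node: 0.0 for node in graph}
    for event in events:
        for node, d in shortest_path_lengths(graph, event).items():
            density[node] += gaussian_kernel(d, sigma)
    return density


# Two accidents at A and one at B (assumed data), bandwidth 100 m (assumed).
density = network_kde(ROADS, ["A", "A", "B"], sigma=100.0)
for node in sorted(density, key=density.get, reverse=True):
    print(node, round(density[node], 5))
```

Distances follow the roads rather than straight lines, which is the key difference from planar kernel density estimation.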
2. Second-Order Network K-Function (Network Constrained Ripley's K-Function)
After gaining insight on the density of traffic accidents, we would take the next step to uncover if the point patterns are dispersed, clustered or randomly distributed. This could be done using Network K-Function.
Okabe and Yamada define the Network K-Function by adapting Ripley’s K-Function, replacing the Euclidean distance with the shortest-path distance. In this method, given a point v on the network, all locations in the network that can be reached from v by a path shorter than a user-defined radius r are considered. It is defined by the function below, with lambda(L) denoting the total length of the linear network:
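The usual estimator of this function is sketched below. The notation is assumed for illustration: n is the number of observed points, d_L is the shortest-path distance on the network, and λ(L) is the total network length as in the text.

```latex
\hat{K}_{net}(r) = \frac{\lambda(L)}{n(n-1)} \sum_{i=1}^{n} \sum_{j \neq i}
  \mathbf{1}\{\, d_L(x_i, x_j) \le r \,\}
```

The double sum counts ordered pairs of distinct accidents that lie within network distance r of each other.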
While the above Network K-Function constrains points to the network, our analyses require a second-order stationary point process. The Okabe-Yamada Network K-Function implicitly assumes that the network itself is homogeneous, which is not the case, as different locations in the network are surrounded by different configurations of line segments. As a result, Network K-Functions obtained from different networks are not directly comparable.
Ang et al. (2012) proposed the second-order Network K-Function, known as the ‘geometrically corrected K-Function’. It combines the Okabe-Yamada Network K-Function with a key benefit of Ripley’s K-Function, namely that point processes with different intensities, observed in different windows, can be compared. The geometrically corrected K-Function is defined, for all r ≤ R, where u is any location on the network, by:
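A sketch of the definition as it is usually stated follows. The notation is assumed for illustration: λ is the intensity of the point process X (not the network length λ(L) used earlier), d_L is the shortest-path distance, and m(u,t) is the number of locations on the network lying at shortest-path distance exactly t from u.

```latex
K^{L}(r) = \frac{1}{\lambda}\,
  \mathbb{E}\!\left[ \sum_{x_j \in X}
    \frac{\mathbf{1}\{\, 0 < d_L(u, x_j) \le r \,\}}{m\big(u, d_L(u, x_j)\big)}
  \;\middle|\; u \in X \right]
```

Dividing by m(u,t) compensates for the local geometry of the network. Under a homogeneous Poisson process this gives K^L(r) = r, which is what makes results from different networks comparable.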
The above analysis can be computed in spatstat with the function linearK. Note that this K-Function assumes a homogeneous point process; if the underlying Poisson point process is inhomogeneous, the inhomogeneous geometrically corrected K-Function (linearKinhom in spatstat) should be used instead.
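For intuition, the sketch below counts pairs of accidents within a shortest-path distance r on a toy network and scales the count by the total network length. It illustrates the uncorrected Okabe-Yamada form only, not the geometric correction of Ang et al., and it assumes accidents are snapped to network nodes. The network, the event locations and the function names are hypothetical.

```python
import math

# Hypothetical toy network: (node, node, segment length in metres).
NODES = ["A", "B", "C", "D", "E"]
EDGES = [("A", "B", 100.0), ("B", "C", 100.0), ("C", "D", 100.0), ("B", "E", 100.0)]


def all_pairs_network_distance(nodes, edges):
    """Floyd-Warshall shortest-path lengths over an undirected network."""
    dist = {a: {b: (0.0 if a == b else math.inf) for b in nodes} for a in nodes}
    for a, b, length in edges:
        dist[a][b] = dist[b][a] = min(dist[a][b], length)
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


def network_k(nodes, edges, events, r):
    """Uncorrected network K: scaled count of ordered event pairs within r."""
    n = len(events)
    total_length = sum(length for _, _, length in edges)
    dist = all_pairs_network_distance(nodes, edges)
    pairs = sum(
        1
        for i, a in enumerate(events)
        for j, b in enumerate(events)
        if i != j and dist[a][b] <= r
    )
    return total_length * pairs / (n * (n - 1))


# Three accidents snapped to nodes A, B and D (assumed data).
accidents = ["A", "B", "D"]
for radius in (50.0, 100.0, 300.0):
    print(radius, network_k(NODES, EDGES, accidents, radius))
```

In practice the observed curve would be compared against curves from simulated random patterns on the same network to judge whether clustering is statistically significant.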
3. Correlation in Multitype Point Patterns Using Network Cross K-Function
The Network K-Function deals with points of a single type, while the Network Cross K-Function handles two different sets of points. For example, with traffic accident points and shop locations, the Network Cross K-Function examines how accidents are distributed around shops, based on the shortest-path distances from each accident to the shops.
This can be implemented in R through the function linearKcross. The function also has variants for both homogeneous and inhomogeneous Poisson point processes.
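The cross-type idea can be sketched in the same spirit: for every accident, count the shops within shortest-path distance r, then scale by the total network length and the two sample sizes. This is an uncorrected illustration rather than the computation performed by linearKcross, and it assumes both point types are snapped to network nodes. The network, the locations and the function names are hypothetical.

```python
import heapq
import math

# Hypothetical toy network: node -> {neighbour: segment length in metres}.
ROADS = {
    "A": {"B": 100.0},
    "B": {"A": 100.0, "C": 100.0, "E": 100.0},
    "C": {"B": 100.0, "D": 100.0},
    "D": {"C": 100.0},
    "E": {"B": 100.0},
}


def shortest_path_lengths(graph, source):
    """Dijkstra: network distance from source to every reachable node."""
    dist = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, math.inf):
            continue  # stale queue entry
        for neighbour, length in graph[node].items():
            candidate = d + length
            if candidate < dist.get(neighbour, math.inf):
                dist[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return dist


def network_cross_k(graph, accidents, shops, r):
    """Uncorrected cross K: scaled count of (accident, shop) pairs within r."""
    # Each undirected segment appears twice in the adjacency map.
    total_length = sum(l for nbrs in graph.values() for l in nbrs.values()) / 2
    count = 0
    for accident in accidents:
        dist = shortest_path_lengths(graph, accident)
        count += sum(1 for shop in shops if dist.get(shop, math.inf) <= r)
    return total_length * count / (len(accidents) * len(shops))


# Assumed data: accidents at A and C, shops at B and E.
for radius in (50.0, 100.0, 200.0):
    print(radius, network_cross_k(ROADS, ["A", "C"], ["B", "E"], radius))
```

A curve that rises faster than expected under random placement would suggest that accidents tend to occur near shops.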
STORYBOARD 
TOOLS & TECHNOLOGIES 
Tools and technologies
Data Architecture
APPLICATION OVERVIEW 
INSIGHTS 
KEY CHALLENGES 
The following are some of the key technical challenges that we may face throughout the course of the project:
| Key Challenges | Mitigation Plan |
|---|---|
| Unfamiliarity with spatial analysis methods | |
| Unfamiliarity with R Shiny | |
| Unfamiliarity with Leeds geographical area | |
TIMELINE 
REFERENCES 
- Chua, A. (2018, February 7). Fatal road accidents and fatalities hit all-time low in 2017: Traffic Police. Retrieved from https://www.todayonline.com/singapore/fatal-road-accidents-and-fatalities-hit-all-time-low-2017-traffic-police
- Fan, Y., Zhu, X., She, B., Guo, W. & Guo, T. (2018). Network-constrained spatio-temporal clustering analysis of traffic collisions in Jianghan District of Wuhan, China. PLoS ONE 13(4): e0195093. Retrieved from https://doi.org/10.1371/journal.pone.0195093
- Rui, Y., Yang, Z., Qian, T., Khalid, S., Xia, N. & Wang, J. (2016). Network-constrained and category-based point pattern analysis for Suguo retail stores in Nanjing, China. Retrieved from https://doi.org/10.1080/13658816.2015.1080829
- Turner, A. (2018, November 13). How Big Data is Driving Innovation in the Leeds City Region. Retrieved from https://leeds-list.com/discussion/how-big-data-is-driving-innovation-in-the-leeds-city-region/
- Xie, Z. & Yan, J. (2008). Kernel Density Estimation of traffic accidents in a network space. Computers, Environment and Urban Systems, 32, 396-406. Retrieved from https://doi.org/10.1016/j.compenvurbsys.2008.05.001
- Yamada, I. & Thill, J. (2010). Local Indicators of Network-Constrained Clusters in Spatial Patterns Represented by a Link Attribute. Annals of the Association of American Geographers. 100. 269-285. Retrieved from https://doi.org/10.1080/00045600903550337
- Zumbiehl, R. (2018, October 16). Accidentology in Paris. Retrieved from http://www.remyzum.com/shiny/Paris_Accidentologie/