Group06 Elec3city Proposal

Project Motivation

Project Objective

Through our project, we aim to:

Data Preparation

Data	Source	Data Type
Average Monthly Household Electricity Consumption by Postal Code (Public Housing) & Dwelling Type, 2H 2016	ema.gov.sg	xls

Data Collection

All the data required for this project is readily available for download from either data.gov.sg or OpenStreetMap except for the accidents and heavy traffic data.

Collecting accidents and heavy traffic data

The accidents and heavy traffic data available from mytransport.sg are real-time data that can only be retrieved by calling an API. No historical accidents and heavy traffic data is available from mytransport.sg. Thus, in order to collect the data, we wrote a PowerShell script that calls the API periodically to retrieve the JSON file containing the real-time data. We then wrote another PowerShell script to convert the JSON file to a CSV file for ease of use.

We spent 5 weeks calling the API regularly to retrieve the real-time data and this gave us 335 accident points and 877 heavy traffic points, a sufficient quantity for analysis.

The data collected is in the format below:

Attributes	Example
Type	Accident
Latitude	1.319629
Longitude	103.8537
Date	22/2/2018
Time	10:33:00 PM
Description	Accident on CTE (towards SLE) after Moulmein Rd Exit with congestion till Kramat Rd Entrance. Avoid lanes 1 and 2.
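
As a rough illustration of the JSON-to-CSV step, the sketch below is written in R rather than PowerShell and is not our original script. It assumes the JSON response has already been parsed into a list of records carrying the attributes in the table above; the `records` list, the `records_to_df` helper and the second record's values are hypothetical and made up for illustration.

```r
# Hypothetical parsed records, shaped like the attribute table above.
# In practice these would come from a JSON parser; that step is omitted here.
records <- list(
  list(Type = "Accident", Latitude = 1.319629, Longitude = 103.8537,
       Date = "22/2/2018", Time = "10:33:00 PM",
       Description = "Accident on CTE (towards SLE) after Moulmein Rd Exit with congestion till Kramat Rd Entrance. Avoid lanes 1 and 2."),
  # Made-up second record, for illustration only.
  list(Type = "Heavy Traffic", Latitude = 1.3000, Longitude = 103.8000,
       Date = "22/2/2018", Time = "10:40:00 PM",
       Description = "Heavy Traffic on PIE (made-up example).")
)

# Turn each record into a one-row data frame, then stack the rows.
records_to_df <- function(recs) {
  rows <- lapply(recs, function(r) as.data.frame(r, stringsAsFactors = FALSE))
  do.call(rbind, rows)
}

trafficReport <- records_to_df(records)

# Write the combined table to a CSV file (a temporary file in this sketch).
csv_path <- tempfile(fileext = ".csv")
write.csv(trafficReport, csv_path, row.names = FALSE)
```

Reading `csv_path` back with `read.csv` should give one row per record, with the six attributes as columns.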

With all the data ready, we can now proceed to data cleaning.

Data Cleaning

Extracting Expressway Networks

OSM Shape File

The shape file downloaded from OpenStreetMap gives us the entire road network of Singapore and parts of Malaysia. However, we only require the expressway road networks in Singapore. Thus, some data preparation is needed to extract the road networks that are needed. We decided to perform this data preparation in QGIS as it gives us a better visualisation of the road network, which allows us to easily detect any errors.

Firstly, we used the geoprocessing tools in QGIS to extract the road networks that occur only in Singapore. We performed the vector intersection function between the road networks layer and a layer containing the coastal outline of Singapore, which was downloaded from data.gov.sg. This returns a layer consisting of road networks that occur only in Singapore.

Next, we performed filtering on the data frame to extract the expressway road networks. The data frame component of the shape files contains an attribute called ‘type’. We were able to obtain the expressway network by filtering ‘type’ = ‘motorway’.


Lastly, we performed some manual checks to remove erroneous lines.

Extracting accidents and heavy traffic points that occur on expressway

The table in the Data Collection section above shows the attributes of the accidents and heavy traffic data. We are only interested in the points that occur on the expressway. To obtain these points, we used R to perform our data cleaning.

  # Phrases that identify an expressway in the incident description
  patterns <- c('on AYE','on BKE','on CTE','on ECP','on KJE','on KPE','on MCE','on PIE','on SLE','on TPE')

We could extract the expressway points by filtering the points that contain the expressway names in the ‘Description’ attribute. A ‘patterns’ variable is created to store the phrases that appear in the descriptions of expressway points.

  library(dplyr)  # provides %>% and filter()
  accidents_filter <- trafficReport %>% filter(grepl(paste(patterns, collapse="|"), Description)) %>% filter(Type == 'Accident')

Lastly, we used the ‘grepl’ function in R to extract only the points that contain any of the phrases above in the ‘Description’ attribute.
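
To make the filtering step concrete, here is a minimal base R sketch of the same idea on a tiny made-up data frame. It avoids dplyr so that it runs without extra packages; apart from the first description, which is the example from the attribute table, the rows are hypothetical.

```r
# Toy data: only the first description comes from the attribute table;
# the other rows are made up for illustration.
trafficReport <- data.frame(
  Type = c("Accident", "Accident", "Heavy Traffic", "Accident"),
  Description = c(
    "Accident on CTE (towards SLE) after Moulmein Rd Exit with congestion till Kramat Rd Entrance. Avoid lanes 1 and 2.",
    "Accident on Example Rd near Example Ave.",  # not on an expressway
    "Heavy Traffic on PIE (made-up example).",   # expressway, but not an accident
    "Accident on AYE (made-up example)."         # expressway accident
  ),
  stringsAsFactors = FALSE
)

patterns <- c('on AYE','on BKE','on CTE','on ECP','on KJE','on KPE','on MCE','on PIE','on SLE','on TPE')

# Join the phrases into one regular expression: "on AYE|on BKE|...".
regex <- paste(patterns, collapse = "|")

# TRUE for rows whose description mentions any expressway phrase.
on_expressway <- grepl(regex, trafficReport$Description)

# Keep only accident records located on an expressway.
accidents_filter <- trafficReport[on_expressway & trafficReport$Type == "Accident", ]
```

In this toy example only the CTE and AYE accident rows remain in `accidents_filter`.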

Extracting the cameras

Similarly, we only require cameras that are located on expressways. This extraction is slightly more time consuming as there is no attribute in the cameras shape file that indicates whether or not a camera is located on the expressway. Thus, to obtain the cameras that are located on expressways, we filtered them manually in QGIS based on the expressway road network we created previously. We repeated this step for the different types of cameras.


Lastly, we combined all the camera files into a single shape file by using the join function.

Literature Review

To gain a better understanding of how we could proceed with our analysis, we decided to conduct a literature review. Here are the summaries of some research papers on spatial analysis of traffic accidents:

1. GIS-based spatial analysis of urban traffic accidents: Case study in Mashhad, Iran

Aim of study: to use geographic information systems (GIS) and spatial-statistical analysis to gain insights into the traffic accident patterns in Mashhad, Iran.

Results of kernel density level for accidents leading to injury from March 21, 2011 to March 19, 2012

Methodology:
1. Kernel Density Estimation

To determine static hotspots

2. Nearest Neighbour Distance Analysis

Used to determine if the accidents are clustered based on the nearest distance between two neighbouring accident points

3. K-function output analysis

Used to provide a more accurate analysis of points distribution

Learning Points:
1. Spatial Analysis Techniques

This study is similar to our project. Hence, we can learn the analysis techniques they used and apply them to our study.
Similarly, we can use Kernel Density Estimation to detect traffic accident hotspots, and Nearest Neighbour analysis and the K function to determine if the accidents are randomly distributed or clustered.

Areas for improvement:
1. Hard to follow up

As this analysis was done in proprietary software (ArcView), other researchers cannot reproduce the study without access to the same software. Thus, it is hard for them to follow up on the study.

2. Brazilian Road Traffic Fatalities: A Spatial and Environmental Analysis

Aim of study: to analyse road traffic accident hotspots on the BR 277 highway in the state of Parana, southern Brazil, and to perform an environmental analysis to identify patterns contributing to the traffic accidents.

File:Ref2.png

Kernel density and wavelet analysis hotspots. 3A) All Fatal Crashes

Methodology:
1. Kernel Density Estimation

To determine accident hotspots

2. Wavelet

Complement Kernel exploratory analysis

3. K-function output analysis

To reduce the variables into similar variance components
Then developed regression models to evaluate the impact of built environmental components on fatal crashes

Learning Points:

1. Spatial Analysis Techniques

Apart from using Kernel Density Estimation to develop hotspots as well as K function to determine complete spatial randomness like the previous study, this research also explores the impact of how the human built environment affects the occurrence of accidents.
We could possibly learn from this project how the built environment analysis is being executed and then determine how various infrastructures on the road affects the occurrence of accidents.

Areas for improvement:
1. Hard to follow up

Similar to the previous study, this analysis was done in a desktop GIS application (QGIS). Although QGIS itself is free and open source, a manual desktop workflow is difficult to reproduce exactly. Thus, it is hard for other researchers to follow up on the study.

3. IS415 2013-14 Assignment 2 – Heng U San

Aim of study: to analyse the distribution of GP Clinics, Preschools and Bus Stops in Bedok and provide recommendation on how amenities could be better planned.

File:USanPIC165.png

Density function for buildings

Methodology:
1. Nearest Neighbour Index

lpp function – to measure distance between points along a linear network

2. K-function

To determine the clustering type

Learning Points:

1. Clear and easy to understand

U San offered a very clear and easy to understand explanation of how Nearest Neighbour Index and K function works. This helped us significantly in understanding how these techniques are used in the other research papers.
U San’s work was well documented. She clearly explained the step by step procedure of how she obtained her results as well as the R functions used for analysis. This makes it much easier for other researchers to reproduce a similar study.
To analyse the spatial distribution of bus stops, U San included a road network constraint in the various analyses. This is done because bus stops can only occur on road networks. Similarly, in our study, accidents can only occur on road networks. Thus the road network constraint should be included in our analysis, or else our results will not make sense.

Areas for improvement:
1. Sharing of codes

U San did well in documenting her step by step procedure, showing other researchers how to reproduce a similar study. However, it would be even better if U San could share an R notebook of her code so that researchers could reproduce the exact same study and continue her research from where she stopped.

Approach

After performing the literature review, we have a better understanding of what methodology could be used to achieve our objective. We then consulted our professor to decide on the most appropriate analysis techniques, and finally chose the techniques below.

Kernel Density Estimation with Network Constraints

[Figure: kernel function formula]

Kernel Density Estimation with Network Constraint is used to identify locations along the network that have a high concentration of traffic incidents. The formula for converting the observations into a Kernel Function is shown above. The bandwidth, T, can be adjusted to smooth out the Kernel Density Function. In R, Kernel Density Estimation with Network Constraint is executed using the spatstat package by applying the ‘density.lpp’ function to an lpp object.
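
The effect of the bandwidth can be illustrated with a deliberately simplified base R sketch. This is not spatstat's ‘density.lpp’ and not the formula in the figure: it assumes the network is a single straight road, so that network distance is simply the difference in position along the road, uses a Gaussian kernel, and applies no edge correction. The incident positions and the `network_kde` helper are hypothetical.

```r
# Positions (metres from the start of a hypothetical 10 km road) of made-up incidents.
incident_pos <- c(1000, 1100, 1200, 1250, 6000, 9000)
road_length  <- 10000

# Kernel density along the road: for each evaluation location s, sum a
# Gaussian kernel of the along-road distance to every incident.
network_kde <- function(pos, eval_at, bandwidth) {
  sapply(eval_at, function(s) {
    d <- abs(pos - s)                      # distance measured along the road
    sum(dnorm(d / bandwidth)) / bandwidth  # scaled kernel contributions
  })
}

eval_at <- seq(0, road_length, by = 250)

# A small bandwidth gives a spiky estimate, a large one a smoother estimate.
dens_narrow <- network_kde(incident_pos, eval_at, bandwidth = 250)
dens_wide   <- network_kde(incident_pos, eval_at, bandwidth = 1500)

# Location along the road with the highest estimated density.
hotspot <- eval_at[which.max(dens_narrow)]
```

With the made-up positions above, the highest density falls inside the group of incidents between 1000 m and 1250 m, and the wide-bandwidth estimate has a lower, flatter peak than the narrow one.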

Ripley's K Function with Network Constraints

Ripley’s K Function is a spatial analysis method used to describe how point patterns are distributed over an area of interest. It allows us to determine if the point patterns are dispersed, clustered or randomly distributed. The formula below shows how we can obtain the K function given the observations.

[Figure: K function formula]

How the K Function is used:
1. A circle of radius h is constructed around each observation.
2. The number of observations that fall inside each circle is counted.
3. The formula is applied to obtain the K function at radius h.
4. The above 3 steps are repeated for different values of h.
5. A graph of the K function against h is then plotted.
6. Monte Carlo simulation tests are then run to determine the K function of randomly distributed point patterns.
7. The K function of the observations is compared with the K function of the simulations.

If the K function of the observations is higher than the upper bound of the simulations, it suggests that there are signs of clustering. On the other hand, if the K function of the observations is lower than the lower bound of the simulations, it suggests that there are signs of dispersion. Otherwise, if the K function of the observations lies within the upper and lower bounds, the pattern is consistent with complete spatial randomness. Refer to the figure below for an illustration.

[Figure: K function of the observations compared with the upper and lower bounds of the simulations]

However, a slight modification is added to our K function to include network constraints. This means that the circle of radius h will only expand along the road network instead of expanding freely.

The K Function with network constraints is executed in R with the spatstat package using the linearK function.
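
The steps above can be sketched in base R under strong simplifying assumptions. This is not spatstat's linearK: the network is assumed to be a single straight road, so network distance is the difference in position along the road, the K function is normalised by road length over n(n - 1), and no edge correction is applied. The incident positions and the `linear_k` helper are hypothetical.

```r
set.seed(1)
road_length <- 10000

# Made-up incident positions (metres along the road), deliberately grouped.
obs <- c(1000, 1050, 1100, 1150, 1200, 5000, 5050, 5100, 9000, 9050)

# K function on a single road: for each radius h, count the ordered pairs of
# points within distance h of each other, scaled by length / (n * (n - 1)).
linear_k <- function(pos, r, len) {
  n <- length(pos)
  d <- abs(outer(pos, pos, "-"))  # pairwise distances along the road
  diag(d) <- Inf                  # a point is not its own neighbour
  sapply(r, function(h) len / (n * (n - 1)) * sum(d <= h))
}

r     <- seq(100, 2000, by = 100)
k_obs <- linear_k(obs, r, road_length)

# Monte Carlo simulation: K functions of uniformly random points on the road.
nsim  <- 20
k_sim <- replicate(nsim, linear_k(runif(length(obs), 0, road_length), r, road_length))

# Upper and lower bounds of the simulations at each radius.
upper <- apply(k_sim, 1, max)
lower <- apply(k_sim, 1, min)

clustered <- k_obs > upper  # observed K above every simulation
dispersed <- k_obs < lower  # observed K below every simulation
```

Plotting `k_obs`, `upper` and `lower` against `r` gives the kind of comparison described in step 7.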

Multitype K Function with Network Constraints

The multitype K function is an extension of Ripley’s K function. The algorithm is mostly the same; however, instead of counting the number of observations of the same type in a circle of radius h, the number of observations belonging to the other type is counted. For example, a circle of radius h is formed around each traffic camera and the number of accident points within this circle is counted. This step is repeated for all cameras and for a range of radii h. Lastly, the K function is plotted. The multitype K function can also be applied in R using the spatstat package with the linearKcross function.
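
The cross-type counting can be sketched in base R in the same simplified setting. This is not spatstat's linearKcross: it assumes a single straight road, normalises by road length over the product of the two point counts, and applies no edge correction. The camera and incident positions and the `cross_k` helper are hypothetical.

```r
road_length <- 10000

# Made-up positions (metres along the road).
cameras   <- c(2000, 7000)
incidents <- c(1900, 1950, 2050, 2100, 2200, 6000, 9500)

# Cross-type K: for each radius h, count the incidents within distance h of
# each camera, scaled by length / (number of cameras * number of incidents).
cross_k <- function(from, to, r, len) {
  d <- abs(outer(from, to, "-"))  # rows: cameras, columns: incidents
  sapply(r, function(h) len / (length(from) * length(to)) * sum(d <= h))
}

r       <- seq(100, 1000, by = 100)
k_cross <- cross_k(cameras, incidents, r, road_length)
```

A simulation envelope for `k_cross` could be built by recomputing it for randomly placed incidents, which is left out here to keep the sketch short.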

Web Application Design

Design Inspiration

[Figure: Superzip R Shiny sample application]

Superzip is a sample R Shiny web application found in the R Shiny gallery. What sets it apart from other R Shiny dashboards is that the two charts in the menu are dynamic: the analysis area of the two charts depends on the boundary of the map that the user is viewing. This allows users to set their area of analysis by zooming into the map and panning to the area of interest.

We could implement a similar feature in our R Shiny dashboard by allowing users to select their analysis area from the map. However, we will lay out the dashboard slightly differently from the example. Instead of having a floating menu, we will fix the menu to the side of the map, as we feel that a floating menu would obstruct the user's view of the entire analysis area.

Initial Storyboard

[Figure: initial storyboard]

Application Architecture

The image below shows the application architecture of our web application.

[Figure: application architecture]

Application Overview

[Figure: application overview]

Type	Feature	Purpose
Main	Upload Data	Allows users to upload their own traffic data. The traffic data must be a CSV file in the same format as the data collected from LTA’s API. The road network data must be a shape file.
	Toggle Map markers	Allows users to toggle the respective markers on the map.
	Select Analysis	Allows users to select the analysis to perform.
Kernel Density Estimation	Select KDE variables	Selects the variable for Kernel Density Estimation.
	Slider	Sets the bandwidth of the Kernel Density plot. A larger kernel distance leads to a smoother plot, while a smaller kernel distance leads to a plot with more noise.
	Output options	Users can select whether to perform Kernel Density Estimation with or without network constraint.
K Function	Select Analysis Variable	Allows users to select the variable on which to perform K Function analysis.
	No. of Simulations	Selects the number of Monte Carlo simulations to run for the K Function analysis. A higher number of simulations requires a longer loading time.
Multitype K Function	Select Analysis Variable	Allows users to select the variable on which to perform Multitype K Function analysis.
	No. of Simulations	Selects the number of Monte Carlo simulations to run for the Multitype K Function analysis. A higher number of simulations requires a longer loading time.
Outputs	KDE with network constraints	View of the KDE plot. The legend of the KDE plot is shown at the bottom left corner of the screen.
	KDE without network constraints	View of the KDE plot. The legend of the KDE plot is shown at the bottom left corner of the screen.
	K Function	Output of the K Function analysis.
	Multitype K Function	Output of the Multitype K Function analysis.

Interesting Findings

The following section describes some of the interesting findings from the R Shiny web application we have created.

We first performed a K Function analysis to determine, from a statistical point of view, whether the accident points are clustered. A K Function analysis with 20 Monte Carlo simulations was run.

[Figure: K Function analysis of accident points with 20 Monte Carlo simulations]

Based on the K Function analysis shown in the picture above, we can reject the null hypothesis that the accident points are in complete spatial randomness at the 95% confidence level (100 – 100/20), and conclude that the accident points are clustered at distances between 200 m and 5000 m. Since the accident points are clustered within this range, we can use any value in this range as the kernel distance for the KDE output.

[Figure: KDE plot of accident points, kernel distance 1500 m]

Using a kernel distance of 1500 m, we obtained the KDE plot seen in the picture above. From the plot, 6 accident hotspots have been identified.
Next, we move on to the KDE plot for heavy traffic. First, a K Function analysis with 20 Monte Carlo simulations is performed.

[Figure: K Function analysis of heavy traffic points with 20 Monte Carlo simulations]

Similarly, clear clustering is observed for heavy traffic points. From the K Function analysis above, we are 95% confident that clustering is observed at all distances examined.

[Figure: KDE plot of heavy traffic points, kernel distance 1500 m]

Using a kernel distance of 1500 m, we obtained the KDE plot of heavy traffic points shown above. 5 heavy traffic hotspots have been identified.

We also performed a Multitype K Function analysis to determine if heavy traffic points tend to be clustered around traffic cameras. Two traffic cameras caught our attention.

[Figure: heavy traffic points around two traffic cameras on the CTE, with Multitype K Function output]

As seen in the picture above, the heavy traffic points tend to cluster around the two traffic cameras located on the CTE between the Braddell Road Exit and the Ang Mo Kio Avenue 3 Exit. Based on the Multitype K Function analysis, which was run with 20 simulations, we are 95% confident that clustering occurs around these traffic cameras.

From this result alone, we are unable to determine if this is just a coincidental correlation, if the traffic cameras are the cause of the heavy traffic, or if the traffic police deliberately placed the cameras there. However, this analysis does provide us with a starting point for further analysis of the reasons for traffic congestion in this area. To verify if the traffic cameras are indeed causing the congestion, further ground work needs to be done.

Project Challenges

	Key Challenges	Description	Solution
1.	Lack of readily available data	There is currently no known data source that provides historical traffic accident data in Singapore. There is only a real-time API of traffic accidents from LTA.	Learn to write a script that calls the API automatically; create a regular schedule for calling the API
2.	Unfamiliarity with R Shiny	We are unfamiliar with the R programming language due to the lack of prior experience	Independent learning starting from week 5; learning from each other; consult Prof Kam
3.	Unfamiliarity with spatial analysis techniques	We are unsure which spatial analysis techniques to use and how to apply them, as we lack prior experience in geospatial analysis	Conduct a literature review on the commonly used spatial analysis techniques; research how these techniques are executed; independent learning on the analysis techniques from week 5; learning from each other; consult Prof Kam

Project Timeline

[Figure: project timeline]

Meet the Team

[Figure: team photo]

From left to right: Gwee Wei Ling, Tan Ming Kwang, Prof Kam Tin Seong, Tan Zhi Chong (Vincent)
