Difference between revisions of "Group06 Elec3city Proposal"
Yu.fu.2015 (talk | contribs) (Created page with "<br> <!------ Main Navigation Bar----> <center> {| style="background-color:#ffffff ; margin: 3px 10px 3px 10px; width="80%"| | style="font-family:Open Sans, Arial, sans-serif...") |
Yu.fu.2015 (talk | contribs) |
||
Line 22: | Line 22: | ||
</center> | </center> | ||
<!------- End of Main Navigation Bar----> | <!------- End of Main Navigation Bar----> | ||
+ | |||
+ | <br/> | ||
+ | == Project Motivation == | ||
+ | |||
+ | |||
+ | <div style="text-align: left; direction: ltr; margin-left: 1em;"> | ||
+ | <br/> | ||
+ | |||
+ | </div> | ||
+ | <br/> | ||
+ | == Project Objective == | ||
+ | |||
+ | <div style="text-align: left; direction: ltr; margin-left: 1em;"><strong>Through our project, we aim to: </strong> | ||
+ | # | ||
+ | # | ||
+ | # | ||
+ | # | ||
+ | </div> | ||
+ | <br/> | ||
+ | == Data Preparation == | ||
+ | {| class="wikitable" | ||
+ | |- | ||
+ | ! Data !! Source !! Data Type | ||
+ | |- | ||
+ | |[https://www.ema.gov.sg/cmsmedia/Publications_and_Statistics/Statistics/52RSU.xls Average Monthly Household Electricity Consumption by Postal Code (Public Housing) & Dwelling Type, 2H 2016] ||ema.gov.sg || xls | ||
+ | |- | ||
+ | | || || | ||
+ | |- | ||
+ | | || || | ||
+ | |- | ||
+ | | || || | ||
+ | |- | ||
+ | | || || | ||
+ | |- | ||
+ | | || || | ||
+ | |- | ||
+ | | || || | ||
+ | |- | ||
+ | | || || | ||
+ | |- | ||
+ | | || || | ||
+ | |- | ||
+ | | || || | ||
+ | |} | ||
+ | |||
+ | <br/> | ||
+ | === Data Collection === | ||
+ | |||
+ | All the data required for this project is readily available for download from either data.gov.sg or OpenStreetMap except for the accidents and heavy traffic data. | ||
+ | |||
+ | ==== Collecting accidents and heavy traffic data ==== | ||
+ | |||
+ | The accidents and heavy traffic data available from mytransport.sg are real-time data that can only be retrieved by calling an API. No historical accidents and heavy traffic data is available from mytransport.sg. Thus, to collect the data, we wrote a PowerShell script that calls the API periodically to retrieve the JSON file containing the real-time data. We then wrote another PowerShell script to convert the JSON file to a CSV file for ease of use. | ||
+ | |||
+ | We spent 5 weeks calling the API regularly to retrieve the real-time data and this gave us 335 accident points and 877 heavy traffic points, a sufficient quantity for analysis. | ||
+ | |||
+ | The data collected is in the format below: | ||
+ | |||
+ | {| class="wikitable" | ||
+ | |- | ||
+ | ! Attributes !! Example | ||
+ | |- | ||
+ | | Type || Accident | ||
+ | |- | ||
+ | | Latitude || 1.319629 | ||
+ | |- | ||
+ | | Longitude || 103.8537 | ||
+ | |- | ||
+ | | Date || 22/2/2018 | ||
+ | |- | ||
+ | | Time || 10:33:00 PM | ||
+ | |- | ||
+ | | Description || Accident on CTE (towards SLE) after Moulmein Rd Exit with congestion till Kramat Rd Entrance. Avoid lanes 1 and 2. | ||
+ | |} | ||
+ | |||
+ | With all the data ready, we can now proceed to data cleaning. | ||
+ | |||
+ | <br/> | ||
+ | === Data Cleaning === | ||
+ | <br/> | ||
+ | ==== Extracting Expressway Networks ==== | ||
+ | [[File:Roadrunners datacleaning1.jpg|800px|frameless|center|OSM Shape File]] | ||
+ | |||
+ | The shape file downloaded from OpenStreetMap gives us the entire road network of Singapore and part of Malaysia. However, we only require the expressway road networks in Singapore. Thus, some data preparation is needed to extract the road networks that are needed. We decided to perform this data preparation in QGIS as it gives us a better visualisation of the road network, which allows us to easily detect any errors. | ||
+ | |||
+ | Firstly, we used the geoprocessing tool in QGIS to extract the road networks that occur only in Singapore. We performed the vector intersection function between the road networks layer and a layer containing the coastal outline of Singapore, which was downloaded from data.gov.sg. This returns a layer consisting of road networks that occur only in Singapore. | ||
+ | |||
+ | Next, we performed filtering on the data frame to extract the expressway road networks. The data frame component of the shape files contains an attribute called ‘type’. We were able to obtain the expressway network by filtering ‘type’ = ‘motorway’. | ||
+ | |||
+ | [[File:Roadrunners datacleaning2.jpg|800px|frameless|center]] | ||
+ | |||
+ | Lastly, we performed a manual check to remove some erroneous lines. | ||
+ | <br/> | ||
+ | ==== Extracting accidents and heavy traffic points that occur on expressway ==== | ||
+ | |||
+ | The table above shows the attributes of the accidents and heavy traffic data. We are only interested in the points that occur on the expressways. To obtain these points, we used R to perform our data cleaning. | ||
+ | |||
+ | patterns <- c('on AYE','on BKE','on CTE','on ECP','on KJE','on KPE','on MCE','on PIE','on SLE','on TPE') | ||
+ | |||
+ | We could extract the expressway points by filtering the points that contain the expressway names in the ‘Description’ attribute. A ‘patterns’ variable is created to store the phrases that appear on expressway points. | ||
+ | |||
+ | library(dplyr)  # provides %>% and filter() | ||
+ | accidents_filter <- trafficReport %>% filter(grepl(paste(patterns, collapse="|"), Description)) %>% filter(Type == 'Accident') | ||
+ | |||
+ | Lastly, we used the ‘grepl’ function in R to extract only the points that contain any of the phrases above in the ‘Description’ attribute. | ||
+ | |||
+ | |||
+ | <br/> | ||
+ | ==== Extracting the cameras ==== | ||
+ | |||
+ | Similarly, we only require cameras that are located on expressways. This extraction is slightly more time-consuming as there are no attributes in the cameras shape file that indicate whether or not a camera is located on an expressway. Thus, to obtain the cameras that are located only on expressways, we did manual filtering in QGIS based on the expressway road network we created previously. We repeated this step for the different types of cameras. | ||
+ | |||
+ | [[File:Roadrunners datacleaning3.jpg|800px|frameless|center]] | ||
+ | |||
+ | Lastly, we combined all the camera files into a single shape file by using the join function. | ||
+ | <br/> | ||
+ | == Literature Review == | ||
+ | |||
+ | <div style="text-align: left; direction: ltr; margin-left: 1em;"> | ||
+ | To gain a better understanding of how we could proceed with our analysis, we decided to conduct a literature review. Here are the summaries of some research papers on spatial analysis of traffic accidents: | ||
+ | <br/><br/> | ||
+ | <strong>1. GIS-based spatial analysis of urban traffic accidents: Case study in Mashhad, Iran</strong><br/><hr/> | ||
+ | <strong>Aim of study:</strong> to use geographic information systems (GIS) and spatial-statistical analysis to gain insights into the traffic accident patterns in Mashhad, Iran. <br/> | ||
+ | <br/><br/> | ||
+ | |||
+ | [[File:Kernel.png|200px|framed|center|Results of kernel density level for accidents leading to injury from March 21, 2011 to March 19, 2012]] | ||
+ | <br/> | ||
+ | |||
+ | <strong>Methodology:</strong><br/> | ||
+ | 1. Kernel Density Estimation | ||
+ | * To determine static hotspots | ||
+ | 2. Nearest Neighbour Distance Analysis | ||
+ | * Used to determine if the accidents are clustered based on the nearest distance between two neighbouring accident points | ||
+ | 3. K-function output analysis | ||
+ | * Used to provide a more accurate analysis of points distribution | ||
+ | <br/> | ||
+ | <strong>Learning Points:</strong><br/> | ||
+ | 1. Spatial Analysis Techniques | ||
+ | * This study is similar to our project. Hence, we can learn the analysis technique they have used and apply it to our study | ||
+ | * Similarly, we can use Kernel Density Estimation to detect traffic accident hotspots, and Nearest Neighbour analysis and the K function to determine if the accidents are randomly distributed or clustered | ||
+ | <br/> | ||
+ | <strong>Areas for improvement:</strong><br/> | ||
+ | 1. Hard to follow up | ||
+ | * As this analysis was done in proprietary software (ArcView), it is difficult to reproduce the study exactly. Thus, it is hard for other researchers to follow up on it. | ||
+ | <br/><br/> | ||
+ | |||
+ | |||
+ | <big><strong>2. Brazilian Road Traffic Fatalities: A Spatial and Environmental Analysis</strong><br/></big><hr/> | ||
+ | <strong>Aim of study:</strong> to analyse road traffic accident hotspots on the BR 277 highway in the state of Parana, southern Brazil, and to perform environmental analysis to identify patterns contributing to the traffic accidents. <br/> | ||
+ | |||
+ | [[File:Ref2.png|200px|framed|center|Kernel density and wavelet analysis hotspots. 3A) All Fatal Crashes]] | ||
+ | <br/> | ||
+ | |||
+ | <strong>Methodology:</strong><br/> | ||
+ | 1. Kernel Density Estimation | ||
+ | * To determine accident hotspots | ||
+ | 2. Wavelet | ||
+ | * Complement Kernel exploratory analysis | ||
+ | 3. K-function output analysis | ||
+ | * To reduce the variables into similar variance components | ||
+ | * Then developed regression models to evaluate the impact of built environmental components on fatal crashes | ||
+ | <br/> | ||
+ | |||
+ | <strong>Learning Points:</strong><br/> | ||
+ | |||
+ | 1. Spatial Analysis Techniques | ||
+ | * Apart from using Kernel Density Estimation to identify hotspots and the K function to test for complete spatial randomness like the previous study, this research also explores how the built environment affects the occurrence of accidents. | ||
+ | * We could learn from this project how the built environment analysis is carried out and then determine how various road infrastructure affects the occurrence of accidents. | ||
+ | |||
+ | <br/> | ||
+ | <strong>Areas for improvement:</strong><br/> | ||
+ | 1. Hard to follow up | ||
+ | * Similar to the previous study, this analysis was done in GIS software (QGIS), which makes it hard to reproduce the same study and hard for other researchers to follow up on it. | ||
+ | <br/><br/> | ||
+ | |||
+ | |||
+ | <big><strong>3. IS415 2013-14 Assignment 2 – Heng U San </strong><br/></big><hr/> | ||
+ | <strong>Aim of study:</strong> to analyse the distribution of GP Clinics, Preschools and Bus Stops in Bedok and provide recommendation on how amenities could be better planned. <br/> | ||
+ | |||
+ | [[File:USanPIC165.png|200px|framed|center|Density function for buildings]] | ||
+ | <br/> | ||
+ | |||
+ | <strong>Methodology:</strong><br/> | ||
+ | 1. Nearest Neighbour Index | ||
+ | * lpp function – to measure distance between points along a linear network | ||
+ | 2. K-function | ||
+ | * To determine the clustering type | ||
+ | <br/> | ||
+ | <strong>Learning Points:</strong><br/> | ||
+ | |||
+ | 1. Clear and easy to understand | ||
+ | * U San offered a very clear and easy to understand explanation of how Nearest Neighbour Index and K function works. This helped us significantly in understanding how these techniques are used in the other research papers. | ||
+ | * U San’s work was well documented. She clearly explained the step-by-step procedure of how she obtained her results as well as the R functions used for analysis. This makes it much easier for other researchers to reproduce a similar study. | ||
+ | * To analyse the spatial distribution of bus stops, U San included a road network constraint in the various analyses. This is done because bus stops can only occur on road networks. Similarly, in our study, accidents can only occur on road networks. Thus the road network constraint should be included in our analysis, or else our results will not make sense. | ||
+ | |||
+ | <br/> | ||
+ | <strong>Areas for improvement:</strong><br/> | ||
+ | 1. Sharing of codes | ||
+ | * U San did well in documenting her step-by-step procedure, showing other researchers how to reproduce a similar study. However, it would be even better if U San could share an R notebook of her code so that researchers could reproduce the exact same study and continue her research from where she stopped. | ||
+ | <br/><br/> | ||
+ | </div> | ||
+ | <br/> | ||
+ | == Approach == | ||
+ | |||
+ | After performing the literature review, we have a better understanding of what methodology could be used to achieve our objective. We then consulted our professor to decide the most appropriate analysis technique for use and finally we chose the techniques below. | ||
+ | <br/> | ||
+ | === Kernel Density Estimation with Network Constraints === | ||
+ | |||
+ | [[File:Roadrunners approach1.jpg|800px|frameless|center]] | ||
+ | |||
+ | Kernel Density Estimation with Network Constraints is used to identify the locations along the network that have a high concentration of traffic incidents. The formula for converting the observations into a kernel function is shown above. The bandwidth, T, can be adjusted to smooth the kernel density function. In R, the analysis is run with the spatstat package by applying the ‘density.lpp’ function to an lpp object. | ||
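As a minimal sketch of this step, the snippet below estimates a network-constrained kernel density on toy data. It assumes the spatstat package is installed. The built-in simplenet network and the random points are stand-ins for the expressway network and the incident points, and the bandwidth of 0.1 is an arbitrary value chosen only for illustration.

```r
library(spatstat)

set.seed(1)

# Toy data (assumption): 30 random points on spatstat's built-in example network.
# In the project, this would be an lpp object built from the expressway
# network and the accident or heavy traffic points.
X <- runiflpp(30, simplenet)

# density() dispatches to density.lpp for lpp objects;
# sigma is the smoothing bandwidth, in the units of the network.
kde <- density(X, sigma = 0.1)
```

Calling plot(kde) draws the estimated density along the network. On real data the bandwidth would be expressed in the map units of the road network (for example metres).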
+ | <br/> | ||
+ | === Ripley's K Function with Network Constraints === | ||
+ | |||
+ | Ripley’s K Function is a spatial analysis method used to describe how point patterns are distributed over an area of interest. It allows us to determine if the point patterns are dispersed, clustered or randomly distributed. The formula below shows how we can obtain the K function given the observations. | ||
+ | How K Function is used: | ||
+ | 1. A circle of radius h is constructed around each observation | ||
+ | 2. The number of observations that fall inside each circle is counted | ||
+ | <br/> | ||
+ | [[File:Roadrunners approach1.jpg|800px|frameless|center]] | ||
+ | <br/> | ||
+ | 3. The formula is applied to obtain the K function at radius h | ||
+ | 4. The above 3 steps are repeated for different values of h. | ||
+ | 5. A graph of K function against h is then plotted. | ||
+ | 6. Monte Carlo simulation tests are then run to determine the K function of randomly distributed point patterns. | ||
+ | 7. The K function of the observations is compared with the K function of the simulations. If the K function of the observations is higher than the upper bound of the simulations, it suggests clustering. If it is lower than the lower bound of the simulations, it suggests dispersion. Otherwise, if it lies within the upper and lower bounds, the points are consistent with complete spatial randomness. Refer to the figure below for a better illustration. | ||
+ | <br/> | ||
+ | [[File:Roadrunners approach3.jpg|800px|frameless|center]] | ||
+ | <br/> | ||
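The steps above can be sketched in base R for the planar case. This is only an illustration under stated assumptions: it uses the standard estimator without edge correction, K(h) = A / (n(n - 1)) multiplied by the number of ordered pairs of points within distance h, which may differ from the formula in the figure. The function name ripley_k and the toy clustered pattern are hypothetical.

```r
# Planar K function without edge correction (assumed form of the estimator).
ripley_k <- function(x, y, h, area) {
  n <- length(x)
  d <- as.matrix(dist(cbind(x, y)))   # pairwise distances
  diag(d) <- Inf                      # a point is not its own neighbour
  # Steps 1 to 4: count ordered pairs within each radius and rescale
  sapply(h, function(r) area * sum(d <= r) / (n * (n - 1)))
}

set.seed(42)
n <- 50
h <- seq(0.05, 0.25, by = 0.05)

# Toy clustered pattern: points scattered tightly around 5 random centres
centres <- matrix(runif(10), ncol = 2)
idx <- sample(1:5, n, replace = TRUE)
x <- centres[idx, 1] + rnorm(n, sd = 0.02)
y <- centres[idx, 2] + rnorm(n, sd = 0.02)
k_obs <- ripley_k(x, y, h, area = 1)

# Step 6: Monte Carlo simulations of random patterns in the unit square
nsim <- 20
k_sim <- replicate(nsim, ripley_k(runif(n), runif(n), h, area = 1))
upper <- apply(k_sim, 1, max)   # upper bound of the simulations at each h
lower <- apply(k_sim, 1, min)   # lower bound of the simulations at each h

# Step 7: TRUE where the observed K lies above the simulated upper bound
clustered <- k_obs > upper
```

Plotting k_obs, upper and lower against h gives a graph of the same kind as the illustration above.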
+ | However, a slight modification is added to our K function to include network constraints. This means that the circle of radius h will only expand along the road network instead of expanding freely. | ||
+ | |||
+ | K Function with linear constraint is executed in R with the spatstat package using the linearK function. | ||
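A minimal sketch of the network-constrained version is shown below, assuming the spatstat package is installed. The built-in simplenet network and the random points are placeholders for the expressway network and the incident points. The envelope call runs the Monte Carlo simulations, and the number of simulations (20) mirrors the value used later in this page.

```r
library(spatstat)

set.seed(1)

# Toy data (assumption): random points on a built-in example network
X <- runiflpp(30, simplenet)

# Observed network K function
K_obs <- linearK(X)

# Simulation envelope: 20 simulated random patterns on the same network
env <- envelope(X, linearK, nsim = 20, verbose = FALSE)

# Distances at which the observed K exceeds the upper envelope (signs of clustering)
r_clustered <- env$r[env$obs > env$hi]
```

Because the toy points are themselves random, r_clustered will usually be empty or short here; with clustered data it would list the distances at which clustering is indicated.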
+ | <br/> | ||
+ | === Multitype K Function with Network Constraints === | ||
+ | |||
+ | The multitype K function is an extension of Ripley’s K function. The algorithm is mostly the same; however, instead of counting the number of observations of the same type in a circle of radius h, the number of observations belonging to the other type is counted. For example, a circle of radius h is formed around each traffic camera and the number of accident points within this circle is counted. This step is repeated for all cameras and for a range of radii h. Lastly, the K function is plotted. The multitype K function can also be applied in R using the spatstat package with the linearKcross function. | ||
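The sketch below shows how such a call might look, assuming the spatstat package is installed. The network, the random points and the mark labels "camera" and "accident" are made up for illustration; in the project the marks would come from the combined cameras and incidents data.

```r
library(spatstat)

set.seed(1)

# Toy data (assumption): 40 random points on a built-in example network
X <- runiflpp(40, simplenet)

# Label 10 points as cameras and 30 as accidents to make the pattern multitype
marks(X) <- factor(rep(c("camera", "accident"), times = c(10, 30)))

# Cross-type K function: accidents counted around each camera, along the network
K_cross <- linearKcross(X, i = "camera", j = "accident")
```

Calling plot(K_cross) shows the cross-type K function against the distance r.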
+ | |||
+ | |||
+ | <br/> | ||
+ | == Web Application Design == | ||
+ | === Design Inspiration === | ||
+ | <br/> | ||
+ | [[File:Roadrunners designinsp1.jpg|800px|frameless|center]] | ||
+ | <br/> | ||
+ | Superzip R Shiny is a sample R Shiny web application found in the R Shiny gallery. What sets it apart from other R Shiny dashboards is that the two charts in the menu are dynamic. The analysis area of the two charts depends on the boundary of the map that the user is viewing. This allows users to set their area of analysis by zooming into the map and panning to the area of interest. | ||
+ | |||
+ | We could implement a similar feature in our R Shiny dashboard by allowing users to select their analysis area from the map. However, we will lay it out slightly differently from the example. Instead of having a floating menu, we will fix the menu to the side of the map, as we feel that a floating menu would obstruct the user from seeing the entire analysis area. | ||
+ | |||
+ | === Initial Storyboard === | ||
+ | <br/> | ||
+ | [[File:Roadrunners initstory1.jpg|800px|frameless|center]] | ||
+ | <br/> | ||
+ | == Application Architecture == | ||
+ | |||
+ | The image below shows the application architecture of our web application. | ||
+ | <br/> | ||
+ | [[File:Roadrunners sa1.jpg|1000px|frameless|center]] | ||
+ | <br/> | ||
+ | == Application Overview == | ||
+ | |||
+ | [[File:Roadrunners app1.jpg|1000px|frameless|center]] | ||
+ | <br> | ||
+ | |||
+ | {| class="wikitable" | ||
+ | |- | ||
+ | ! Type !! Feature !! Image !! Purpose | ||
+ | |- | ||
+ | | rowspan = "3" |Main || Upload Data || [[File:Roadrunners app2.jpg|600px|frameless|center]]|| | ||
+ | * Allows user to upload their own traffic data | ||
+ | * The format of the traffic data must be in a csv file and in the same format as the data collected from LTA’s API | ||
+ | * The format of the road network data must be in a shape file | ||
+ | |||
+ | |- | ||
+ | | Toggle Map markers || [[File:Roadrunners app3.jpg|600px|frameless|center]]|| | ||
+ | * Allows user to toggle the respective markers on the map | ||
+ | |- | ||
+ | | Select Analysis || [[File:Roadrunners app4.jpg|600px|frameless|center]]|| | ||
+ | * Allows user to select the analysis to perform | ||
+ | |- | ||
+ | | rowspan = "3" |Kernel Density Estimation || Select KDE variables || [[File:Roadrunners app5.jpg|600px|frameless|center]]|| | ||
+ | * Select the variable for Kernel Density Estimation | ||
+ | |- | ||
+ | | Slider || [[File:Roadrunners app6.jpg|600px|frameless|center]]|| | ||
+ | * The bandwidth of the Kernel Density plot. | ||
+ | * A larger kernel distance will lead to a smoother plot while a smaller kernel distance will lead to a plot with more noise | ||
+ | |||
+ | |- | ||
+ | | Output options || [[File:Roadrunners app7.jpg|600px|frameless|center]]|| | ||
+ | * User can select if he/she wants to perform Kernel Density Estimation with or without network constraint | ||
+ | |- | ||
+ | | rowspan = "2" |K Function || Select Analysis Variable || [[File:Roadrunners app8.jpg|600px|frameless|center]]|| | ||
+ | * Allows user to select the variable to perform K Function analysis | ||
+ | |- | ||
+ | | No. of Simulations || [[File:Roadrunners app9.jpg|600px|frameless|center]]|| | ||
+ | * Select the number of Monte Carlo Simulations to run for the K Function Analysis | ||
+ | * Higher number of simulations will require a longer loading time | ||
+ | |||
+ | |- | ||
+ | | rowspan = "2" | Multitype K Function || Select Analysis Variable || [[File:Roadrunners app10.jpg|600px|frameless|center]]|| | ||
+ | * Allows user to select the variable to perform Multitype K Function Analysis | ||
+ | |- | ||
+ | | No. of Simulations || [[File:Roadrunners app11.jpg|600px|frameless|center]]|| | ||
+ | * Select the number of Monte Carlo Simulations to run for the Multitype K Function Analysis | ||
+ | * Higher number of simulations will require a longer loading time | ||
+ | |||
+ | |- | ||
+ | | rowspan = "4" |Outputs || KDE with network constraints || [[File:Roadrunners app12.jpg|600px|frameless|center]]|| | ||
+ | * View of the KDE plot | ||
+ | * Legend of the KDE plot is indicated at the bottom left corner of the screen | ||
+ | |||
+ | |- | ||
+ | | KDE without network constraints || [[File:Roadrunners app13.jpg|600px|frameless|center]]|| | ||
+ | * View of the KDE plot | ||
+ | * Legend of the KDE plot is indicated at the bottom left corner of the screen | ||
+ | |||
+ | |- | ||
+ | | K Function || [[File:Roadrunners app14.jpg|600px|frameless|center]]|| | ||
+ | * Output of the K Function analysis | ||
+ | |- | ||
+ | | Multitype K Function || [[File:Roadrunners app15.jpg|600px|frameless|center]]|| | ||
+ | * Output of the Multitype K Function analysis | ||
+ | |} | ||
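The controls listed in the table could be laid out roughly as in the sketch below, with the menu fixed to the side as described in the design section. This is not the application's actual code: it assumes the shiny package, and every input ID, label, choice and slider range is hypothetical. The output is a placeholder plot rather than a map.

```r
library(shiny)

ui <- fluidPage(
  sidebarLayout(
    # Fixed side menu holding the main controls
    sidebarPanel(
      fileInput("traffic_csv", "Upload traffic data (CSV)"),
      checkboxGroupInput("markers", "Map markers",
                         choices = c("Accidents", "Heavy traffic", "Cameras")),
      selectInput("analysis", "Analysis",
                  choices = c("Kernel Density Estimation",
                              "K Function",
                              "Multitype K Function")),
      sliderInput("bandwidth", "Kernel distance (m)",
                  min = 100, max = 5000, value = 1500, step = 100),
      numericInput("nsim", "No. of simulations", value = 20, min = 1)
    ),
    mainPanel(plotOutput("result"))
  )
)

server <- function(input, output) {
  # Placeholder output: echoes the selected settings instead of running the analysis
  output$result <- renderPlot({
    plot.new()
    title(main = paste(input$analysis,
                       "| kernel distance:", input$bandwidth, "m",
                       "| simulations:", input$nsim))
  })
}

app <- shinyApp(ui, server)
```

Running runApp(app) in an interactive R session starts the sketch locally.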
+ | <br/> | ||
+ | == Interesting Findings == | ||
+ | |||
+ | The following section describes some of the interesting findings from the R Shiny Web Application we have created. | ||
+ | |||
+ | We first performed a K Function analysis to determine if there is indeed clustering observed between accident points from a statistical point of view. A K Function analysis with 20 Monte Carlo Simulations is run. | ||
+ | <br/><br/> | ||
+ | [[File:Roadrunners findings0a.jpg|1000px|frameless|center]] | ||
+ | <br/> | ||
+ | Based on the K Function analysis shown in the picture above, we can reject the null hypothesis that the accident points are in complete spatial randomness at the 95% confidence level (100 – 100/20): the accident points are clustered at distances between 200 m and 5000 m. Since the accident points are clustered across this range, we can use any value within it as the kernel distance for the KDE output. | ||
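The confidence figure quoted above can be checked against the usual rule for pointwise Monte Carlo envelopes, where the observed value is ranked among the simulated values at a single distance. The short calculation below is a sketch under that assumption; it assumes the envelope is formed from the minimum and maximum of the simulations (rank 1), and it applies to one fixed distance, not to the whole range of distances at once.

```r
nsim  <- 20   # number of Monte Carlo simulations
nrank <- 1    # envelope taken from the most extreme simulated values

# One-sided test (observed K above the upper bound only, i.e. clustering)
alpha_one_sided <- nrank / (nsim + 1)

# Two-sided test (observed K outside either bound)
alpha_two_sided <- 2 * nrank / (nsim + 1)

confidence_one_sided <- 100 * (1 - alpha_one_sided)
confidence_two_sided <- 100 * (1 - alpha_two_sided)
```

Under these assumptions the one-sided level is 1/21, which is close to the 95% figure above, while the two-sided level is 2/21.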
+ | <br/> | ||
+ | [[File:Roadrunners findings1.jpg|800px|frameless|center]] [[File:Roadrunners findings2.jpg|600px|frameless|center]] | ||
+ | <br/> | ||
+ | Using a Kernel Distance of 1500m, we have obtained the KDE plot as seen in the picture above. From the plot, 6 accident hotspots have been identified. | ||
+ | <br/> | ||
+ | Next, we move on to the heavy traffic points. Before plotting the KDE, a K Function analysis with 20 Monte Carlo simulations is performed. | ||
+ | <br/> | ||
+ | [[File:Roadrunners findings0b.jpg|800px|frameless|center]] | ||
+ | <br/> | ||
+ | Similarly, clear clustering is observed for heavy traffic points. From the K Function analysis above, we are 95 % confident that clustering is observed at any given distance. | ||
+ | <br/> | ||
+ | [[File:Roadrunners findings3.jpg|800px|frameless|center]] [[File:Roadrunners findings4.jpg|600px|frameless|center]] | ||
+ | <br/> | ||
+ | Using a Kernel Distance of 1500m, we have obtained the following KDE plot of heavy traffic points. 5 heavy traffic hotspots have been identified. | ||
+ | |||
+ | We also performed a Multitype K Function analysis to determine if heavy traffic points tend to be clustered around traffic cameras. Two traffic cameras caught our attention. | ||
+ | <br/><br/> | ||
+ | |||
+ | [[File:Roadrunners findings5.jpg|1000px|frameless|center]] | ||
+ | <br/> | ||
+ | As seen in the picture above, the heavy traffic points tend to cluster around the two traffic cameras located on the CTE between the Braddell Road Exit and the Ang Mo Kio Avenue 3 Exit. Based on the multitype K function analysis, which was run with 20 simulations, we are 95% confident that clustering indeed occurs around these traffic cameras. | ||
+ | |||
+ | From this result alone, we are unable to determine whether this is just a coincidental correlation, whether the traffic cameras are the cause of the heavy traffic, or whether the traffic police deliberately placed the cameras there. However, this analysis does provide us with a starting point for further analysis of the reasons for traffic congestion in this area. To verify if the traffic cameras are indeed causing the congestion, further ground work needs to be done. | ||
+ | |||
+ | <br/> | ||
+ | == Project Challenges == | ||
+ | {| class="wikitable" | ||
+ | |- | ||
+ | ! !! Key Challenges !! Description !! Solution | ||
+ | |- | ||
+ | | 1. || Lack of readily available data || There is currently no known data source that provides historical traffic accidents data in Singapore. There is only a real time API of traffic accidents from LTA. || | ||
+ | * Learn to write a script that calls the API automatically | ||
+ | * Create a regular schedule for the calling of API | ||
+ | |- | ||
+ | | 2. || Unfamiliarity with R Shiny || We are unfamiliar with the R programming language due to a lack of prior experience || | ||
+ | * Independent learning starting from week 5 | ||
+ | * Learning from each other | ||
+ | * Consult Prof Kam | ||
+ | |- | ||
+ | | 3. || Unfamiliarity with spatial analysis techniques || We are unsure which spatial analysis techniques to use and how to apply them as we lack prior experience in geospatial analysis || | ||
+ | * Conduct literature review on the commonly used spatial analysis techniques | ||
+ | * Research how these techniques are executed | ||
+ | * Independent learning on the analysis techniques from week 5 | ||
+ | * Learning from each other | ||
+ | * Consult Prof Kam | ||
+ | |} | ||
+ | |||
+ | <br/><br/> | ||
+ | <br/> | ||
+ | == Project Timeline == | ||
+ | [[File:Roadrunners Timeline.png|1200px|frameless|center]] | ||
+ | <br/><br/> | ||
+ | |||
+ | == Meet the Team == | ||
+ | |||
+ | [[File:Roadrunners groupphoto.jpg|800px|frameless|center]] | ||
+ | <center>From left to right: Gwee Wei Ling, Tan Ming Kwang, Prof Kam Tin Seong, Tan Zhi Chong (Vincent)</center> | ||
+ | <br> | ||
+ | |||
+ | [[File:Comments.png|900px|frameless|center]] | ||
+ | <div style="text-align: center; direction: ltr; margin-left: 1em;"><font face="Avenir"><big>Feel free to leave any comments! :) </big></font></div> | ||
+ | |||
+ | {| class="wikitable" style="background-color:#FFFFFF;" width="100%" | ||
+ | |- | ||
+ | | | ||
+ | <b>No.</b> | ||
+ | || | ||
+ | <b>Name</b> | ||
+ | || | ||
+ | <b>Date</b> | ||
+ | || | ||
+ | <b>Comments</b> | ||
+ | |- | ||
+ | | | ||
+ | 1. | ||
+ | || | ||
+ | Insert your Name here | ||
+ | || | ||
+ | Insert Date here | ||
+ | || | ||
+ | Insert Comment here | ||
+ | |- | ||
+ | | | ||
+ | 2. | ||
+ | || | ||
+ | Insert your Name here | ||
+ | || | ||
+ | Insert Date here | ||
+ | || | ||
+ | Insert Comment here | ||
+ | |- | ||
+ | | | ||
+ | 3. | ||
+ | || | ||
+ | Insert your Name here | ||
+ | || | ||
+ | Insert Date here | ||
+ | || | ||
+ | Insert Comment here | ||
+ | |- | ||
+ | |} |
Revision as of 14:43, 25 February 2019
Project Motivation
Project Objective
Data Preparation
Data | Source | Data Type |
---|---|---|
Average Monthly Household Electricity Consumption by Postal Code (Public Housing) & Dwelling Type, 2H 2016 | ema.gov.sg | xls |
Data Collection
All the data required for this project is readily available for download from either data.gov.sg or OpenStreetMap except for the accidents and heavy traffic data.
Collecting accidents and heavy traffic data
The accidents and heavy traffic data available from mytransport.sg are real-time data that can only be retrieved by calling an API. No historical accidents and heavy traffic data is available from mytransport.sg. Thus, to collect the data, we wrote a PowerShell script that calls the API periodically to retrieve the JSON file containing the real-time data. We then wrote another PowerShell script to convert the JSON file to a CSV file for ease of use.
We spent 5 weeks calling the API regularly to retrieve the real-time data and this gave us 335 accident points and 877 heavy traffic points, a sufficient quantity for analysis.
The data collected is in the format below:
Attributes | Example |
---|---|
Type | Accident |
Latitude | 1.319629 |
Longitude | 103.8537 |
Date | 22/2/2018 |
Time | 10:33:00 PM |
Description | Accident on CTE (towards SLE) after Moulmein Rd Exit with congestion till Kramat Rd Entrance. Avoid lanes 1 and 2. |
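The JSON-to-CSV step could look roughly like the sketch below. It is written in R rather than PowerShell purely for illustration and is not the original script. It assumes the jsonlite package, a response whose records sit under a ‘value’ field, and a ‘Message’ field holding the description. The sample payload is made up, the function name json_to_incidents is hypothetical, and the Date and Time columns are filled from the retrieval time.

```r
library(jsonlite)

# Hypothetical sample payload; the real API response may be structured differently.
sample_json <- '{"value":[
  {"Type":"Accident","Latitude":1.319629,"Longitude":103.8537,
   "Message":"Accident on CTE (towards SLE) after Moulmein Rd Exit."},
  {"Type":"Heavy Traffic","Latitude":1.35,"Longitude":103.85,
   "Message":"Heavy Traffic on PIE (towards Changi Airport)."}
]}'

# Convert one JSON response into a data frame with the attributes in the table above
json_to_incidents <- function(json_text, retrieved_at = Sys.time()) {
  records <- fromJSON(json_text)$value   # one row per incident
  data.frame(
    Type        = records$Type,
    Latitude    = records$Latitude,
    Longitude   = records$Longitude,
    Date        = format(retrieved_at, "%d/%m/%Y"),
    Time        = format(retrieved_at, "%I:%M:%S %p"),
    Description = records$Message,
    stringsAsFactors = FALSE
  )
}

incidents <- json_to_incidents(sample_json)

# Write the result to a CSV file in a temporary folder
csv_path <- file.path(tempdir(), "incidents.csv")
write.csv(incidents, csv_path, row.names = FALSE)
```

In a periodic collection script, each new response would be converted in this way and appended to the same CSV file.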
With all the data ready, we can now proceed to data cleaning.
Data Cleaning
Extracting Expressway Networks
The shape file downloaded from OpenStreetMap gives us the entire road network of Singapore and part of Malaysia. However, we only require the expressway road networks in Singapore. Thus, some data preparation is needed to extract the road networks that are needed. We decided to perform this data preparation in QGIS as it gives us a better visualisation of the road network, which allows us to easily detect any errors.
Firstly, we used the geoprocessing tool in QGIS to extract the road networks that occur only in Singapore. We performed the vector intersection function between the road networks layer and a layer containing the coastal outline of Singapore, which was downloaded from data.gov.sg. This returns a layer consisting of road networks that occur only in Singapore.
Next, we performed filtering on the data frame to extract the expressway road networks. The data frame component of the shape files contains an attribute called ‘type’. We were able to obtain the expressway network by filtering ‘type’ = ‘motorway’.
Lastly, we performed a manual check to remove some erroneous lines.
Extracting accidents and heavy traffic points that occur on expressway
The table above shows the attributes of the accidents and heavy traffic data. We are only interested in the points that occur on the expressways. To obtain these points, we used R to perform our data cleaning.
patterns <- c('on AYE','on BKE','on CTE','on ECP','on KJE','on KPE','on MCE','on PIE','on SLE','on TPE')
We could extract the expressway points by filtering the points that contain the expressway names in the ‘Description’ attribute. A ‘patterns’ variable is created to store the phrases that appear on expressway points.
library(dplyr)  # provides %>% and filter()
accidents_filter <- trafficReport %>% filter(grepl(paste(patterns, collapse="|"), Description)) %>% filter(Type == 'Accident')
Lastly, we used the ‘grepl’ function in R to extract only the points that contain any of the phrases above in the ‘Description’ attribute.
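To show what this filter keeps and drops, here is a small base R sketch on made-up records. The three sample rows are hypothetical, and the column names follow the attribute table above.

```r
patterns <- c('on AYE', 'on BKE', 'on CTE', 'on ECP', 'on KJE',
              'on KPE', 'on MCE', 'on PIE', 'on SLE', 'on TPE')

# Made-up sample records for illustration only
trafficReport <- data.frame(
  Type = c("Accident", "Accident", "Heavy Traffic"),
  Description = c("Accident on CTE (towards SLE) after Moulmein Rd Exit.",
                  "Accident on Bukit Timah Rd near Newton Circus.",
                  "Heavy Traffic on PIE (towards Tuas)."),
  stringsAsFactors = FALSE
)

# TRUE where the description contains any of the expressway phrases
on_expressway <- grepl(paste(patterns, collapse = "|"), trafficReport$Description)

# Keep only accidents that occur on an expressway
accidents_filter <- trafficReport[on_expressway & trafficReport$Type == "Accident", ]
```

Only the first record is kept: the second is an accident on a non-expressway road, and the third is on an expressway but is not an accident.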
Extracting the cameras
Similarly, we only require cameras that are located on expressways. This extraction is slightly more time-consuming as there are no attributes in the cameras shape file that indicate whether or not a camera is located on an expressway. Thus, to obtain the cameras that are located only on expressways, we did manual filtering in QGIS based on the expressway road network we created previously. We repeated this step for the different types of cameras.
Lastly, we combined all the camera files into a single shape file by using the join function.
Literature Review
To gain a better understanding of how we could proceed with our analysis, we decided to conduct a literature review. Here are the summaries of some research papers on spatial analysis of traffic accidents:
1. GIS-based spatial analysis of urban traffic accidents: Case study in Mashhad, Iran
Aim of study: to use geographic information systems (GIS) and spatial-statistical analysis to gain insights into the traffic accident patterns in Mashhad, Iran.
Methodology:
1. Kernel Density Estimation
- To determine static hotspots
2. Nearest Neighbour Distance Analysis
- Used to determine if the accidents are clustered based on the nearest distance between two neighbouring accident points
3. K-function output analysis
- Used to provide a more accurate analysis of points distribution
Learning Points:
1. Spatial Analysis Techniques
- This study is similar to our project. Hence, we can learn the analysis technique they have used and apply it to our study
- Similarly, we can use Kernel Density Estimation to detect traffic accident hotspots, and Nearest Neighbour analysis and the K function to determine if the accidents are randomly distributed or clustered
Areas for improvement:
1. Hard to follow up
- As this analysis is done on a proprietary software (Arcview), it is impossible to reproduce the same study done by the researchers. Thus, it is hard for other researchers to follow up on their study.
Aim of study: to analyse road traffic accident hotspots on the BR 277 highway in the state of Paraná, southern Brazil, and to perform an environmental analysis to identify patterns contributing to the traffic accidents.
Methodology:
1. Kernel Density Estimation
- To determine accident hotspots
2. Wavelet
- Complement Kernel exploratory analysis
3. K-function output analysis
- To reduce the variables into similar variance components
- Then developed regression models to evaluate the impact of built environmental components on fatal crashes
Learning Points:
1. Spatial Analysis Techniques
- Apart from using Kernel Density Estimation to identify hotspots and the K function to test for complete spatial randomness like the previous study, this research also explores how the built environment affects the occurrence of accidents.
- We could learn from this project how the built environment analysis is carried out and then determine how various road infrastructure affects the occurrence of accidents.
Areas for improvement:
1. Hard to follow up
- Similar to the previous study, this analysis was done in proprietary software (QGIS), so it is difficult to reproduce the study. Thus, it is hard for other researchers to follow up on it.
Aim of study: to analyse the distribution of GP Clinics, Preschools and Bus Stops in Bedok and provide recommendations on how amenities could be better planned.
Methodology:
1. Nearest Neighbour Index
- lpp function – to measure distance between points along a linear network
2. K-function
- To determine the clustering type
Learning Points:
1. Clear and easy to understand
- U San offered a very clear and easy to understand explanation of how the Nearest Neighbour Index and the K function work. This helped us significantly in understanding how these techniques are used in the other research papers.
- U San’s work was well documented. She clearly explained the step by step procedure of how she obtained her results as well as the R functions used for analysis. This makes it much easier for other researchers to reproduce a similar study.
- To analyse the spatial distribution of bus stops, U San included a road network constraint in the various analyses. This is done because bus stops can only occur on road networks. Similarly, in our study, accidents can only occur on road networks. Thus the road network constraint should be included in our analysis, or else our results will not make sense.
Areas for improvement:
1. Sharing of codes
- U San did well in documenting her step by step procedure, showing other researchers how to reproduce a similar study. However, it would be even better if U San could share an R notebook of her code so that researchers could reproduce the exact same study and continue her research from where she stopped.
Approach
After performing the literature review, we had a better understanding of which methods could be used to achieve our objective. We then consulted our professor to decide on the most appropriate analysis techniques and chose the ones below.
Kernel Density Estimation with Network Constraints
Kernel Density Estimation with Network Constraints is used to identify locations along the network that have a high concentration of traffic incidents. The formula for converting the observations into a kernel function is shown above. The bandwidth, T, can be adjusted to smooth the kernel density function. Network-constrained Kernel Density Estimation is executed in R using the spatstat package by applying the ‘density.lpp’ function to an lpp object.
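To make the role of the bandwidth concrete, here is a minimal Python sketch of kernel density estimation along a single straight road, where each incident is described by its distance in metres from the start of the road. This is a simplification under stated assumptions: it uses a Gaussian kernel, ignores junctions and end-of-road corrections, and is not the algorithm that ‘density.lpp’ uses. The incident positions are invented.

```python
import math


def gaussian_kernel(u):
    """Standard Gaussian kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def road_kde(event_positions, grid, bandwidth):
    """Density (events per metre) at each grid position along one road.

    bandwidth plays the role of T in the text: larger values give a
    smoother curve, smaller values give sharper peaks.
    """
    return [
        sum(gaussian_kernel((s - x) / bandwidth) for x in event_positions) / bandwidth
        for s in grid
    ]


# Invented incident positions (metres along the road), for illustration only.
events = [1000.0, 1100.0, 1200.0, 5000.0]
grid = [100.0 * k for k in range(81)]  # 0 m to 8000 m in 100 m steps

density = road_kde(events, grid, bandwidth=300.0)
hotspot = grid[density.index(max(density))]
print("Highest density at", hotspot, "m")
```

Because distances are measured along the road rather than across open space, two incidents on parallel roads would not reinforce each other, which is the point of the network constraint.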
Ripley's K Function with Network Constraints
Ripley’s K Function is a spatial analysis method used to describe how point patterns are distributed over an area of interest. It allows us to determine if the point patterns are dispersed, clustered or randomly distributed. The formula above shows how we can obtain the K function given the observations.
How K Function is used:
1. A circle of radius h is constructed around each observation
2. The number of observations that fall inside each circle is counted
3. The formula is applied to obtain the K function at a radius h
4. The above 3 steps are repeated for different values of h.
5. A graph of K function against h is then plotted.
6. Monte Carlo simulation tests are then run to determine the K function of randomly distributed point patterns.
7. The K function of the observations is compared with the K function of the simulations. If the K function of the observations is higher than the upper bound of the simulations, it suggests clustering. If it is lower than the lower bound of the simulations, it suggests dispersion. Otherwise, if it lies within the upper and lower bounds, the pattern is consistent with complete spatial randomness. Refer to the figure below for an illustration.
However, a slight modification is added to our K function to include network constraints. This means that the circle of radius h only expands along the road network instead of expanding freely.
The K function with network constraints is executed in R with the spatstat package using the linearK function.
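The steps above can be sketched in plain Python for the simplest possible network, a single straight road of known length, where the distance between two points is the distance along the road. This is an illustrative assumption, not the linearK implementation: it applies no edge correction and evaluates only one radius h, and the incident positions are invented.

```python
import random


def linear_k(points, length, h):
    """K function at radius h for points on one road of the given length.

    Counts ordered pairs of distinct points no more than h apart along the
    road, scaled by length / (n * (n - 1)).
    """
    n = len(points)
    pairs = sum(
        1
        for i in range(n)
        for j in range(n)
        if i != j and abs(points[i] - points[j]) <= h
    )
    return length * pairs / (n * (n - 1))


def csr_envelope(n, length, h, nsim, rng):
    """Lowest and highest K over nsim uniformly random patterns of n points."""
    sims = [
        linear_k([rng.uniform(0.0, length) for _ in range(n)], length, h)
        for _ in range(nsim)
    ]
    return min(sims), max(sims)


def classify(k_obs, lo, hi):
    """Compare the observed K with the simulation envelope."""
    if k_obs > hi:
        return "clustered"
    if k_obs < lo:
        return "dispersed"
    return "consistent with complete spatial randomness"


# Invented pattern: 30 incidents packed into 290 m of a 10 km road.
road_length = 10000.0
observed = [4000.0 + 10.0 * i for i in range(30)]
h = 500.0

k_obs = linear_k(observed, road_length, h)
lo, hi = csr_envelope(len(observed), road_length, h, nsim=20, rng=random.Random(1))
verdict = classify(k_obs, lo, hi)
print(k_obs, (lo, hi), verdict)
```

Repeating the calculation over a range of h values and plotting the observed curve against the envelope gives the kind of graph described in steps 5 to 7.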
Multitype K Function with Network Constraints
The multitype K function is an extension of Ripley’s K function. The algorithm is mostly the same; however, instead of counting observations of the same type in a circle of radius h, the observations belonging to the other type are counted. For example, a circle of radius h is formed around each traffic camera and the number of accident points within this circle is counted. This step is repeated for all cameras and for a range of radii h. Lastly, the K function is plotted. The multitype K function can be applied in R using the spatstat package with the linearKcross function.
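The counting rule can be illustrated with a short Python sketch, again assuming a single straight road with positions in metres and no edge correction. It is not the linearKcross implementation, and the camera and accident positions are invented.

```python
def linear_k_cross(centres, others, length, h):
    """Cross-type K at radius h on one road of the given length.

    For every centre point (for example a camera), counts the points of the
    other type (for example accidents) no more than h away along the road,
    then scales by length / (number of centres * number of others).
    """
    count = sum(1 for c in centres for o in others if abs(c - o) <= h)
    return length * count / (len(centres) * len(others))


# Invented positions (metres along a 10 km road), for illustration only.
cameras = [2000.0, 7000.0]
accidents = [1900.0, 2050.0, 2100.0, 6950.0, 9000.0]

k_cross = linear_k_cross(cameras, accidents, length=10000.0, h=200.0)
print(k_cross)
```

Under these assumptions, if accidents were placed independently of the cameras, the value would be close to 2h (the length of road within h of a point), so a value far above 2h at a given h points towards accidents concentrating near cameras.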
Web Application Design
Design Inspiration
Superzip is a sample R Shiny web application found in the R Shiny gallery. What sets it apart from other R Shiny dashboards is that the two charts in the menu are dynamic: the analysis area of the two charts depends on the boundary of the map the user is viewing. This allows users to set their area of analysis by zooming into the map and panning to the area of interest.
We could implement a similar feature in our R Shiny dashboard by allowing users to select their analysis area from the map. However, we will lay it out slightly differently from the example. Instead of a floating menu, we will fix the menu to the side of the map, as we feel a floating menu would obstruct the user’s view of the entire analysis area.
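The idea of tying the analysis area to the visible map can be sketched as a simple bounding-box filter. The Python below only illustrates the logic; it is not Shiny or Superzip code, and the field names, bounds and coordinates are hypothetical.

```python
def points_in_view(points, bounds):
    """Keep the points that fall inside the visible map rectangle.

    bounds is (south, west, north, east) in decimal degrees.
    """
    south, west, north, east = bounds
    return [
        p for p in points
        if south <= p["lat"] <= north and west <= p["lng"] <= east
    ]


# Hypothetical incident locations and map view, for illustration only.
incident_points = [
    {"id": 1, "lat": 1.340, "lng": 103.850},
    {"id": 2, "lat": 1.300, "lng": 103.700},
    {"id": 3, "lat": 1.360, "lng": 103.860},
]
current_view = (1.330, 103.800, 1.370, 103.900)

visible = points_in_view(incident_points, current_view)
print([p["id"] for p in visible])
```

Each time the user zooms or pans, the filter would be re-run with the new bounds and the charts redrawn from the points that remain.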
Initial Storyboard
Application Architecture
The image below shows the application architecture of our web application.
Application Overview
Type | Feature | Image | Purpose |
---|---|---|---|
Main | Upload Data | | |
Main | Toggle Map markers | | |
Main | Select Analysis | | |
Kernel Density Estimation | Select KDE variables | | |
Kernel Density Estimation | Slider | | |
Kernel Density Estimation | Output options | | |
K Function | Select Analysis Variable | | |
K Function | No. of Simulations | | |
Multitype K Function | Select Analysis Variable | | |
Multitype K Function | No. of Simulations | | |
Outputs | KDE with network constraints | | |
Outputs | KDE without network constraints | | |
Outputs | K Function | | |
Outputs | Multitype K Function | | |
Interesting Findings
The following section describes some of the interesting findings from the R Shiny Web Application we have created.
We first performed a K Function analysis to determine whether the accident points are clustered from a statistical point of view. A K Function analysis with 20 Monte Carlo simulations was run.
Based on the K Function analysis shown in the picture above, we can reject the null hypothesis that the accident points are in complete spatial randomness at approximately the 95% confidence level: with 20 simulations, the chance that the observed K function exceeds every simulated one under the null hypothesis is 1/(20 + 1) ≈ 4.8% at a given distance. The accident points are clustered between 200m and 5000m, so we can use any value in this range to plot the KDE output.
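As a rough check on the confidence level, the significance level of a pointwise Monte Carlo envelope test can be computed from the number of simulations. The sketch below assumes the standard ranking argument: under the null hypothesis the observed curve and the simulated curves are exchangeable at a distance fixed in advance, and the envelope is formed from the most extreme simulated values (nrank = 1). The function name is hypothetical.

```python
def pointwise_alpha(nsim, nrank=1, two_sided=False):
    """Significance level of a pointwise Monte Carlo envelope test.

    One-sided (for example, testing for clustering only): nrank / (nsim + 1).
    Two-sided (clustering or dispersion): 2 * nrank / (nsim + 1).
    """
    sides = 2 if two_sided else 1
    return sides * nrank / (nsim + 1)


alpha_one_sided = pointwise_alpha(20)                  # about 0.048
alpha_two_sided = pointwise_alpha(20, two_sided=True)  # about 0.095
print(round(alpha_one_sided, 3), round(alpha_two_sided, 3))
```

With 20 simulations the one-sided level is close to 5%, while a two-sided test at 5% would need 39 simulations under the same assumptions.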
Using a Kernel Distance of 1500m, we have obtained the KDE plot as seen in the picture above. From the plot, 6 accident hotspots have been identified.
Next, we performed the same analysis for heavy traffic. A K Function analysis with 20 Monte Carlo simulations was performed.
Similarly, clear clustering is observed for heavy traffic points. From the K Function analysis above, we are 95% confident that clustering is observed at any given distance.
Using a Kernel Distance of 1500m, we have obtained the following KDE plot of heavy traffic points. 5 heavy traffic hotspots have been identified.
We also performed a Multitype K Function analysis to determine if heavy traffic points tend to be clustered around traffic cameras. Two traffic cameras caught our attention.
As seen in the picture above, the heavy traffic points tend to cluster around the two traffic cameras located on the CTE between the Braddell Road exit and the Ang Mo Kio Avenue 3 exit. Based on the multitype K function analysis, which was run with 20 simulations, we are 95% confident that clustering indeed occurs around these traffic cameras.
From this result alone, we are unable to determine whether this is a coincidental correlation, whether the traffic cameras are the cause of the heavy traffic, or whether the traffic police deliberately placed the cameras there. However, this analysis does provide a starting point for further analysis of the reasons for traffic congestion in this area. To verify whether the traffic cameras are indeed causing the congestion, further groundwork needs to be done.
Project Challenges
No. | Key Challenges | Description | Solution |
---|---|---|---|
1. | Lack of readily available data | There is currently no known data source that provides historical traffic accidents data in Singapore. There is only a real time API of traffic accidents from LTA. | |
2. | Unfamiliarity with R Shiny | We are unfamiliar with R programming language due to the lack of prior experience | |
3. | Unfamiliarity with spatial analysis techniques | We are unsure what spatial analysis techniques to use and how to apply it as we lack prior experience in geospatial analysis | |
Project Timeline
Meet the Team
No. | Name | Date | Comments |
---|---|---|---|
1. | Insert your Name here | Insert Date here | Insert Comment here |
2. | Insert your Name here | Insert Date here | Insert Comment here |
3. | Insert your Name here | Insert Date here | Insert Comment here |