ANLY482 AY2017-18T2 Group08 : Project Findings / Final

From Analytics Practicum
Revision as of 17:21, 16 April 2018 by Shawn.pang.2014 (talk | contribs)


1.0 Introduction and Project Background

In today’s world, the convenient ad-hoc access provided by digital systems is taking the place of the assured access once offered by personal ownership (The Economist, 2017). For instance, streaming beats records, cloud storage beats hard disks, and credit beats cash. A similar phenomenon is occurring in the transportation industry with the introduction of bike-sharing. Bike-sharing programs have existed for almost 50 years, but in the last decade there has been a sharp increase in both their prevalence and popularity worldwide (Fishman, Elliot, Washington, Simon, & Haworth, 2013). Bike-sharing is a sustainable mobility strategy developed in response to concerns regarding global climate change, energy security and unstable fuel prices (Shaheen, Guzman & Zhang, 2010). Although China is currently the world leader in bike-sharing schemes, many other countries, including France and other European countries as well as the USA, have begun adopting this model (Gray, 2017).

However, despite the benefits and convenience that bike-sharing has introduced, there have also been downsides. For instance, complaints of reckless riding and bad parking have thrown a wrench into the bike-sharing movement (Lim, 2017). Authorities had little choice but to step in and issue new regulations to minimise the “bad behaviour” common among bike-sharing users.


1.1 Motivations and Objectives
The majority of existing research on the bike-sharing movement consists of studies conducted with two goals in mind:
1. Understanding business profitability and sustainability concerns
2. Gathering insights on bicycle routes taken by individuals to offer guidance to urban planners, policy makers and transportation practitioners


Little or no research has yet been done to shed light on the increasingly prominent issue of illegal parking patterns. Hence, this paper seeks to explore the issue with the following objectives in mind:
1. Fill the existing research gap by exploring the use of ‘Spatial Point Pattern Analysis’ (SPPA) in analyzing clustering patterns of illegal bike-parking
2. Specifically demonstrate the use of KDE and the modified L-function
3. Apply the tools to a real-life case study of Singapore
4. Discuss key learning points and considerations in using the methods

2.0 Literature Review

Literature on Kernel Density Estimation (KDE) and the L-function was explored and reviewed in preparation for this research. Existing literature shows that KDE is well-suited for analyzing spatial patterns, especially when there is a need to examine the intensity of a particular phenomenon. In the paper “Spatial distribution of diagnosed chronic kidney disease (CKD) in Edo State, Nigeria”, KDE was used to investigate the spatial distribution of CKD across regions in Nigeria. The study was important because health outcomes generally involve people, so the population at risk of CKD had to be determined. The spatial pattern of cases reflects the spatial distribution of the underlying population (Carlos et al. 2010), which allowed the authors to zero in on the identified regions through the use of KDE. In this paper, KDE will likewise be adopted to identify locations with a high intensity of clustering.
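To make the idea of "intensity" concrete, the following is a minimal sketch of a two-dimensional KDE with an isotropic Gaussian kernel, written in plain Python. It is an illustration only and not the workflow used in this project; the function name `gaussian_kde_2d`, the coordinates and the 5 m bandwidth are all hypothetical assumptions chosen for the example.

```python
import math


def gaussian_kde_2d(points, x, y, bandwidth):
    """Estimate event intensity at (x, y) using an isotropic Gaussian kernel.

    Each observed point contributes a bump centred on itself; the estimate
    is the sum of all bumps evaluated at (x, y), in events per unit area.
    """
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2)
    total = 0.0
    for px, py in points:
        d2 = (x - px) ** 2 + (y - py) ** 2
        total += math.exp(-d2 / (2.0 * bandwidth ** 2))
    return norm * total


# Hypothetical coordinates in metres: a tight cluster plus two isolated points.
points = [(10, 10), (12, 11), (11, 9), (9, 12), (80, 80), (50, 20)]

# Intensity inside the cluster versus in an empty part of the study area.
inside = gaussian_kde_2d(points, 10.5, 10.5, bandwidth=5.0)
outside = gaussian_kde_2d(points, 60.0, 60.0, bandwidth=5.0)
print(inside > outside)
```

Evaluating the function over a regular grid of (x, y) locations yields the raster that a heatmap displays. The bandwidth controls how far each point's influence spreads, which is why its selection receives separate attention in this paper.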

The second paper, “The Utility of Hotspot Mapping for Predicting Spatial Patterns of Crime”, presented the usefulness and accuracy of KDE in predictive SPPA. It compared various mapping techniques, namely point mapping, thematic mapping of geographic areas (e.g. census areas), spatial ellipses and KDE, and identified the one that most accurately predicts future crime occurrences (Chainey et al., 2008a). It split crimes into four categories, after which the different techniques were applied to each category to identify the technique that best predicted future crime occurrence. KDE was found to consistently outperform all other techniques in its predictive capabilities for all the crime types studied. The data used in that paper were geocoded crime data points whose coordinates were rounded off to the nearest 10m. This supposedly reduces the accuracy of the data points, as a crime could have been displaced by up to 5m in any direction from its actual location. However, the authors concluded that small differences in the locations of crime occurrence would not negatively impact the study’s findings. This is useful for our paper, as it illustrates that research of this nature should not be sensitive to small inaccuracies in the geographical coordinates used.

Some of the literature also highlighted certain inherent limitations, one of which is that KDE is unable to show the distance at which spatial patterns become significant. The paper “Identification of hazardous road locations of traffic accidents by means of KDE and cluster significance evaluation” explored the use of KDE in determining areas with a high potential for road traffic accidents. More importantly, it also introduced Monte-Carlo simulation, a statistical technique that uses repeated random simulations to determine the properties of events and their significance level. Combining both techniques allowed the researchers to identify the clusters of traffic accidents that are statistically significant. Thus, to ensure the accuracy of our study, the L-function and ‘bw.diggle’, a function in the ‘spatstat’ package for R, will be used to determine an appropriate kernel size for the KDE analysis. In addition, Monte-Carlo simulation will be adopted to ensure that the clustering observed at the chosen kernel size is statistically significant. More will be discussed in the next section.
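The combination of the L-function with Monte-Carlo simulation can be sketched as follows. This is a simplified illustration in plain Python under stated assumptions, not the ‘spatstat’ procedure used in this project: it applies no edge correction, evaluates a single distance r, and uses a hypothetical rectangular study area and a synthetic clustered pattern. The helper names `l_minus_r` and `csr_envelope` are made up for this example.

```python
import math
import random


def l_minus_r(points, r, area):
    """Naive L(r) - r without edge correction; values above 0 suggest clustering."""
    n = len(points)
    pairs = 0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) <= r:
                pairs += 1
    k = area * pairs / (n * (n - 1))   # Ripley's K estimate at distance r
    return math.sqrt(k / math.pi) - r  # L(r) = sqrt(K(r) / pi)


def csr_envelope(n, r, width, height, nsim, rng):
    """Min and max of L(r) - r over nsim simulated random (CSR) patterns."""
    values = []
    for _ in range(nsim):
        sim = [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]
        values.append(l_minus_r(sim, r, width * height))
    return min(values), max(values)


rng = random.Random(42)          # fixed seed so the sketch is repeatable
width = height = 100.0           # hypothetical 100 m x 100 m study area

# Hypothetical clustered pattern: 20 points around each of two centres.
centres = [(30.0, 30.0), (70.0, 60.0)]
observed = [(cx + rng.gauss(0, 3), cy + rng.gauss(0, 3))
            for cx, cy in centres for _ in range(20)]

r = 10.0
obs = l_minus_r(observed, r, width * height)
lo, hi = csr_envelope(len(observed), r, width, height, nsim=99, rng=rng)
clustered = obs > hi             # observed value lies above the simulation envelope
print(clustered)
```

With 99 simulations, an observed value above the upper envelope corresponds to a one-sided pseudo p-value of 1/(99 + 1) = 0.01. In practice the same comparison is repeated over a range of distances, and the distances at which the observed curve leaves the envelope indicate the scales at which clustering is significant.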


3.0 Spatial Point Pattern Analysis Methods

3.1 Kernel Density Estimation

3.1.1 Origin of Kernel Density Estimation

3.1.2 The Kernel Density Estimation Function

3.1.3 Hotspot Mapping Using Kernel Density Estimation Function

3.2 Ripley's K Function

3.2.1 Interpretation of Ripley's K Function

Figure 3: Visual Representation of East Coast Park Singapore

3.3 L-Function: Derivative of Ripley's K Function

3.4 'Bw.Diggle' Function, An Alternative Method To Approximate A Kernel Bandwidth

4.0 Case Study of Singapore and a Bike-sharing Firm

4.1 Dataset Description

4.1.1 Geocoding Process

4.1.2 Classification of Locations Based on Certainty Level


5.0 Application of Geo-spatial Point Pattern Analytical Methods on A Case Study of Singapore

5.1 Narrowing of Study Area Using QGIS Choropleth Map

5.2 Determining Spatial Patterns Using Spatstat's Modified L-Function on R Studio

5.2.1 Plotting The Modified L-Function Graph on 'R Studio'

5.3 Obtaining the Modified L-Function Plot and Kernel Radius

5.3.1 Optimal Kernel Density Radius Obtained Using Spatstat's 'bw.diggle' function

6.0 Findings and Analysis

6.1 Comparison of Bedok and Jurong West Using KDE on QGIS

6.1.1 Bedok Subzone Heatmap Analysis


6.1.2 Jurong-West Subzone Heatmap Analysis


6.2 Evaluating Placement of Yellow-boxes

6.3 Analysis of Illegal Bike-Parking Patterns in Bedok by Time Period


7.0 Conclusion | Key Takeaways and Considerations

8.0 References

Anderson, T. (2009). Kernel density estimation and K-means clustering to profile road accident hotspots. Accident Analysis and Prevention, 41(3), 359-364.

Bw.diggle function. (n.d.). Retrieved April 8, 2018, from https://www.rdocumentation.org/packages/spatstat/versions/1.55-0/topics/bw.diggle

Bíl, Andrášik, & Janoška. (2013). Identification of hazardous road locations of traffic accidents by means of kernel density estimation and cluster significance evaluation. Accident Analysis and Prevention, 55, 265-273.

Chainey, S., Tompson, L., & Uhlig, S. (n.d.). The Utility of Hotspot Mapping for Predicting Spatial Patterns of Crime. Retrieved April 8, 2018, from https://www.e-education.psu.edu/geog884/sites/www.e-education.psu.edu.geog884/files/image/lesson2/Chainey et al. (2008).pdf

Gray. (2017). China's ‘Uber for bikes’ model is going global. Retrieved from https://www.weforum.org/agenda/2017/06/china-leads-the-world-in-bike-sharing-and-now-its-uber-for-bikes-model-is-going-global/

Chapter 11 Point Pattern Analysis / Github https://mgimond.github.io/Spatial/point-pattern-analysis.html

Diggle, Peter. (1985). A Kernel Method for Smoothing Point Process Data. Applied Statistics. 34. 138-147. 10.2307/2347366.

Dixon, Philip M., "Ripley’s K function" (2001). Statistics Preprints. 52. http://lib.dr.iastate.edu/stat_las_preprints/52

Gesler, W. (1986). The uses of spatial analysis in medical geography: A review. Social Science & Medicine, 23(10), 963-973.

Hashimoto, Yoshiki, Saeki, Mimura, Ando, & Nanba. (2016). Development and application of traffic accident density estimation models using kernel density estimation. Journal of Traffic and Transportation Engineering (English Edition), 3(3), 262-270.

Kiskowski, Hancock, & Kenworthy. (2009, May). On the Use of Ripley's K-Function and Its Derivatives to Analyze Domain Size. Retrieved from http://www.cell.com/biophysj/abstract/S0006-3495(09)01048-0

Kobylińska, K., Cellmer, R., Źróbek, S., & Lepkova, N. (2017). Using Kernel density estimation for modelling and simulating transaction location. International Journal of Strategic Property Management, 21(1), 29-40.

Li, Wei, Huang & Ye. (2008). Spatial patterns and interspecific associations of three canopy species at different life stages in a subtropical forest, China. Retrieved from http://www.jipb.net/tupian/2008/3/18/163001.pdf

Lim, Kenneth. (2017). Bike-sharing in Singapore: A look at the road ahead. Channel NewsAsia. Retrieved from https://www.channelnewsasia.com/news/singapore/bike-sharing-in-singapore-a-look-at-the-road-ahead-8867898

Minoiu, C., & Reddy, S. (2008). Kernel density estimation based on grouped data : The case of poverty assessment , Washington, District of Columbia : International Monetary Fund (IMF working paper ; WP/08/183).

Ripley’s K function Philip M. Dixon Volume 3, pp 1796–1803 in Encyclopedia of Environmetrics (ISBN 0471 899976) https://www3.nd.edu/~mhaenggi/ee87021/Dixon-K-Function.pdf

Silverman, B. (1978). Weak and Strong Uniform Consistency of the Kernel Estimate of a Density and its Derivatives. The Annals of Statistics, 6(1), 177-184.

Shaheen, S., Guzman, S., & Zhang, H. (2010). Bikesharing in Europe, the Americas, and Asia: Past, Present, and Future.

Spencer, J., & Angeles, G. (2007). Kernel density estimation as a technique for assessing availability of health services in Nicaragua. Health Services and Outcomes Research Methodology, 7(3), 145-157.

Silverman, B. W. (2012, Mar). Density Estimation for Statistics and Data Analysis. Retrieved from https://ned.ipac.caltech.edu/level5/March02/Silverman/paper.pdf

Tania L King, Lukar E Thornton, Rebecca J Bentley, & Anne M Kavanagh. (n.d.). The Use of Kernel Density Estimation to Examine Associations between Neighborhood Destination Intensity and Walking and Physical Activity. PLoS ONE, 10(9), E0137402.

The Economist. (2017, Dec 19). How bike-sharing conquered the world. Retrieved from https://www.economist.com/news/christmas-specials/21732701-two-wheeled-journey-anarchist-provocation-high-stakes-capitalism-how

Turlach, Berwin. (1999). Bandwidth Selection in Kernel Density Estimation: A Review. Technical Report.

Xun Shi (2010) Selection of bandwidth type and adjustment side in kernel density estimation over inhomogeneous backgrounds, International Journal of Geographical Information Science, 24:5, 643-660, DOI: 10.1080/13658810902950625

Zambom, A., & Dias, R. (2012). A Review of Kernel Density Estimation with Applications to Econometrics.

9.0 Acknowledgement

We would like to thank Professor Kam Tin Seong (Associate Professor of Information Systems; Senior Advisor, SIS) and Instructor Meenakshi, who provided our team with great insights and guidance throughout this entire project. We would also like to thank our sponsor for graciously providing us with the dataset and assistance.