Difference between revisions of "REMGIS Proposal"

From IS415-Geospatial Analytics for Business Intelligence

Revision as of 06:32, 13 April 2018


REMGIS Logo.png




Project Motivation

Under its masterplan, first released in 2008, the Urban Redevelopment Authority (URA) has been making continuous efforts to grow new employment centres outside Singapore’s current Central Business District (CBD). This initiative has led to the development of Singapore’s second CBD in the Jurong Lake and Jurong East areas. Together known as the Jurong Lake District (JLD), these areas aim to match or even exceed the success, synergy and vibrancy of the first CBD. To support this, our group aims to conduct a thorough analysis and evaluation of the factors that made Singapore’s first CBD the success that it is today.



Project Objectives

This project aims to achieve the following objectives:

1. Analyze the factors that make Singapore’s CBD successful

 a. Identify and study the make-up of professional services within Singapore’s CBD
 b. Examine the landmarks that contribute to the CBD’s success
 c. Understand the factors that influence business locations

2. Apply Singapore’s CBD success factors to the upcoming second business district, the Jurong Lake District, and evaluate how well they are replicated there

Data Preprocessing

Data Source

S/N Description Source
1 Legal http://www.yellowpages.com.sg/search/all/Legal
2 Bank http://www.yellowpages.com.sg/category/banks
3 Accountancy http://www.yellowpages.com.sg/search/all/Accountancy
4 Architectural http://www.yellowpages.com.sg/search/all/Architectural
5 Management Consultancy http://www.yellowpages.com.sg/search/all/Management+Consultancy


Data Collection

To retrieve the records of all companies in these industries, we used R to scrape the listings for the above-mentioned sectors from the Yellow Pages website. Once all the company records had been retrieved, they were written to a CSV file.

Data Cleaning

One problem with the data was that some companies had incorrect postal codes. These followed two patterns: either the postal code had only 5 digits instead of the 6 that Singapore postal codes use, or its leading digits did not correspond to any Singapore postal district. These records had to be found manually by looking through each company’s address for postal codes that exhibited either or both of the two patterns.
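The two patterns could also be flagged by a script before the manual review. The following is a minimal sketch in Python (the project itself used R), not the procedure the team followed. It assumes that a valid code has six digits and that the leading two digits fall between 01 and 82; that range, the function name `postal_code_issue` and the sample values are illustrative assumptions and should be checked against an authoritative list.

```python
import re

# Assumption: valid leading two digits run from 01 to 82.
VALID_PREFIXES = {f"{n:02d}" for n in range(1, 83)}

def postal_code_issue(code):
    """Return a short description of the problem, or None if the code looks valid."""
    code = str(code).strip()
    if not re.fullmatch(r"\d+", code):
        return "not numeric"
    if len(code) != 6:
        return f"has {len(code)} digits instead of 6"
    if code[:2] not in VALID_PREFIXES:
        return f"unknown leading digits {code[:2]}"
    return None

# Made-up example values, not taken from the scraped dataset.
samples = ["048616", "48616", "998765"]
issues = {code: postal_code_issue(code) for code in samples}
for code, issue in issues.items():
    print(code, "->", issue or "ok")
```

A check like this only narrows down the candidates; the flagged records would still need to be corrected against each company’s address.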

After addressing the postal code issue, we realized that for the data to be usable for spatial analysis in R, we needed the latitude and longitude coordinates of each company. We retrieved these using the geocode() function from the "ggmap" package.

Here is a snippet of what the finalized dataset looks like:

Step1BeforeGeoCode.png


Related Works

London’s Central Business District

Rw1.png
Rw2.png

Goal of study: Examine the range of activities in London’s CBD, its dynamism and competition, the agglomeration of the various types of industries, and the CBD’s future opportunities.

Kernel density estimation was used to measure the density of professional services jobs per square kilometer. Although we were more interested in the makeup of firms in the CBD than in the number of jobs, we decided that we could build upon this idea and apply kernel density estimation to the various types of firms in Singapore’s CBD instead.

Guangzhou and Shenzhen

Rw3.jpg

Goal of study: Analyze the cartographic definition and representation of the CBD by studying its urban development and functions.

A concentration index is presented to visualize the urban environment using a density surface that is refined with network distances instead of Euclidean ones. One of the paper’s conclusions is that area-based methods of urban analysis, such as kernel density estimation, are widely used to generate a continuous surface with a density attribute. This encouraged us to carry on with our initial idea of using kernel density estimation to analyze the makeup of firms in the CBD and JLD.

Methodology

Location Quotient Analysis

Lqform.png

One way to measure success is to determine whether a region’s local needs are being met. We therefore decided to analyse the location quotient values of the five sectors within the CBD and JLD areas and draw our conclusions from them.
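To make the calculation concrete, here is a minimal sketch in Python (the project itself used R). It assumes the standard definition of the location quotient, the sector’s share in the local area divided by its share in a reference area, measured in firm counts; the formula in the figure above may differ in detail. The function name and all counts are made up for illustration.

```python
def location_quotient(local_sector, local_total, ref_sector, ref_total):
    """LQ = (local share of the sector) / (reference-area share of the sector)."""
    if local_total <= 0 or ref_total <= 0 or ref_sector <= 0:
        raise ValueError("totals and the reference sector count must be positive")
    return (local_sector / local_total) / (ref_sector / ref_total)

# Made-up firm counts, for illustration only.
cbd_counts = {"Legal": 300, "Bank": 150, "Accountancy": 250}
ref_counts = {"Legal": 600, "Bank": 500, "Accountancy": 900}

cbd_total = sum(cbd_counts.values())
ref_total = sum(ref_counts.values())

lq = {
    sector: location_quotient(cbd_counts[sector], cbd_total,
                              ref_counts[sector], ref_total)
    for sector in cbd_counts
}
for sector, value in lq.items():
    print(f"{sector}: LQ = {value:.2f}")
```

Under this definition, a value above 1 means the sector is more concentrated in the local area than in the reference area, and a value below 1 means it is less concentrated.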

Kernel Density Estimation

Kde.png

To find out the density of each sector in the CBD, we decided to apply kernel density estimation. This shows the density of the various sectors and reveals which areas of the CBD are more or less dense.
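The idea can be sketched as follows in Python with NumPy (the project itself used R, so this is not the application’s code). The sketch assumes a Gaussian kernel, a unit-square study window and randomly generated points; the function name `gaussian_kde_grid` and all parameter values are hypothetical.

```python
import numpy as np

def gaussian_kde_grid(points, bandwidth, grid_size=50, extent=(0.0, 1.0, 0.0, 1.0)):
    """Evaluate a 2-D Gaussian kernel density estimate on a regular grid."""
    xmin, xmax, ymin, ymax = extent
    gx = np.linspace(xmin, xmax, grid_size)
    gy = np.linspace(ymin, ymax, grid_size)
    xx, yy = np.meshgrid(gx, gy)
    # Squared distance from every grid cell to every point: shape (grid, grid, n).
    d2 = (xx[..., None] - points[:, 0]) ** 2 + (yy[..., None] - points[:, 1]) ** 2
    kernel = np.exp(-d2 / (2.0 * bandwidth ** 2)) / (2.0 * np.pi * bandwidth ** 2)
    # Summing the kernels gives an intensity surface (points per unit area).
    return kernel.sum(axis=2)

# Made-up firm locations clustered around the centre of the window.
rng = np.random.default_rng(0)
points = rng.normal(loc=0.5, scale=0.1, size=(200, 2))

narrow = gaussian_kde_grid(points, bandwidth=0.02)  # small bandwidth: spiky surface
wide = gaussian_kde_grid(points, bandwidth=0.2)     # large bandwidth: smooth surface
print("peak density, narrow bandwidth:", narrow.max())
print("peak density, wide bandwidth:", wide.max())
```

The bandwidth controls how far each point’s influence spreads: a small value keeps sharp local peaks, while a large value flattens them into a smoother surface.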

Quadrat Analysis

Qc.png

To determine whether firms in the CBD are evenly spaced or clustered, we decided to use quadrat analysis. We conduct a test of Complete Spatial Randomness (CSR) for the 5 industry point patterns based on quadrat counts, using the quadrat.test() function of the “spatstat” package.

For the analysis, we will perform a Monte Carlo simulation with 1000 trials instead of a chi-squared test. This obviates the need for all expected quadrat counts to be at least 5: in the pre-run, we noticed that the number of firm points per quadrat in the CBD window may range from 0 to x.
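The logic of a Monte Carlo quadrat test can be sketched as follows in Python with NumPy. This is an illustration of the general technique, not the spatstat implementation, whose details may differ. It assumes a unit-square window rather than the CBD boundary, a one-sided test against clustering, and made-up points; all function names are hypothetical.

```python
import numpy as np

def quadrat_counts(points, nx, ny):
    """Count points in an nx-by-ny grid of quadrats over the unit square."""
    counts, _, _ = np.histogram2d(points[:, 0], points[:, 1],
                                  bins=[nx, ny], range=[[0, 1], [0, 1]])
    return counts

def chi2_stat(counts):
    """Pearson chi-squared statistic against equal expected counts."""
    expected = counts.sum() / counts.size
    return ((counts - expected) ** 2 / expected).sum()

def monte_carlo_quadrat_test(points, nx=5, ny=5, nsim=1000, seed=0):
    """Compare the observed statistic with statistics simulated under CSR."""
    rng = np.random.default_rng(seed)
    observed = chi2_stat(quadrat_counts(points, nx, ny))
    n = len(points)
    sims = np.array([
        chi2_stat(quadrat_counts(rng.uniform(size=(n, 2)), nx, ny))
        for _ in range(nsim)
    ])
    # One-sided p-value: share of simulated patterns at least as uneven as the data.
    p_value = (1 + np.sum(sims >= observed)) / (nsim + 1)
    return observed, p_value

# Made-up, strongly clustered points inside the unit square.
rng = np.random.default_rng(1)
clustered = np.clip(rng.normal(loc=0.5, scale=0.05, size=(200, 2)), 0.0, 1.0)

stat_clustered, p_clustered = monte_carlo_quadrat_test(clustered)
print("statistic:", stat_clustered, "p-value:", p_clustered)
```

Because the reference distribution comes from simulated patterns rather than the chi-squared distribution, the p-value stays usable even when some quadrats have very small expected counts.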

Web Application Design

Inspiration

InTemplate.png

We gathered inspiration for the application’s UI by looking through and trying out multiple projects done by past groups. We combined the elements that we liked and came up with a storyboard for our own application.

Storyboard

Sb1.jpg
Sb2.jpg
Sb3.jpg

Application Architecture

App archi.jpg

Architecture Overview


REMGIS App Layers.jpg

Application Layers



Application Overview

Map Tab

Features

Choose csv upload.PNG
  • Allows users to upload their own firm data
  • The firm data must be a csv file in the same format as the data collected by our scraping script; a template of the data format can be downloaded for future use.
Analysis selection.png
  • Allows user to select the analysis to perform
Sector selection.png
  • Allows user to select the respective sector on the map
Kde dist slider.PNG
  • Sets the bandwidth (kernel distance) of the kernel density plot.
  • A larger kernel distance produces a smoother plot, while a smaller kernel distance produces a noisier one.

Spatial Point Analysis

Overview1.PNG

KDE Analysis

Overview1.1.PNG

Kernel Density Map Comparison Tab

Overview2.PNG

Location Quotient Tab

Overview3.PNG

Features

Sector checkbox.PNG
  • Allows user to toggle the respective sector on the map
LQAnalysis.PNG
  • Shows LQ figures for both CBD and JLD
  • Shows the number of firms in the respective regions
Lqcbd.PNG
  • Overview of CBD LQ in Bar Chart


Lqjld.PNG
  • Overview of JLD LQ in Bar Chart

Quadrat Analysis Tab

Features

Sector checkbox.PNG
  • Allows user to toggle the respective sector on the map
Rc slider.PNG
  • Selects the number of rows and columns for the quadrat map

All Professional Firms

Overview4.PNG

Selected Firms

Overview5.PNG

Project Timeline

Timeline REMGIS.jpg

Project Challenges

S/N Challenges Solutions
1 Data scraping: no API is readily accessible from Yellow Pages
  Solutions:
  • Explore web scraping techniques in Python and R
  • Look up existing samples on Stack Overflow to test
  • Perform additional data cleaning for records that are inaccurate, incomplete or inconsistent
2 Geocoding from postal codes to X and Y coordinates to perform thematic mapping
  Solutions:
  • Study external geocoding APIs and adopt one
  • Write a test application to geocode postal codes using the selected API
  • Apply knowledge to project
3 Inexperienced with R Shiny package and R programming
  Solutions:
  • Study the R Shiny package and how to use it by going through its documentation and tutorials
  • Self-directed learning on DataCamp
  • Refer to past year projects for UI inspiration
  • Meet Prof Kam for consultation
4 Unfamiliar with the implementation of spatial analysis methods such as:
  • Location Quotient
  • K-Means
  • Kernel Density Estimation
  Solutions:
  • Conduct research on R packages which provide the functions that are mentioned
  • Study how similar projects make use of the functions and packages
  • Go through documentation and tutorials
  • Write test applications to become familiar with the packages
  • Consult Prof Kam on major roadblocks


Project Tools & Technologies

REMGIS Tools.jpg


RemComments.PNG
Your comments will be helpful for us to improve our project.

No. Name Date Comments
1. Insert your Name here Insert Date here Insert Comment here
2. Insert your Name here Insert Date here Insert Comment here
3. Insert your Name here Insert Date here Insert Comment here

RReferences.PNG