Difference between revisions of "NeighbourhoodWatchDocs Proposal"

From Geospatial Analytics and Applications

Revision as of 00:20, 7 March 2019





PROJECT DESCRIPTION
Our project aims to use geospatial intelligence to explore the potential of allocating nearby doctors within estates to residents, in particular the elderly, to help address the challenges of an ageing population.


PROJECT MOTIVATION
Between 2000 and 2018, Singapore’s population grew from 4.028 million to 5.791 million. However, the number of citizens aged 65 and above is increasing rapidly even as overall population growth slows. This group more than doubled, from 220,000 in 2000 to 547,900 in 2018, and is expected to grow further by 2030. As the government looks for new ways to provide better social welfare, we aim to explore the possibility of allocating neighbourhood clinic doctors to nearby HDB blocks to ensure that those with mobility issues or disabilities receive adequate healthcare.


PROJECT OBJECTIVES
Our goals are:
  • To build a GIS tool (an R Shiny app)
  • To analyse the demand and supply of clinics using the proximity of clinics to dwellings in residential zones
  • To evaluate the results of the analysis and provide recommendations to further improve social welfare for residents who require special assistance
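
The proximity idea in the second goal can be illustrated with a small sketch. It is only an assumption about how such a check might look, not the project's implementation (the tool itself is planned as an R Shiny app). The coordinates, the names and the 0.5 km radius below are all made up for illustration, and the great-circle (haversine) distance stands in for whatever distance measure the team finally adopts.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def clinics_within(block, clinics, radius_km):
    """Return the names of clinics no farther than radius_km from the block."""
    return [
        clinic["name"]
        for clinic in clinics
        if haversine_km(block["lat"], block["lon"],
                        clinic["lat"], clinic["lon"]) <= radius_km
    ]


# Hypothetical sample points; not taken from any of the project datasets.
block = {"name": "Block A", "lat": 1.3500, "lon": 103.8500}
clinics = [
    {"name": "Clinic near", "lat": 1.3510, "lon": 103.8505},
    {"name": "Clinic far", "lat": 1.4000, "lon": 103.9000},
]

# The 0.5 km radius is an assumed walking distance, not a project parameter.
nearby = clinics_within(block, clinics, radius_km=0.5)
print(nearby)
```

Counting the clinics returned for each block gives a simple per-dwelling supply figure that can later be compared against the number of elderly residents.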


DATA COLLECTION
To achieve our project objectives, we need to obtain datasets that are available online. The following table lists the datasets we require and how we can obtain them:

{|
! No. !! Dataset !! Description !! Source(s)
|-
| 1. || OSM Layer (Singapore) || This dataset is necessary for us to plot the Singapore map. || OpenStreetMap
|-
| 2. || Singapore Planning Subzone (MP14_SUBZONE_WEB_PL) || This dataset is necessary for us to plot the Singapore map at the planning subzone level. || https://data.gov.sg/dataset/master-plan-2014-subzone-boundary-no-sea
|-
| 3. || Estimated Singapore Resident Population in HDB Flats || Used to find out the number of residents per estate. || https://data.gov.sg/dataset/estimated-resident-population-living-in-hdb-flats
|-
| 4. || Dwelling Units under HDB's Management, by Town and Flat Type || Shows the number of units per estate for each flat type. || https://data.gov.sg/dataset/number-of-residential-units-under-hdb-s-management
|-
| 5. || Residents by Age Group & Type of Dwelling, Annual || Shows the age group and number of residents by type of dwelling. || https://data.gov.sg/dataset/residents-by-age-group-type-of-dwelling-annual
|-
| 6. || Singapore Residents by Subzone and Type of Dwelling, June 2016 || This dataset is necessary for us to find out the total elderly population in a specific subzone. || https://data.gov.sg/dataset/singapore-residents-by-subzone-and-type-of-dwelling-june-2016
|-
| 7. || Singapore Residents by Subzone, Age Group and Sex, June 2016 (Gender) || This dataset is necessary for us to find out the total elderly population. || https://data.gov.sg/dataset/singapore-residents-by-subzone-age-group-and-sex-june-2016-gender
|-
| 8. || Total Number of Clinics in Singapore
| This dataset is necessary for us to locate all the neighbourhood clinics in Singapore and find out their respective addresses, so as to aid our proximity analysis with the dwellings.

As this data is not readily available to us, we will need to scrape it from the Singapore Healthcare Institutions Directory (HCI) website. A Python script can help us achieve this; more details can be found below.
| http://hcidirectory.sg/hcidirectory/
|-
| 9. || Total Number of TCM Clinics in Singapore
| This dataset is necessary for us to locate all the TCM Medical Centres in Singapore and find out their respective addresses, so as to aid our proximity analysis with the dwellings.

As this data is not readily available to us, we will need to scrape it from the Singapore Traditional Chinese Medicine Practitioners Board (TCM Board) website. With some code tweaks, the Python script used for the HCI website can also collect this dataset.
| https://prs.moh.gov.sg/prs/internet/profSearch/main.action?hpe=TCM
|}

Code Snippet 1: NeighbourhoodWatchDocs Scraper1.png
Code Snippet 2: NeighbourhoodWatchDocs Scraper2.png
The code snippets above show how we can perform web scraping using a Python script. In a nutshell, the script does the following:

1. Uses a set of pre-defined parameters and settings to visit each page of the HCI search results in a loop.
2. Uses XPath expressions to retrieve the content (healthcare facility name, address, postal code) located at specific HTML elements or attributes.
3. Once all pages have been visited, parses the retrieved information and stores it in a .CSV file with the respective columns.
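
The three steps above can be sketched with Python's standard library alone. This is a minimal illustration, not the script shown in the snippets: the sample pages, the `result`, `name`, `address` and `postal` class names, and the XPath expressions are hypothetical stand-ins for the real HCI markup, and the sketch parses inline strings instead of making HTTP requests. `xml.etree.ElementTree` supports only a subset of XPath and requires well-formed markup, so a real scraper would more likely rely on a dedicated HTML parsing library.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical result pages with made-up facilities. The real HCI pages
# will be structured differently, so every selector here is a placeholder.
SAMPLE_PAGES = [
    """<html><body>
    <div class="result"><span class="name">Alpha Family Clinic</span>
    <span class="address">1 Example Road</span><span class="postal">123456</span></div>
    <div class="result"><span class="name">Beta Medical Centre</span>
    <span class="address">2 Sample Street</span><span class="postal">654321</span></div>
    </body></html>""",
    """<html><body>
    <div class="result"><span class="name">Gamma Clinic</span>
    <span class="address">3 Demo Avenue</span><span class="postal">111222</span></div>
    </body></html>""",
]

FIELDS = ["name", "address", "postal"]


def parse_page(page_source):
    """Step 2: pull one row per search result using XPath expressions."""
    root = ET.fromstring(page_source)
    rows = []
    for result in root.findall(".//div[@class='result']"):
        row = {}
        for field in FIELDS:
            text = result.findtext(f"./span[@class='{field}']", default="")
            row[field] = text.strip()
        rows.append(row)
    return rows


def scrape(pages):
    """Step 1: visit every page in a loop and collect the parsed rows."""
    rows = []
    for page_source in pages:
        rows.extend(parse_page(page_source))
    return rows


def to_csv(rows):
    """Step 3: serialise the rows as CSV text with one column per field."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = scrape(SAMPLE_PAGES)
csv_text = to_csv(rows)
print(csv_text)
```

In a real run, the loop in `scrape` would fetch each results page over HTTP, and the CSV text would be written to a file rather than printed.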


PROJECT TIMELINE
NeighbourhoodWatchDocs Timeline.jpg


STORYBOARD

Our team aims to create a dashboard that will display all the clinics (medical and TCM) across Singapore. We will also display the dwellings in the respective subzones and the estimated number of elderly residents aged 65 and above living there. Together, these layers show the supply of and demand for healthcare in clinics. Further analysis will assess how well supply meets demand and identify potential gaps, e.g. overdemand or oversupply of healthcare facilities in a particular subzone.

The supply and demand data for selected subzones will then be displayed in a data table for easier comparison.
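
One possible way to flag such gaps is to compare the number of elderly residents per clinic in each subzone against a target range. The sketch below is an assumption for illustration only: the subzone names, the counts and the `low`/`high` thresholds are invented, and the actual analysis may use a different measure.

```python
def classify(elderly, clinics, low, high):
    """Label a subzone by its elderly-residents-per-clinic ratio."""
    if clinics == 0:
        # No clinic at all: any elderly resident is unmet demand.
        return "overdemand" if elderly > 0 else "balanced"
    ratio = elderly / clinics
    if ratio > high:
        return "overdemand"
    if ratio < low:
        return "oversupply"
    return "balanced"


def supply_demand_table(elderly_by_subzone, clinics_by_subzone, low, high):
    """Build one row per subzone for display in a data table."""
    table = []
    for subzone in sorted(elderly_by_subzone):
        elderly = elderly_by_subzone[subzone]
        clinics = clinics_by_subzone.get(subzone, 0)
        table.append({
            "subzone": subzone,
            "elderly": elderly,
            "clinics": clinics,
            "status": classify(elderly, clinics, low, high),
        })
    return table


# Invented figures; the real values would come from the datasets listed
# in the Data Collection section.
elderly_by_subzone = {"Subzone A": 3000, "Subzone B": 400,
                      "Subzone C": 1200, "Subzone D": 500}
clinics_by_subzone = {"Subzone A": 2, "Subzone B": 4, "Subzone C": 2}

# Assumed target range: 200 to 1000 elderly residents per clinic.
table = supply_demand_table(elderly_by_subzone, clinics_by_subzone,
                            low=200, high=1000)
for row in table:
    print(row)
```

A subzone with no clinic is treated as overdemand whenever it has elderly residents, which avoids a division by zero and surfaces the most obvious gaps first.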

NeighbourhoodWatchDocs storyboard.jpg


TOOLS & TECHNOLOGY

These are the tools and technologies our team aims to explore and use over the course of the project to achieve our objectives. The list will be updated as the project progresses.

Neighbourhood WatchDocs Technologies.JPG


PROJECT CHALLENGES

{|
! No. !! Key Technical Challenges !! Description !! Proposed Solution !! Outcome
|-
| 1. || Unfamiliarity with R packages and R Shiny || Our team may need to use additional R resources that were not taught in class.
|
- Independent learning on R packages and R Shiny
- Browsing the official RDocumentation website for support and reference
- Researching online tutorials that cover specific use cases for certain R packages
|
We managed to solve the mentioned challenge with the following resources:
-
|-
| 2. || Data Cleaning and Transformation || As we need to collect data from various sources, the datasets may differ in attributes such as the Coordinate Reference System (CRS) and units of measurement. || Adopt a standardized process for cleaning the data, focusing only on what we need. Most of the datasets used for our project can be found in our Hands-On or Take-Home exercises, and we can rely on that existing data.
|
We managed to solve our technical challenge with the following:
-
|-
| 3. || Limitations & Constraints in Datasets || There are certain assumptions that we need to make based on the context and purpose of our project, such as the average number of doctors in a particular clinic, which cannot be derived from our datasets. || Work together as a team to arrive at reasonable and valid assumptions, supported by adequate online research and consultation with Prof. Kam.
|
We managed to solve our technical challenge with the following:
-
|}