Group 10: Proposal
Project Description
Problem Statement
"Should I set up my business here, or over there?" is the number one dilemma that startups face. The success of a new business is strongly correlated with how competitive the area is and with the area's population density.
Motivation
Consider a business owner who wishes to set up a new B2C shop in Singapore. The success of a B2C business is strongly correlated with its location. A business owner who sets up a new KTV outlet in Bedok Central is likely to earn a higher profit than one who sets up shop in a less densely populated area such as Tuas, because a location close to a densely populated area gives nearby residents easier access to the business. New business owners should also be aware of their competitors, as the number and quality of competitors directly affect sales and profit. However, there are times when the population is so large that the existing competitors cannot meet the needs of nearby residents. In such cases, setting up next to a competitor does little harm to profit.
Solution
Business owners weigh many other considerations when deciding where to set up. However, our team has identified the level of competition, the population and the transport network in the area as the main concerns. This data is not easily available: it has to be sourced, cleaned and combined before business owners can get a clear picture.
The project aims to assist entrepreneurs who are looking for a suitable location to set up their businesses. The interactive web-based geospatial application helps new business owners analyze the viability of setting up at a specified location by presenting factors such as the level of competition, transport network connectivity and population density in the area.
Data Sources
1. inSing Business Directory
2. Data.gov
3. Transport Network
Targeted Business Categories
Our project aims to be dynamic and lightweight. We want to accommodate as wide a range of business owners as possible without burdening the application with large amounts of data to store and process. To make this possible, the project uses InSing.com as its data source, and users can select any business category that is available on InSing.com.
The list below shows the number of businesses registered under each InSing category.
Arts and Entertainment
- karaoke (303)
- cinemas (55)
Food and Drinks
- restaurants (11522)
- coffeeshop (1193)
- cafes (7192)
Groceries
- minimarts and provision shops (2209)
Beauty and Spa
- hair salons and barbers (3519)
Education
- tuition agency and schools (683)
Sports and Recreational
- gyms (76)
Dental Services
- dentists and dental clinics (505)
Web Application Walkthrough
Choose Business Category
Users choose the category of business they wish to set up from the list of available categories. For example, if they wish to set up a cafe, they can browse the list and select 'cafe'.
Users can choose up to 3 different categories to compare.
Web Crawling
Once the user has selected a business category, our application crawls InSing.com for all the businesses registered under that category. From the user's point of view, the process takes only a matter of seconds.
While the crawl runs, the page shows that data is loading.
The data is retrieved and populated on the map dynamically.
Marker Clusters
Once the data is loaded, users can see the locations of all the businesses in the selected categories. We used marker clusters to prevent too many markers from crowding a specific point on the map. Selecting a marker shows more details on the business at that location.
Area Analysis
If the user is interested in setting up a business in Marine Parade, for example, the user drags any of the 3 markers and places it at the preferred position on the map.
A buffer with a radius of 1 km is created around the placed marker. All the businesses and population that fall within the buffer are considered in our descriptive analysis. The Results panel on the right shows the user:
1. Available Taxi Stands, Bus Stops and Train Stations (within buffer)
2. No. of Similar Businesses (within buffer)
3. Population Breakdown by Age (within buffer)
This information helps users judge how well connected and how populated the area is, and how much competition it has.
Users can modify the buffer radius to suit their needs. The default radius is 1 km, and they can increase or decrease it with the slider.
For example, a user who wants to set up a cafe will focus more on the Young Adults and Teenagers age groups, as these are the target audience. The user will also need to understand the level of competition, which is displayed in the report as well.
Transport Data
The transport data is derived and processed from the MyTransport API. We retrieved its bus stop, bus interchange and MRT station data, geocoded the locations using the OneMap API and plotted them on the map.
Population Data
The population data was retrieved from Singapore Census data 2014. Using the population breakdown by subzone and age, we further categorised age into 4 discrete bins for better distinction: Teenagers (below age 19), Young Adults (age 20 to 39), Adults (age 40 to 59) and Elderly (age 60 and above).
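The binning step can be sketched as follows. This is a minimal illustration rather than the project's actual code: the function names are hypothetical, and because the stated bins leave age 19 unassigned, the sketch assumes that everyone aged 19 and below counts as a teenager.

```python
from bisect import bisect_right

# Lower bounds of the second, third and fourth bins (ages 20, 40 and 60).
BIN_EDGES = [20, 40, 60]
BIN_LABELS = ["Teenagers", "Young Adults", "Adults", "Elderly"]


def age_category(age):
    """Map an age in years to one of the four bins.

    Assumption: ages 0 to 19 are all treated as "Teenagers".
    """
    if age < 0:
        raise ValueError("age must be non-negative")
    return BIN_LABELS[bisect_right(BIN_EDGES, age)]


def bin_population(counts_by_age):
    """Aggregate {age: head count} into {bin label: head count}."""
    totals = {label: 0 for label in BIN_LABELS}
    for age, count in counts_by_age.items():
        totals[age_category(age)] += count
    return totals
```

Keeping the bin edges in one list means the age groups can be adjusted without touching the aggregation logic.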
Other than population data, we also needed the locations of residential buildings so that we could map the population to the buildings. The building locations were retrieved from OpenStreetMap data for Singapore. During the data cleaning process, 61.5% of the records had both columns as ‘NULL’. Since we could not conclude that these were non-residential buildings, we kept those points as possible residential buildings.
With both the buildings and the population data by subzone, we compared the two datasets to ensure that a subzone with no population has no buildings. We then removed the building points from subzones without any residents.
Based on the number of buildings and the population in each subzone, we obtained a rough estimate of the average population per building by dividing the subzone's total population by its number of buildings. The same procedure was then applied to each age category, dividing that category's population by the number of buildings in the subzone, to get the average number of teenagers, young adults, adults and elderly per building.
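The per-building estimate described above amounts to a simple division per subzone. The sketch below illustrates it under assumed inputs: the dictionary layout, the key names and the function name are hypothetical, not taken from the project.

```python
def per_building_averages(subzones):
    """Estimate the average population per building for each subzone.

    `subzones` maps a subzone name to a dict with two keys (assumed layout):
      "buildings":  number of building points in the subzone
      "population": {age category: head count}
    Subzones without residents or without buildings are skipped, mirroring
    the removal of building points from unpopulated subzones.
    """
    result = {}
    for name, info in subzones.items():
        buildings = info["buildings"]
        population = info["population"]
        total = sum(population.values())
        if total == 0 or buildings == 0:
            continue
        averages = {group: count / buildings
                    for group, count in population.items()}
        averages["Total"] = total / buildings
        result[name] = averages
    return result


example = {
    "Subzone A": {"buildings": 4,
                  "population": {"Teenagers": 100, "Young Adults": 200,
                                 "Adults": 60, "Elderly": 40}},
    "Subzone B": {"buildings": 3,
                  "population": {"Teenagers": 0, "Young Adults": 0,
                                 "Adults": 0, "Elderly": 0}},
}
averages = per_building_averages(example)
```

In this made-up example, Subzone A has 400 residents across 4 buildings, so each building is assigned an average of 100 residents, while Subzone B is dropped because it has no residents.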
Population Data Heatmap
Based on the population data, we can produce a heatmap of population density. This helps business owners set up their business closer to the more populated areas.
Area Analysis Comparisons
Our web application provides a comparison tool that lets users select up to 3 locations and compare them.
Technical Challenges
Dynamic Web Crawling
Our team made use of third-party APIs to facilitate the crawling process of our application. We made use of import.io.
To crawl the right amount of data, the crawling process has 2 parts:
1. Determine the number of data rows (businesses).
2. Crawl the web based on the number of data rows.
We have to determine the number of data rows to crawl because we are pulling data from InSing, which displays only 10 rows at a time. By determining the exact number of data rows that InSing has, we can determine the number of pages that we have to crawl.
After retrieving all the business data, we have to geocode it, as InSing provides only each business's registered name and address. We used the OneMap API for geocoding, as it is able to geocode into the SVY21 format and does not impose a usage limit.
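The page count follows directly from the row count and the 10-row page size. The sketch below shows the arithmetic; the function name is hypothetical, and the example figures are the karaoke and cinema counts listed earlier.

```python
ROWS_PER_PAGE = 10  # InSing displays 10 rows at a time


def pages_to_crawl(total_rows, rows_per_page=ROWS_PER_PAGE):
    """Return how many result pages are needed to cover `total_rows`."""
    if total_rows < 0 or rows_per_page <= 0:
        raise ValueError("total_rows must be >= 0 and rows_per_page > 0")
    # Integer ceiling division: a partly filled last page still counts.
    return (total_rows + rows_per_page - 1) // rows_per_page


karaoke_pages = pages_to_crawl(303)  # 30 full pages plus 1 page of 3 rows
cinema_pages = pages_to_crawl(55)    # 5 full pages plus 1 page of 5 rows
```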
The entire crawling process is asynchronous. We use XMLHttpRequest to make asynchronous requests to the APIs, so the markers are populated dynamically without blocking and users can continue carrying out their tasks. The data crawled from InSing is cached temporarily on the client side; we do not store it permanently. New data is retrieved on every crawl, which keeps the application lightweight and the data accurate.
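The non-blocking pattern can be illustrated with a small sketch. The application itself uses XMLHttpRequest in the browser; the example below is only an analogue written with Python's asyncio, and `fetch_page` is a hypothetical stand-in that returns fake rows instead of calling any real API.

```python
import asyncio


async def fetch_page(category, page):
    """Hypothetical stand-in for one crawl request; returns 10 fake rows."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return [f"{category}-p{page}-row{i}" for i in range(10)]


async def crawl_category(category, page_count, on_rows):
    """Request all pages concurrently and hand rows over as they arrive.

    `on_rows` is a callback (for example, one that plots markers), so
    results appear page by page instead of after the whole crawl.
    """
    tasks = [asyncio.create_task(fetch_page(category, page))
             for page in range(1, page_count + 1)]
    all_rows = []  # temporary in-memory cache, never written to disk
    for finished in asyncio.as_completed(tasks):
        rows = await finished
        on_rows(rows)
        all_rows.extend(rows)
    return all_rows


plotted = []
rows = asyncio.run(crawl_category("cafes", 3, plotted.extend))
```

Because each page is handled as soon as it completes, the order in which rows reach the callback is not guaranteed to match the page order.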
Allowing Users to Pin a Location on the Map and Perform Analysis
Our application requires users to place a pin on the map, which generates a buffer with a radius of 1 km. The application then takes all the population, transport and business data that falls within the buffer and performs the analysis. The difficult part is determining, for example, which bus stops or buildings are within the radius and how many there are in total. To resolve this, we used Turf.js to manage all the features that fall within the buffer radius.
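The underlying question is which point features lie within a given distance of the pin. The project delegates this to Turf.js. The sketch below is a plain Python stand-in that uses the haversine great-circle distance instead, with a hypothetical feature layout (`kind`, `lat`, `lon`) and made-up coordinates.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def features_within(pin, features, radius_km=1.0):
    """Return the features whose distance from `pin` is at most `radius_km`."""
    lat, lon = pin
    return [f for f in features
            if haversine_km(lat, lon, f["lat"], f["lon"]) <= radius_km]


def count_by_kind(features):
    """Count features per kind, e.g. bus stops versus businesses."""
    return Counter(f["kind"] for f in features)


pin = (1.3000, 103.9000)  # made-up pin position
features = [
    {"kind": "bus_stop", "lat": 1.3050, "lon": 103.9000},  # about 0.56 km
    {"kind": "business", "lat": 1.3000, "lon": 103.9080},  # about 0.89 km
    {"kind": "business", "lat": 1.3200, "lon": 103.9000},  # about 2.2 km
]
inside = features_within(pin, features)
counts = count_by_kind(inside)
```

With the default 1 km radius, the first two features are counted and the third is excluded. Passing a different `radius_km` corresponds to moving the radius slider.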
Project Timeline/Milestones
Week 10
i. Data collection
ii. Visualise the basic layers such as transportation and amenities
Week 11
i. Extract/Crawl data from the sites
ii. Perform cleaning of those data
iii. Perform Geocoding
Week 12
i. Visualise layers of the geocoded locations
ii. Implement the feature for selection of the type of business
Week 13
i. Implement the feature to pin a location on the map
ii. Assigning weights to the variables/factors
iii. Calculation of viability score
Week 14
i. Visualise the hotspots for setting up the businesses
ii. Analysis and recommendations to users
Week 15
i. Preparation of project poster and presentation
ii. Start on project report
Week 16
i. Submission of project report, poster
ii. Townhall poster presentation
Future Work
Our team could continue by using models such as regression to provide a stronger analysis for users. We could also add more recommendations. For example, instead of having users select their own location, the system could recommend the best place to set up a 'cafe' business, taking into account the land or shops available for lease. We could also cater for other factors, since different businesses have different factors to consider: B2C businesses may depend more heavily on location than online businesses do.