Group 10: Proposal

From Visual Analytics for Business Intelligence
<!-- navigation bar start -->
 
{| style="background-color:#FFFFFF; color:#000000; padding: 5px 0 0 0;" width="100%" cellspacing="0" cellpadding="0" valign="top" border="0" |
 
|style="font-size:100%; text-align:center; border-left:1px solid #ffffff; border-right:1px solid #ffffff;background-color:#203470; padding:12px;" width="25%" |[[Ten Project Details | <font color="#FFF"><b>PROJECT DETAILS</b></font>]]
 
|style="font-size:100%; text-align:center; border-left:1px solid #ffffff; border-right:1px solid #ffffff;background-color:#5478E4; " width="25%" |[[Ten Project Poster |<font color="#FFFFFF"><b>PROJECT POSTER</b></font>]]
 
|style="font-size:100%; text-align:center;border-left:1px solid #ffffff; border-right:1px solid #ffffff; background-color:#5478E4; " width="25%" |[[Ten Project Application |<font color="#ffffff"><b>PROJECT APPLICATION</b></font>]]
 
|style="font-size:100%; text-align:center;border-left:1px solid #ffffff; border-right:1px solid #ffffff; background-color:#5478E4; " width="25%" |[[Ten Research Paper |<font color="#ffffff"><b>RESEARCH PAPER</b></font>]]
 
|}
 
<!-- navigation bar end -->
 
  
<!-- motivation -->
 
 
=Project Description=
 
 
<div style="background: #FFFFFF; margin:25px; text-indent: 0px; font-size:14px; font-family:helvetica">
 
<font>
 
 
==Problem Statement==

"Should I set up my business here? Or should I set up there?" is a key dilemma that startups face. A new business's success is closely tied to how competitive its area is and how densely populated the area is. <br><br>
 
 
==Motivation==
 
Consider a business owner who wishes to set up a new B2C shop in Singapore. The success of a B2C business depends heavily on its location. A new KTV outlet in Bedok Central, for instance, is likely to be more profitable than one in a sparsely populated area such as Tuas, because a location close to densely populated areas makes the business easier for nearby residents to reach. New business owners should also be aware of competitors, as the number and quality of competitors directly affect sales and profit. However, where the population is so large that existing competitors cannot meet the needs of nearby residents, setting up next to a competitor may do little harm to profit.
 
 
==Solution==
 
Business owners weigh many considerations when choosing where to set up. Our team has identified three main ones: the level of competition, the population, and the transport network in the area. These data are not readily available; they have to be sourced, cleaned and combined before business owners can get a clear picture.
 
 
The project aims to assist entrepreneurs in their planning when looking for a suitable location to set up their businesses. The interactive web-based geospatial application helps new business owners analyze the viability of setting up at a specified location by presenting factors such as the level of competition, transport network connectivity and population density in the area.
 
</font></div>
 
 
<!-- end of motivation -->
 
 
<!-- related works-->
 
 
=Data Sources=
 
 
1. [//search.insing.com/ inSing Business Directory]<br>
 
2. [//ref.data.gov.sg/ Data.gov]<br>
 
3. [//mytransport.sg/content/mytransport/home/dataMall.html Transport Network]<br>
 
 
<!-- end of related works-->
 
 
<!-- approach -->
 
 
=Targeted Business Categories=
 
Our project aims to be dynamic and lightweight. We want to accommodate as wide a range of business owners as possible without burdening our application with large amounts of data to store and process. To achieve this, our project uses InSing.com as its data source. Users can select any business category available on InSing.com.<br/><br/>
 
[[File:Categories.png|500px]]<br/><br/>
 
The lists below show the number of businesses registered under each InSing category.
 
==Arts and Entertainment==
 
 
- karaoke (303)<br>
 
- cinemas (55)<br><br>
 
 
==Food and Drinks==
 
- restaurants (11522)<br>
 
- coffeeshop (1193)<br>
 
- cafes (7192)<br><br>
 
 
==Groceries==
 
- minimarts and provision shops (2209)<br><br>
 
 
==Beauty and Spa==
 
- hair salons and barbers (3519)<br><br>
 
 
==Education==
 
- tuition agencies and schools (683)<br><br>
 
 
==Sports and Recreational==
 
- gyms (76)<br><br>
 
 
==Dental Services==
 
- dentists and dental clinics (505)<br><br>
 
 
=Web Application Walkthrough=
 
==Choose Business Category==
 
Users choose the type of business they wish to set up from the list of available categories. For example, a user planning to open a cafe can browse the list and select 'cafe'.<br/><br/>
 
[[File:WebCat.png|400px]]<br/><br/>
 
Users can <strong>choose up to 3 different categories</strong> to compare them.
 
<br/>
 
==Web Crawling==
 
Once the user has selected a business category, our application crawls <strong>InSing.com</strong> for all the businesses registered under that category. From the user's point of view, the process takes only a <strong>matter of seconds</strong>.<br/><br/>
 
The page first indicates that data is loading:<br/>
 
[[File:Crawl_1.png|500px]]<br/><br/>
 
As the data is retrieved, markers populate the map dynamically:<br/>
 
[[File:Crawl_2.png|500px]]<br/><br/>
 
 
<br/><br/>
 
==Marker Clusters==
 
Once the data is loaded, users can see the locations of all the businesses in the selected categories. We use <strong>marker clusters</strong> to keep too many markers from crowding a single point on the map. Selecting a marker shows more details about the business at that location.<br/><br/>
 
[[File:Clusters.png|500px]]<br/><br/>
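The clustering idea can be illustrated with a minimal grid-based sketch. This is an assumption for illustration only: the text does not say which clustering library the map uses, and such libraries typically use more elaborate distance-based algorithms. Markers falling into the same square grid cell are merged into one cluster placed at their centroid; `clusterMarkers` and its `cellDeg` cell-size parameter are hypothetical.

```typescript
// A business marker in WGS84 coordinates.
interface Marker {
  name: string;
  lat: number;
  lng: number;
}

// One cluster stands in for every marker that fell into the same grid cell.
interface Cluster {
  lat: number; // centroid latitude of the members
  lng: number; // centroid longitude of the members
  count: number; // number shown on the cluster icon
  members: Marker[];
}

// Hypothetical helper: group markers into square cells of `cellDeg` degrees.
// Larger cells (a zoomed-out map) merge more markers into each cluster.
function clusterMarkers(markers: Marker[], cellDeg: number): Cluster[] {
  const cells: Record<string, Marker[]> = {};
  for (const m of markers) {
    // Integer cell indices identify the grid cell that contains the marker.
    const key = Math.floor(m.lat / cellDeg) + ":" + Math.floor(m.lng / cellDeg);
    (cells[key] = cells[key] || []).push(m);
  }
  return Object.keys(cells).map((key) => {
    const members = cells[key];
    const lat = members.reduce((sum, m) => sum + m.lat, 0) / members.length;
    const lng = members.reduce((sum, m) => sum + m.lng, 0) / members.length;
    return { lat, lng, count: members.length, members };
  });
}
```

Zooming in would correspond to calling the function again with a smaller `cellDeg`, which splits clusters into smaller groups or single markers.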
 
 
==Area Analysis==
 
If the user is interested in setting up a business in <strong>Marine Parade</strong>, they drag any of the 3 markers and drop it at their preferred position on the map.<br/><br/>
 
[[File:Mark.png|100px]]<br/><br/>
 
[[File:Area.png|800px]]<br/><br/>
 
A buffer with a 1 km radius is created around the placed marker. All the businesses and residents that <strong>fall within the buffer</strong> are included in our descriptive analysis. The <strong>Results</strong> panel on the right of the screenshot shows the user:<br/><br/>
 
1. Available Taxi Stands, Bus Stops and Train Stations (within buffer)<br/>
 
2. Number of Similar Businesses (within buffer)<br/>
 
3. Population Breakdown by Age (within buffer)<br/><br/>
 
 
This information helps users judge how well connected and how densely populated the area is, and how much competition it has.
 
<br/><br/>
 
Users can <strong>modify the buffer radius</strong> to suit their needs. The default radius is 1 km, and the slider increases or decreases it.<br/>
 
[[File:Radius.png|600px]]<br/><br/>
 
 
For example, a user planning to open a cafe will focus more on the Young Adults and Teenagers age groups, as they are the target audience. The user also needs to understand the level of competition, which the report displays as well.<br/>
 
[[File:Reporting.png|500px]]<br/><br/>
 
 
===Transport Data===
 
The transport data is derived and processed from the <strong>MyTransport API</strong>.<br/>

We retrieved its bus stop, bus interchange and MRT station data, <strong>geocoded</strong> them using the OneMap API and plotted them on the map.<br/><br/>
 
[[File:Transport.png|500px]]<br/><br/>
 
 
 
===Population Data===
 
The population data was retrieved from the 2014 Singapore Census data. Using the population breakdown by subzone and age, we grouped ages into 4 discrete bins for clearer distinction: Teenagers (age 19 and below), Young Adults (age 20–39), Adults (age 40–59) and Elderly (age 60 and above).
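The binning step can be sketched as follows. The input shape (one count per subzone per age band, identified by the band's lowest age) is an assumption, not the census file's actual layout, and `CensusRow`, `ageGroup` and `binBySubzone` are hypothetical names. Binning by a band's lowest age is only valid if the bands are, as assumed here, 5-year bands whose boundaries coincide with the bin edges at 20, 40 and 60.

```typescript
type AgeGroup = "Teenagers" | "Young Adults" | "Adults" | "Elderly";
type GroupCounts = Record<AgeGroup, number>;

// Hypothetical input shape: one census count per subzone and age band.
interface CensusRow {
  subzone: string;
  ageFrom: number; // lowest age in the band, e.g. 15 for "15-19"
  count: number;
}

// Map an age (in whole years) to one of the four bins.
function ageGroup(age: number): AgeGroup {
  if (!isFinite(age) || age < 0) throw new RangeError("invalid age: " + age);
  if (age <= 19) return "Teenagers";
  if (age <= 39) return "Young Adults";
  if (age <= 59) return "Adults";
  return "Elderly";
}

// Sum census counts into the four bins for every subzone.
function binBySubzone(rows: CensusRow[]): Record<string, GroupCounts> {
  const result: Record<string, GroupCounts> = {};
  for (const row of rows) {
    if (!result[row.subzone]) {
      result[row.subzone] = { "Teenagers": 0, "Young Adults": 0, "Adults": 0, "Elderly": 0 };
    }
    result[row.subzone][ageGroup(row.ageFrom)] += row.count;
  }
  return result;
}
```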
 
Besides the population data, we needed the locations of residential buildings so that we could map the population onto them. The building locations were retrieved from OpenStreetMap data for Singapore. During data cleaning, 61.5% of the records had ‘NULL’ in both columns; we kept those points as possible residential buildings, since we could not conclude that they were non-residential.<br/><br/>
 
 
With both building and population data by subzone, we cross-checked the two so that subzones with no population would contain no buildings, and removed the building points from every subzone without residents.
 
For each subzone, we estimated the average population per building by dividing the subzone's total population by its number of buildings. We then repeated the procedure for each age category, dividing the number of teenagers, young adults, adults and elderly in the subzone by its number of buildings to get the average of each group per building.<br/>
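The cleaning and averaging steps above can be sketched as follows. The `Building` and `SubzonePopulation` shapes and `estimateBuildingPopulation` are hypothetical. The sketch drops buildings in subzones with no residents (or no population record at all), then divides each subzone figure evenly among its remaining buildings, which is the rough per-building estimate described above.

```typescript
interface Building {
  id: string;
  subzone: string;
  lat: number;
  lng: number;
}

// Population of one subzone, in total and by age group.
interface SubzonePopulation {
  total: number;
  teenagers: number;
  youngAdults: number;
  adults: number;
  elderly: number;
}

// A building annotated with its estimated (average) share of the subzone population.
interface BuildingEstimate extends Building {
  estimate: SubzonePopulation;
}

function estimateBuildingPopulation(
  buildings: Building[],
  population: Record<string, SubzonePopulation>,
): BuildingEstimate[] {
  // Step 1: remove buildings in subzones that have no residents.
  const kept = buildings.filter((b) => {
    const p = population[b.subzone];
    return p !== undefined && p.total > 0;
  });

  // Step 2: count the remaining buildings in each subzone.
  const buildingCount: Record<string, number> = {};
  for (const b of kept) {
    buildingCount[b.subzone] = (buildingCount[b.subzone] || 0) + 1;
  }

  // Step 3: divide each subzone figure evenly among its buildings.
  return kept.map((b) => {
    const p = population[b.subzone];
    const n = buildingCount[b.subzone];
    return {
      ...b,
      estimate: {
        total: p.total / n,
        teenagers: p.teenagers / n,
        youngAdults: p.youngAdults / n,
        adults: p.adults / n,
        elderly: p.elderly / n,
      },
    };
  });
}
```

Because every building in a subzone receives the same average, the estimate captures differences between subzones but not between buildings within one subzone.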
 
===Population Data Heatmap===
 
From the population data, we produce a heatmap of <strong>population density</strong>. This helps business owners see where the more populated areas are, so that they can set up closer to them.
 
<br/><br/>
 
[[File:Heat.png|300px]]<br/><br/>
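One way to feed such a heatmap is to turn each residential building into a weighted point. The sketch below assumes each building already carries an estimated population (such as the per-building averages described earlier) and emits `[lat, lng, intensity]` triples normalised to the range 0 to 1, the format accepted by heatmap layers such as the Leaflet.heat plugin. The text does not say which mapping library the application uses, so that format and the `toHeatPoints` helper are assumptions; the heatmap layer itself blurs nearby points into a density surface.

```typescript
// A residential building with its estimated number of residents.
interface PopulatedPoint {
  lat: number;
  lng: number;
  population: number;
}

// [latitude, longitude, intensity between 0 and 1]
type HeatPoint = [number, number, number];

// Hypothetical helper: weight every building relative to the most populated one.
function toHeatPoints(points: PopulatedPoint[]): HeatPoint[] {
  const maxPopulation = points.reduce((max, p) => Math.max(max, p.population), 0);
  return points.map((p): HeatPoint => [
    p.lat,
    p.lng,
    // Guard against an all-zero input, which would otherwise divide by zero.
    maxPopulation > 0 ? p.population / maxPopulation : 0,
  ]);
}
```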
 
 
==Area Analysis Comparisons==
 
Our web application provides a comparison tool that lets users <strong>select up to 3 locations</strong> and make <strong>comparisons</strong> between them.<br/><br/>
 
[[File:Reporting 2.png|800px]]<br/><br/>
 
 
=Technical Challenges=
 
==Dynamic Web Crawling==
 
Our team used third-party APIs to facilitate the crawling process of our application, in particular import.io.<br/><br/>
 
[[File:Import.png|400px]]<br/><br/>
 
To crawl the correct amount of data, the crawling process has 2 parts. <br/><br/>
 
1. Determine the number of data rows (businesses).<br/>
 
2. Crawl the web based on the number of data rows.<br/><br/>
 
 
We need the number of data rows because InSing displays only 10 rows of data per page. Knowing the exact number of rows InSing has for a category therefore tells us how many pages we have to crawl.
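The page count follows from a ceiling division: the 303 karaoke listings, for example, need ⌈303 / 10⌉ = 31 pages. The sketch below computes this and builds one request URL per page. The base URL and the `category` and `page` query parameters are placeholders, not InSing's real URL scheme, and both helpers are hypothetical.

```typescript
// InSing displays 10 listings per page (per the text above).
const ROWS_PER_PAGE = 10;

// Number of pages needed to cover `totalRows` listings.
function pageCount(totalRows: number, rowsPerPage = ROWS_PER_PAGE): number {
  if (totalRows <= 0) return 0;
  return Math.ceil(totalRows / rowsPerPage);
}

// Build one URL per page. The query parameters are placeholders, not InSing's actual scheme.
function pageUrls(baseUrl: string, category: string, totalRows: number): string[] {
  const urls: string[] = [];
  const pages = pageCount(totalRows);
  for (let page = 1; page <= pages; page++) {
    urls.push(`${baseUrl}?category=${encodeURIComponent(category)}&page=${page}`);
  }
  return urls;
}
```

For example, `pageUrls("https://example.com/search", "karaoke", 303)` would return 31 URLs, the last ending in `page=31`.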
 
 
After retrieving all the business data, we have to <strong>geocode</strong> it, since InSing provides only each business's registered name and address. We used the <strong>OneMap API</strong> for geocoding because it can geocode into the SVY21 format and does not impose a usage limit.
 
 
The entire crawling process is asynchronous. We use XMLHttpRequest to make asynchronous requests to the APIs, so the markers populate the map dynamically without blocking, and users can continue with their tasks. The data crawled from InSing is <strong>cached temporarily</strong> on the client side; we do not store it permanently. Fresh data is retrieved on every crawl, which keeps the application lightweight and the data accurate.
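The non-blocking pattern can be sketched as follows. The page loader is passed in as a function so that the sketch runs outside a browser; in the application it would wrap XMLHttpRequest, but the `PageFetcher` type, `crawlCategory` and `sessionCache` are hypothetical names. All pages are requested at once, each page's rows are handed to the map as soon as they arrive, and the results live only in memory and are replaced on every crawl.

```typescript
interface Business {
  name: string;
  address: string;
}

// Loads one results page for a category; in a browser this could wrap XMLHttpRequest.
type PageFetcher = (category: string, page: number) => Promise<Business[]>;

// In-memory only: discarded when the page is closed, never written to persistent storage.
const sessionCache: Record<string, Business[]> = {};

async function crawlCategory(
  category: string,
  pages: number,
  fetchPage: PageFetcher,
  onRows: (rows: Business[]) => void, // e.g. add markers for these rows to the map
): Promise<Business[]> {
  const collected: Business[] = [];
  const requests: Promise<void>[] = [];
  // Issue every page request up front; none of them waits for the others.
  for (let page = 1; page <= pages; page++) {
    requests.push(
      fetchPage(category, page).then((rows) => {
        collected.push(...rows);
        onRows(rows); // markers appear as soon as this page arrives
      }),
    );
  }
  await Promise.all(requests);
  // Replace any earlier result so each crawl reflects fresh data.
  sessionCache[category] = collected;
  return collected;
}
```

Because rows are collected in arrival order, the final list may not follow page order; that does not matter for plotting markers.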
 
<br/><br/>
 
==Allowing Users to Pin a Location on the Map and Perform Analysis==
 
Our application requires users to place a pin on the map, which generates a buffer with a default radius of 1 km. The application then analyses all the population, transport and business data that falls within the buffer. The difficult part is determining which features, for example which bus stops or buildings, lie within the radius, and how many there are in total. To resolve this, we used Turf.js to manage all the features within the buffer radius.
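The core test is whether a feature's distance from the pin is at most the buffer radius. The sketch below uses the haversine great-circle distance on a spherical Earth (mean radius 6371 km) instead of Turf.js, so that it runs without dependencies. Turf.js offers comparable building blocks, such as `turf.distance`, or `turf.circle` combined with `turf.pointsWithinPolygon`, but which of these the application used is not stated. The `Feature` shape and the helper names are hypothetical.

```typescript
interface Feature {
  kind: "bus_stop" | "train_station" | "taxi_stand" | "business" | "building";
  lat: number;
  lng: number;
}

const EARTH_RADIUS_KM = 6371; // mean Earth radius

// Great-circle distance between two WGS84 points, in kilometres (haversine formula).
function haversineKm(lat1: number, lng1: number, lat2: number, lng2: number): number {
  const toRad = (deg: number) => (deg * Math.PI) / 180;
  const sinLat = Math.sin(toRad(lat2 - lat1) / 2);
  const sinLng = Math.sin(toRad(lng2 - lng1) / 2);
  const a = sinLat * sinLat + Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * sinLng * sinLng;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
}

// Keep only the features whose distance from the pin is within the buffer radius.
function withinBuffer(pin: { lat: number; lng: number }, radiusKm: number, features: Feature[]): Feature[] {
  return features.filter((f) => haversineKm(pin.lat, pin.lng, f.lat, f.lng) <= radiusKm);
}

// Tally the features inside the buffer by kind, e.g. for the Results panel.
function countByKind(features: Feature[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const f of features) {
    counts[f.kind] = (counts[f.kind] || 0) + 1;
  }
  return counts;
}
```

Moving the radius slider would simply call `withinBuffer` again with the new `radiusKm`.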
 
<br/><br/>
 
=Project Timeline/Milestones=
 
 
<div style="background: #FFFFFF; margin:25px; text-indent: 0px; font-size:14px; font-family:helvetica">
 
<font>
 
<b>Week 10</b><br>
 
i. Data collection <br>
 
ii. Visualise the basic layers such as transportation and amenities <br><br>
 
 
<b>Week 11</b><br>
 
i. Extract/Crawl data from the sites<br>
 
ii. Perform cleaning of those data<br>
 
iii. Perform Geocoding<br><br>
 
 
<b>Week 12</b><br>
 
i. Visualise layers of the geocoded locations<br>
 
ii. Implement the feature for selection of the type of business<br><br>
 
 
<b>Week 13</b><br>
 
i. Implement the feature to pin a location on the map<br>
 
ii. Assigning weights to the variables/factors<br>
 
iii. Calculation of viability score <br><br>
 
 
<b>Week 14</b><br>
 
i. Visualise the hotspots for setting up the businesses<br>
 
ii. Analysis and recommendations to users<br><br>
 
 
<b>Week 15</b><br>
 
i. Preparation of project poster and presentation<br>
 
ii. Start on project report<br><br>
 
 
<b>Week 16</b><br>
 
i. Submission of project report, poster<br>
 
ii. Townhall poster presentation
 
 
</font></div>
 
<!-- end of timeline-->
 
 
=Future Works=
 
Our team could extend the project by using models such as regression to give users a stronger analysis. We could also add more recommendations: for example, instead of having users select a location, the system could recommend the best place to set up a cafe, taking into account available land or shops for lease. We could also cater for more factors, since different businesses weigh different factors; B2C businesses, for instance, may depend more heavily on location than online businesses do.
 
<div style="background: #6CACFF; padding: 15px; margin:25px; font-weight: bold; line-height: 0.3em; text-indent: 0px; font-size:20px; font-family:helvetica"><font color= #FFFFFF>Comments</font></div>
 

Latest revision as of 15:14, 6 October 2016