Two Eyes One Pizza Data

From Geospatial Analytics for Urban Planning
Latest revision as of 10:10, 4 November 2019

3hg.png

Back to Project Home | PROPOSAL | DATA TRANSFORMATION | POSTER | APPLICATION | RESEARCH PAPER


Preliminary Data Observations

International Food Chain (IFC) is a leading brand in its sector, with over 240 branches in Taiwan alone. These are the datasets we are using for data transformation:

Dataset | Rationale

Administrative Boundaries, Taiwan
  • A dataset containing SHP files of the administrative boundaries of Taiwan (county, town, village)
  • Used as a reference to digitise IFC branch trade areas

Branch locations of IFC, Taiwan
  • A dataset containing the geographical information of each individual branch
  • Used as the main target of our project
  • Branches (FID) that we are looking into: 5, 162, 195, 7, 57, 25, 144, 129, 54, 81, 69, 132, 53, 122

Points of Interest (POI), Taiwan
  • A dataset containing each individual point of interest in Taiwan (e.g. ATMs, amusement parks, banks)
  • Used as features for the analysis of each branch

PowerPoint slides of trade areas for each branch (PPT slides)
  • A dataset containing the hand-drawn trade area for each branch in Taiwan
  • Used as features for the analysis of each branch

Outlets' monthly sales data
  • A dataset containing the monthly sales information of each individual branch
  • Used to study the sales data alongside the profile of each branch to uncover patterns (e.g. top and bottom performers)


Steps for data transformation:

  1. Align to Geometry and Coordinate Reference System for all files: EPSG:3828
  2. Extracting each POI from the different shp files
  3. Formation of polygons for each branch to align to ppt slides' trade area
  4. Creation of competitors' POI
  5. Aggregation of count of each POI to each trade area
  6. Investigation on areas not covered by trade area
  7. Data pre-processing for Sales Data
  8. Aggregating Sales Data with each branch
  9. Regression Analysis

1. Geometry and Coordinate Reference System

Every QGIS file created has to be aligned to the same reference system used by official Taiwan maps, so that positions and unit measurements stay accurate.

Information on the EPSG:3828 reference system: https://epsg.io/3828

3828.png
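Working in a projected CRS such as EPSG:3828 also means coordinates are expressed in metres, so straight-line distances between features can be computed directly. A minimal illustration, using hypothetical easting/northing pairs rather than real branch coordinates:

```python
import math

# Hypothetical projected coordinates (metres) for two branches in EPSG:3828.
branch_a = (250_000.0, 2_610_000.0)
branch_b = (253_000.0, 2_614_000.0)

# In a projected CRS the plain Euclidean distance is already in metres.
distance_m = math.dist(branch_a, branch_b)
print(round(distance_m))  # → 5000
```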

2. Extracting each POI from the different shp files

We are given the following POIs to extract:

  1. ATM
  2. Bank
  3. Bar or Pub
  4. Bookstore
  5. Bowling Centre
  6. Bus Station
  7. Business Facility
  8. Cinema
  9. Clothing Store
  10. Coffee Shop
  11. Commuter Rail Station
  12. Consumer Electronics Store
  13. Convenience Store
  14. Department Store
  15. Government Office
  16. Grocery Store
  17. Higher Education
  18. Hospital
  19. Hotel
  20. Medical Service
  21. Pharmacy
  22. Residential Area/ Building
  23. Restaurant
  24. School
  25. Shopping
  26. Sports Centre
  27. Sports Complex
  28. Train Station


Each of these POI categories is extracted by finding its respective Facility Type code in the SHP files provided. For example, Facility Type 9583 was selected using the filter function in QGIS and exported as a layer to the geopackage.

FAC.png
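The filter itself is a simple attribute selection. A minimal plain-Python sketch of the same idea (the field names and sample records are hypothetical; only the 9583 code comes from the step above):

```python
def extract_poi(features, fac_type):
    """Keep only the features whose Facility Type matches fac_type."""
    return [f for f in features if f["FAC_TYPE"] == fac_type]

# Hypothetical attribute records standing in for rows of the SHP file.
sample = [
    {"NAME": "POI A", "FAC_TYPE": 9583},
    {"NAME": "POI B", "FAC_TYPE": 5800},
    {"NAME": "POI C", "FAC_TYPE": 9583},
]
print([f["NAME"] for f in extract_poi(sample, 9583)])  # → ['POI A', 'POI C']
```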

3. Formation of polygons for each branch to align to ppt slides' trade area

As the trade areas are predefined in the PPT slides, we had to manually digitise the trade area for each outlet. QGIS tools were used in the process (Split Features and Merge Features), together with a measure of artistic skill. We also added the store code and area code given in the PPT slides so that each polygon can be easily identified.

Store.png


Below are the generated polygons for all of our trade areas.

Do.png

4. Creation of competitors' POI

Five competitors were identified: Domino's Pizza, Napolean Pizza, McDonald's, KFC and Mos Burger. These data points are crucial to our analysis because they may be correlated with the sales data. Hence, these competitors' branch locations have to be extracted from the "Restaurant 5800" layer.

After extraction, they are exported as layers in the geopackage, to be used for aggregation.

Mac.png
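In plain Python the brand matching could be sketched as below (the name field and sample rows are hypothetical; in QGIS this was done with filter expressions on the restaurant layer):

```python
# Hypothetical brand prefixes used to pick out competitor branches.
COMPETITORS = ("Dominos", "Napolean", "Mcdonalds", "KFC", "MosBurger")

def competitor_branches(restaurants):
    """Keep only restaurants whose name starts with a competitor brand."""
    return [r for r in restaurants
            if any(r["NAME"].startswith(brand) for brand in COMPETITORS)]

sample = [
    {"NAME": "KFC Daan"},
    {"NAME": "Noodle House"},
    {"NAME": "Dominos Xinyi"},
]
print([r["NAME"] for r in competitor_branches(sample)])  # → ['KFC Daan', 'Dominos Xinyi']
```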

5. Aggregation of count of each POI to each trade area (Script)

Aggregating each of the POIs into each trade area by hand with the "Count Points in Polygon" tool is an exhausting task.

We tried the batch processing tool, but it cannot append each newly created layer to an existing geopackage. Hence, a Python script was written and run in QGIS's Python Console, using "qgis:countpointsinpolygon", a built-in QGIS processing algorithm.

Learn more about the function here: https://docs.qgis.org/2.8/en/docs/user_manual/processing_algs/qgis/vector_analysis_tools/countpointsinpolygon.html

Py.png

After the aggregation of POIs into each trade area, the screenshot below shows the resulting columns for a particular trade area.

Ta.png
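What the algorithm computes can be illustrated in plain Python with a standard ray-casting point-in-polygon test (a simplified sketch with made-up coordinates; QGIS's geometry engine handles edge cases that this version ignores):

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: does point (x, y) fall inside the polygon
    given as a list of (x, y) vertices?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Toggle on each edge crossed by a horizontal ray going right.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def count_points_in_polygon(points, polygon):
    """Count how many POI points fall inside one trade-area polygon."""
    return sum(point_in_polygon(x, y, polygon) for x, y in points)

trade_area = [(0, 0), (4, 0), (4, 4), (0, 4)]        # square trade area
pois = [(1, 1), (2, 3), (5, 5), (3.5, 0.5), (-1, 2)]
print(count_points_in_polygon(pois, trade_area))      # → 3
```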

Since each POI count ends up in its own geopackage, we need a way to merge these geopackages into one. Fortunately, the OSGeo4W Shell lets us run GDAL/OGR commands to do the merge.

First, open the OSGeo4W Shell and cd to the directory that holds the master geopackage (Taiwan_stores0x.gpkg) we want to merge into.

cd C:\Users\.......\Project\data\GeoPackage

As the layer in each geopackage is still named Taiwan_stores37, we first rename it after its respective area. The test folder contains all the geopackages to be merged (the OSGeo4W Shell uses Windows cmd syntax, where %~nf expands to the file name without its extension):

for %f in (.\test\*.gpkg) do ogrinfo %f -sql "ALTER TABLE Taiwan_stores37 RENAME TO %~nf"

Lastly, the command below appends and merges all the geopackages in the test folder into Taiwan_stores0x.gpkg (GPKG is the GeoPackage driver name):

for %f in (.\test\*.gpkg) do ogr2ogr -f GPKG -append .\Taiwan_stores0x.gpkg %f
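The same merge can also be driven from Python with the standard library, building one ogr2ogr call per geopackage (a sketch: it only constructs the commands; actually running them requires GDAL's ogr2ogr on the PATH, and the file names here are hypothetical):

```python
import tempfile
from pathlib import Path

def merge_commands(test_dir, master="Taiwan_stores0x.gpkg"):
    """Build one ogr2ogr append command per geopackage in test_dir."""
    return [["ogr2ogr", "-f", "GPKG", "-append", master, str(gpkg)]
            for gpkg in sorted(Path(test_dir).glob("*.gpkg"))]

# Demo with a throwaway directory standing in for the test folder.
demo = Path(tempfile.mkdtemp())
(demo / "Daan.gpkg").touch()
(demo / "Xinyi.gpkg").touch()
for cmd in merge_commands(demo):
    print(" ".join(cmd))  # feed each list to subprocess.run(cmd, check=True)
```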

6. Investigation on areas not covered by trade area

Looking at the generated trade areas, some areas are not covered by any of the branches. Hence, we decided to analyse these areas.

F1.png


In the figure above, the uncovered area is 大安森林公园 (Daan Forest Park), which is a park, so it is understandable that such a location is not covered by any delivery area.

F2.png


In the figure above, however, we do see buildings that are not in any trade area. This uncovered area may have been omitted and should be served by one of the branches.


7. Extracting Sales Data

We obtained raw sales data generated by the company's sales system. It was provided to us in a report format that is not directly importable into QGIS. With data for over 200 outlets, processing the reports manually one by one would have been impossible, so we wrote two scripts to extract the key information from each report and automate the output for the whole directory.

Although the reports were not directly importable into QGIS, each one followed a pattern that let us parse out the key information into two types of output: daily sales data, and regional sales data for a year.

Sales script.jpg
Both scripts were written in Python. For the daily sales data, every valid row has a date in its leftmost column, so we used this pattern to export the daily figures into a separate CSV. In a similar second script, we noticed that once a certain keyword appears in the first column, all subsequent rows contain regional sales data, and we used that fact to produce the second output.

For all files.png
This method is then called in both scripts so that every report in the directory is processed and written out as a separate CSV in another directory. This made it easy to distribute the regional sales data to the other groups working on different regions.
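The row patterns described above can be sketched roughly as follows (the date format, the keyword marking the regional section, and the sample rows are hypothetical stand-ins for the real report layout):

```python
import re

DATE_RE = re.compile(r"^\d{4}/\d{2}/\d{2}$")  # hypothetical date format
REGION_KEYWORD = "Region"                      # hypothetical marker word

def split_report(rows):
    """Split report rows into daily rows (date in the first column) and
    regional rows (everything after the keyword row)."""
    daily, regional, in_regional = [], [], False
    for row in rows:
        if not row:
            continue
        if row[0] == REGION_KEYWORD:
            in_regional = True
        elif in_regional:
            regional.append(row)
        elif DATE_RE.match(row[0]):
            daily.append(row)
    return daily, regional

report = [
    ["IFC Monthly Sales Report"],      # header row: neither pattern, skipped
    ["2019/01/01", "12345"],
    ["2019/01/02", "23456"],
    ["Region"],                        # keyword row starts the regional section
    ["North", "99999"],
]
daily, regional = split_report(report)
print(daily)     # → [['2019/01/01', '12345'], ['2019/01/02', '23456']]
print(regional)  # → [['North', '99999']]
```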

Comments

Feel free to leave comments / suggestions!