Difference between revisions of "ANLY482 AY2017-18T2 Group26 Findings & Insights"

From Analytics Practicum
Jump to navigation Jump to search
m
Line 35: Line 35:
 
{| cellspacing=20px|  
 
{| cellspacing=20px|  
 
| Please refer to our dropbox on eLearn or email us to know more!
 
| Please refer to our dropbox on eLearn or email us to know more!
 +
|}
 +
</div>
 +
 +
{|style=" width="100%" cellspacing="20" border="0px"
 +
| <div style="background:#ffffff; font-size:200%; text-align:center; border-bottom:2px dashed #ffdd99" width=10%><font face="Helvetica" color=#ffbb33>Exploratory Data Analysis</font></div>
 +
|}
 +
<center>
 +
<div style="font-size:140%;">
 +
{| cellspacing=20px|
 +
Below is a map of all the outlets of Company PQR. We can see that the most outlets are in the North East of Singapore. The colour of the dots indicates the number of customers that have visited the outlet in the last quarter (October to December 2017). Looking at the colour intensity, we can see that most regions have at least one outlet with a high customer count.
 +
 +
<br>
 +
[[Image:Wiki1.PNG|500px]]
 +
|}
 +
</div>
 +
 +
<div style="font-size:140%;">
 +
{| cellspacing=20px|
 +
The box plot below, shows us Customer Count by outlet, same as the intensity of the blue colours above. The labels indicate Subzones that the outlets fall into. Subzones with the highest customer count are Hougang Wast with 139,018, Choa Chu Kang with 128,242, Pearls Hill (Chinatown) with 126,477, Jurong West with 117,507, & Lorong Ah Soo with 109,242 customers who visited in the last quarter of 2017. These locations are in varying regions unlike the locations with the lowest customer count concentrated at Kranji with 1,524, Chinatown with 4,635 & Serangoon Central at 7915 customers. This is interesting as Pearls Hill attracted one of the highest customers in comparison to Chinatown. Further, the average customer count is 52,454 and we can see from the box plot that there is no regional focus of number of customers and subzones.
 +
<br>
 +
[[Image:Wiki2.PNG|500px]]
 +
|}
 +
</div>
 +
 +
<div style="font-size:140%;">
 +
{| cellspacing=20px|
 +
A comparison of type of outlet (Company PQR branch or store). Again, it is clear that Stores that have a higher aggregated sales revenue in comparison to branches.
 +
<br>
 +
[[Image:Wiki3.PNG|500px]]
 +
|}
 +
</div>
 +
 +
<div style="font-size:140%;">
 +
{| cellspacing=20px|
 +
The pie charts below show the customer count by outlet type and number of outlets that are Branches and Stores. The Stores reflect a higher customer count and number of outlets.
 +
<br>
 +
[[Image:Wiki4.PNG|500px]]
 +
|}
 +
</div>
 +
 +
<div style="font-size:140%;">
 +
{| cellspacing=20px|
 +
Below is the spread of the POIs within Singapore, colour coded by type. Bus Interchange, Food and Beverage and Childcare Centre appear to be the most popular. Childcare, Food and Beverage and School appear the most frequently. Sikh Temple, Tertiary Institutions and Indian Temples are the lowest.
 +
<br>
 +
[[Image:Wiki5.PNG|500px]]
 +
|}
 +
</div>
 +
 +
<div style="font-size:140%;">
 +
{| cellspacing=20px|
 +
In order to filter out the important POIs, which are those within 500m of the existing outlets, we mapped the POIs and the outlets in QGIS. We saw the distribution of the Customer Count values and decided to take the highest values (over 100,000 customers). Analysing the outlets with the highest customer counts, we generated another map (as shown below) to see their distribution across subzones. There is exactly one outlet with high customer count in each of the below subzones.
 +
<br>
 +
[[Image:Wiki6.PNG|500px]]
 +
|}
 +
</div>
 +
 +
<div style="font-size:140%;">
 +
{| cellspacing=20px|
 +
Next, we analysed the POIs within 500m of the outlets within the subzones using a Distance Matrix in QGIS and then found the distribution of these POIs by Type. This helps us analyse which POIs might have a higher influence in drawing customers to the outlets with high customer count. The highest appearing POIs here are Food and Beverage, Childcare Centre and Supermarket as well.
 +
<br>
 +
[[Image:Wiki7.PNG|500px]]
 +
|}
 +
</div>
 +
 +
<div style="font-size:140%;">
 +
{| cellspacing=20px|
 +
The chart below represents the number of outlets that have the specific number of TIMs (number of counters) in them, from 1 to 7. The data revealed about 5 outlets with a ‘NULL’ value, we treated as missing values and excluded them from this chart. From the bar graph below, it is apparent that most outlets have 2 or 3 TIMs. In order to understand this further, it would be useful to investigate where the outlets with low and high TIMs are and if there exists any correlation to customer count.
 +
<br>
 +
[[Image:Wiki8.PNG|500px]]
 +
|}
 +
</div>
 +
 +
 +
<div style="font-size:140%;">
 +
{| cellspacing=20px|
 +
The map below is designed with the coordinate size representing the customer count of the branches, and the shade of the coordinate representing the number of TIMs at that outlet (darker the green, more the TIMs). If we filter the map to just display outlets with high TIMs, we can see outlets in Choa Chu Kang, Clementi, Ang Mo Kio, Kallang, Outram, Bedok and Hougang. Only the branches in Bedok and Hougang have a low customer count, whereas the rest are higher.
 +
<br>
 +
[[Image:Wiki9.PNG|500px]]
 +
|}
 +
</div>
 +
 +
<div style="font-size:140%;">
 +
{| cellspacing=20px|
 +
If the map is filtered by branches with high customer count, we can see branches mostly in central and North West of Singapore, with at least 4 or more TIMs. This verifies our hypothesis that branches with high customer count will need a high Actual TIMs count.
 +
 +
We used graphs to visualize different combinations of variables (as shown above). Aggregations and statistical analysis were done along the way, like when using subzones instead of individual outlets and showing a bigger picture (sales revenue, distribution by type of product, and by producing a box plot for customer count by subzone). Furthermore, we used a Geospatial Distance Matrix to limit the POIs we looked at for each branch. We used a distance of 500m which covers the flower hexagons. QGIS allowed us to filter this way and we could derive the outlets with a high number of POIs and look at them more closely. Additionally, QGIS also allowed us to link the hexagons to outlet points via a Polygon to Coordinate Analysis. This way we could analyse more accurately while aggregating outlet points in a polygon.
 +
 +
Based on our findings, we decided to use Multiple Linear Regressions to find the relations between our variables and the financial outputs of Company PQR.
 
|}
 
|}
 
</div>
 
</div>
  
 
<!--/Content-->
 
<!--/Content-->

Revision as of 04:27, 15 April 2018

DataDiversLogo.png


002-swimming-pool.png
005-swimming.png
003-sun-protection.png
001-sports.png
004-sunbed.png
Home Project Overview Findings & Insights Project Management Link to Other Projects
Due to the confidentiality of our data we can't disclose our Findings and Insights publicly.

Insights TEMP.gif Findings TEMP.gif

Please refer to our dropbox on eLearn or email us to know more!
Exploratory Data Analysis
Below is a map of all the outlets of Company PQR. We can see that the most outlets are in the North East of Singapore. The colour of the dots indicates the number of customers that have visited the outlet in the last quarter (October to December 2017). Looking at the colour intensity, we can see that most regions have at least one outlet with a high customer count.
Wiki1.PNG
The box plot below, shows us Customer Count by outlet, same as the intensity of the blue colours above. The labels indicate Subzones that the outlets fall into. Subzones with the highest customer count are Hougang Wast with 139,018, Choa Chu Kang with 128,242, Pearls Hill (Chinatown) with 126,477, Jurong West with 117,507, & Lorong Ah Soo with 109,242 customers who visited in the last quarter of 2017. These locations are in varying regions unlike the locations with the lowest customer count concentrated at Kranji with 1,524, Chinatown with 4,635 & Serangoon Central at 7915 customers. This is interesting as Pearls Hill attracted one of the highest customers in comparison to Chinatown. Further, the average customer count is 52,454 and we can see from the box plot that there is no regional focus of number of customers and subzones.
Wiki2.PNG
A comparison of type of outlet (Company PQR branch or store). Again, it is clear that Stores that have a higher aggregated sales revenue in comparison to branches.
Wiki3.PNG
The pie charts below show the customer count by outlet type and number of outlets that are Branches and Stores. The Stores reflect a higher customer count and number of outlets.
Wiki4.PNG
Below is the spread of the POIs within Singapore, colour coded by type. Bus Interchange, Food and Beverage and Childcare Centre appear to be the most popular. Childcare, Food and Beverage and School appear the most frequently. Sikh Temple, Tertiary Institutions and Indian Temples are the lowest.
Wiki5.PNG
In order to filter out the important POIs, which are those within 500m of the existing outlets, we mapped the POIs and the outlets in QGIS. We saw the distribution of the Customer Count values and decided to take the highest values (over 100,000 customers). Analysing the outlets with the highest customer counts, we generated another map (as shown below) to see their distribution across subzones. There is exactly one outlet with high customer count in each of the below subzones.
Wiki6.PNG
Next, we analysed the POIs within 500m of the outlets within the subzones using a Distance Matrix in QGIS and then found the distribution of these POIs by Type. This helps us analyse which POIs might have a higher influence in drawing customers to the outlets with high customer count. The highest appearing POIs here are Food and Beverage, Childcare Centre and Supermarket as well.
Wiki7.PNG
The chart below represents the number of outlets that have the specific number of TIMs (number of counters) in them, from 1 to 7. The data revealed about 5 outlets with a ‘NULL’ value, we treated as missing values and excluded them from this chart. From the bar graph below, it is apparent that most outlets have 2 or 3 TIMs. In order to understand this further, it would be useful to investigate where the outlets with low and high TIMs are and if there exists any correlation to customer count.
Wiki8.PNG


The map below is designed with the coordinate size representing the customer count of the branches, and the shade of the coordinate representing the number of TIMs at that outlet (darker the green, more the TIMs). If we filter the map to just display outlets with high TIMs, we can see outlets in Choa Chu Kang, Clementi, Ang Mo Kio, Kallang, Outram, Bedok and Hougang. Only the branches in Bedok and Hougang have a low customer count, whereas the rest are higher.
Wiki9.PNG
If the map is filtered by branches with high customer count, we can see branches mostly in central and North West of Singapore, with at least 4 or more TIMs. This verifies our hypothesis that branches with high customer count will need a high Actual TIMs count. We used graphs to visualize different combinations of variables (as shown above). Aggregations and statistical analysis were done along the way, like when using subzones instead of individual outlets and showing a bigger picture (sales revenue, distribution by type of product, and by producing a box plot for customer count by subzone). Furthermore, we used a Geospatial Distance Matrix to limit the POIs we looked at for each branch. We used a distance of 500m which covers the flower hexagons. QGIS allowed us to filter this way and we could derive the outlets with a high number of POIs and look at them more closely. Additionally, QGIS also allowed us to link the hexagons to outlet points via a Polygon to Coordinate Analysis. This way we could analyse more accurately while aggregating outlet points in a polygon. Based on our findings, we decided to use Multiple Linear Regressions to find the relations between our variables and the financial outputs of Company PQR.