ANLY482 AY2017-18T2 Group26 Project Overview

From Analytics Practicum
Revision as of 01:01, 15 April 2018



Geospatial Analysis
Geospatial analysis is the technique of using geospatial data (from mobile devices, location sensors, social media, etc.) to build maps, graphs, statistics and analytical models that make complex relationships understandable. Its benefit is that it goes a step beyond regular analytical insights: because it is more engaging, understandable and recognizable, it helps managers move from hindsight to foresight and develop targeted, location-based solutions. Focussing on this aspect of geospatial analysis, we aim to develop a method that takes past location data, and its impact on other aspects of the business, into account to help optimize future location-based decision making.
(Reference: Geospatial Analytics: The three-minute guide. (2012). Retrieved from https://www2.deloitte.com/content/dam/Deloitte/global/Documents/Deloitte-Analytics/dttl-analytics-us-ba-geospatial3minguide.pdf)
Company PQR
PQR is a Singapore-based company with over 100 branches across Singapore as well as a growing online presence. It has a pronounced focus on providing aid to the community, and its employees are committed to helping the community, whether the elderly, challenged youth or the environment. The company itself contributes over 60% of its profits to the betterment of the community each year.
Motivation & Objectives
Company PQR has been facing a roadblock in estimating its sales targets, which it needs in order to meet predicted demand and serve its customers better. Having already established a strong presence in central Singapore, PQR needs a smarter method of approximating demand that accounts for the anomalies of each branch location. An accurate demand estimate would make it easier to predict, or set, more realistic sales targets. By analyzing mobile data and points of interest around Singapore, we can estimate demand for PQR's outlets more accurately and identify regions with untapped potential. Our project will examine these datasets and their relationship with the financial performance of PQR branches across Singapore.

Therefore, our objectives are:

  • To understand the existing model Company PQR uses for its estimations.
  • To identify the correlations between population variables from mobile data and points of interest, to aid our regional demand estimations.
  • To develop an equation that weights these variables to produce the most accurate demand estimation. We aim to create our own model that estimates demand more accurately and identifies areas for potential expansion.
Methodology

[Figure: Methodology]

We used Tableau and JMP for a preliminary exploratory overview, to understand our data before mapping it in our chosen geographic information system (GIS), QGIS. To extract trends and correlation patterns from the datasets we received, we used QGIS to visualize the density and proximity of PQR outlets and the corresponding points of interest (POIs).
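The proximity analysis itself was done in QGIS. Purely as an illustration of the underlying computation, the sketch below builds an outlet-to-POI distance matrix with the haversine (great-circle) formula and counts the POIs within a fixed radius of each outlet. The spherical Earth radius of 6,371 km, the 500 m radius, the coordinates and the function names (`haversine_m`, `distance_matrix`, `pois_within`) are all assumptions made for this example, not PQR data or the team's actual workflow.

```python
import math

# Mean Earth radius in metres (spherical approximation; an assumption of this sketch).
EARTH_RADIUS_M = 6_371_000


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def distance_matrix(outlets, pois):
    """Hypothetical helper: rows are outlets, columns are POIs, cells are distances in metres."""
    return [[haversine_m(o[0], o[1], p[0], p[1]) for p in pois] for o in outlets]


def pois_within(matrix, radius_m):
    """Hypothetical helper: number of POIs within radius_m of each outlet (one count per row)."""
    return [sum(1 for d in row if d <= radius_m) for row in matrix]


# Hypothetical coordinates for illustration only (not PQR data).
outlets = [(1.3000, 103.8000), (1.3500, 103.9000)]
pois = [(1.3010, 103.8010), (1.3040, 103.8000), (1.3600, 103.9000)]

matrix = distance_matrix(outlets, pois)
print(pois_within(matrix, radius_m=500))  # prints [2, 0]
```

Counting POIs within a radius reduces raw proximity to one number per outlet, which is the kind of variable that could later be used in a regression; the radius is a tuning choice rather than a fixed rule.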

We received Mobile Data, POI Data and Financial Data from Company PQR, and cleaned and prepared them using methods such as Normalization and Standardization. We performed Aggregations, Statistical Analysis and a Polygon to Coordinate Analysis, and generated a Geospatial Distance Matrix to aid our Exploratory Data Analysis (EDA). To clean the location variables, we used OpenStreetMap as a reference, making sure all location data is in a consistent format and aligns with its corresponding OpenStreetMap location. Based on our Findings and Insights, we will next build Multiple Linear Regression models to find the relationships between our variables and the financial outputs of Company PQR.
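The page does not say exactly how Normalization and Standardization were applied. Assuming the usual definitions, min-max normalization rescales a variable linearly to [0, 1], while standardization converts it to z-scores (mean 0, standard deviation 1), so that variables on very different scales, such as device counts and POI counts, become comparable. The footfall values below are hypothetical, and using the population standard deviation (`pstdev`) rather than the sample one is an assumption of this sketch.

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Rescale values linearly to [0, 1]: (v - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; return zeros instead of dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Convert values to z-scores: (v - mean) / population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical daily mobile-device counts around four outlets (illustration only).
footfall = [120, 340, 560, 980]
print(min_max_normalize(footfall))
print(standardize(footfall))
```

Normalization preserves the shape of the distribution within a fixed range, whereas standardization centres each variable, which tends to make regression coefficients on different variables easier to compare.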
Literature Review
Linear regression explains a dependent variable using independent variables that are related to it. The method assumes that each independent variable is related to the dependent variable and that this relationship is linear [1]. With sufficient data, we can infer reliable relationships that help explain the dependent variable, in our case PQR's financial measures. However, when the independent variables are strongly correlated with one another (multicollinearity), the individual coefficient estimates become unstable and hard to interpret, so checking for multicollinearity through a collinearity analysis is vital before building a model [2]. By performing multiple linear regressions, we can find the combination of independent variables that best explains our dependent variable [3]. In this approach, the dependent variable is modelled as a function of more than one independent variable, capturing the relationships among the variables and accommodating additional ones if necessary [3]. Since multiple linear regression is regarded as an effective and reliable explanatory method, we have implemented it below, keeping in mind the shortcomings mentioned.
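A minimal sketch of this workflow, assuming NumPy is available: the model is y = b0 + b1·x1 + ... + bk·xk + e, fitted by ordinary least squares, and multicollinearity is checked with the variance inflation factor (VIF), computed as 1 / (1 - R²j), where R²j comes from regressing predictor j on the other predictors. The VIF is one common collinearity check, not necessarily the one used by the team or in reference [2]. The synthetic data, the variable names (`footfall`, `poi_count`, `nearby_offices`, `revenue`) and the helper functions are hypothetical.

```python
import numpy as np


def fit_ols(X, y):
    """Fit y = b0 + b1*x1 + ... + bk*xk by ordinary least squares; returns [b0, ..., bk]."""
    design = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def r_squared(X, y):
    """Coefficient of determination (R^2) of the OLS fit of y on the columns of X."""
    design = np.column_stack([np.ones(len(X)), X])
    residuals = y - design @ fit_ols(X, y)
    ss_res = residuals @ residuals
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot


def vif(X):
    """Variance inflation factor per column: 1 / (1 - R^2) from regressing it on the others."""
    X = np.asarray(X, dtype=float)
    factors = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        factors.append(1.0 / (1.0 - r_squared(others, X[:, j])))
    return np.array(factors)


# Hypothetical, synthetic data for illustration only (not PQR data).
rng = np.random.default_rng(0)
n = 200
footfall = rng.normal(size=n)
poi_count = rng.normal(size=n)
nearby_offices = 0.9 * footfall + 0.1 * rng.normal(size=n)  # deliberately collinear with footfall
X = np.column_stack([footfall, poi_count, nearby_offices])
revenue = 5.0 + 2.0 * footfall + 1.0 * poi_count + rng.normal(scale=0.5, size=n)

print(vif(X).round(2))             # footfall and nearby_offices get large VIFs
print(fit_ols(X[:, :2], revenue))  # refit without the collinear column
```

A VIF above roughly 5 to 10 is a common rule-of-thumb signal to drop or combine predictors before interpreting coefficients. Here `nearby_offices` is built almost entirely from `footfall`, so both are flagged, and the final fit leaves it out.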