ANLY482 AY2017-18T2 Group26 Project Overview



Geospatial Analysis
Geospatial analysis is the technique of using geospatial data (from mobile devices, location sensors, social media, etc.) to build maps, graphs, statistics and analytical models that make complex relationships understandable. Its benefit is that it goes a step beyond regular analytical insights: being more engaging, understandable and recognizable, it helps managers move from hindsight to foresight and develop location-based, targeted solutions. Focussing on this aspect of geospatial analysis, we aim to come up with a method that takes into consideration past location data, and its impact on other aspects of the business, to help optimize future location-based decision making.
(Referenced: Geospatial Analytics The three-minute guide. (2012). Retrieved from https://www2.deloitte.com/content/dam/Deloitte/global/Documents/Deloitte-Analytics/dttl-analytics-us-ba-geospatial3minguide.pdf)
Company PQR
PQR is a Singapore-based company with over 100 branches spread across Singapore as well as a growing online presence. It has a pronounced focus on providing aid to the community. Its employees are committed to helping the community, be it the elderly, challenged youth or the environment. The company itself contributes over 60% of its profits to the betterment of the community each year.
Data Received
All the data received is for Quarter 3 of 2017 which covers the months of October, November and December. Company PQR provided us with the following data files:
  • Population Data: Spatial files, in .shp format, containing the monthly breakdown of population into Resident, Worker and Transient for Singapore at a grid level. Further, the monthly crowd files contain the demographic breakdown for each population type.
  • Financial Data: The number of customers for each outlet and the sales revenue, as an aggregation for the quarter.
  • Point of Interest (POI) Data: This includes 16 different types of POIs spread across Singapore such as MRT, Food and Beverage, Churches etc.
  • Outlet Data: Data about each outlet, including the number of counters (TIMS) and the outlet type, which can be a PQR branch or a store.
  • Company PQR’s Model: We received files that outline PQR’s calculation of demand and sales revenue.
Motivation & Objectives
Company PQR has been facing a roadblock in estimating its sales targets so that it can meet predicted demand and serve its customers better. It has conquered central Singapore and needs a smarter method of approximating demand that accounts for the anomalies of each branch location. An accurate demand estimation will make it easier to predict or set more realistic sales targets. By analyzing mobile data and points of interest around Singapore, we can more accurately estimate demand for its outlets and identify regions with untapped potential. Our project will use these data sets and their relationship with the financial performance of PQR branches all over Singapore.

Therefore, our objectives are:

  • To understand the existing model Company PQR is using to do their estimations.
  • To learn the correlations between population variables from mobile data and points of interest, to aid our regional demand estimations.
  • To develop an equation that weighs these variables in a way that produces the most accurate demand estimation. We aim to create our own model that more accurately estimates demand and identifies areas for potential expansion.
Methodology

We used Tableau and JMP to derive a preliminary exploratory overview, in order to understand our data before mapping it in our chosen geographic information system, QGIS. In order to extract trends and correlation patterns from the datasets we received, we used QGIS to visualize density and proximity of PQR outlets and the corresponding points of interest (POIs).

We received Mobile Data, POI Data and Financial Data from Company PQR and used methods like Normalization and Standardization to clean them. We performed Aggregations, Statistical Analysis and a Polygon to Coordinate Analysis, and generated a Geospatial Distance Matrix to aid our Exploratory Data Analysis (EDA). To clean the data with location variables, we used OpenStreetMap as a reference, making sure all location data was in a consistent format and aligned with its respective OpenStreetMap location. Based on our Findings and Insights, we will move forward by creating Multiple Linear Regressions to find the relations between our variables and the financial outputs of Company PQR.

The following outlines the steps we took in preparing the data:

  • We removed additional columns that were either irrelevant or auto-generated (in the case of files extracted from Company PQR’s model on ArcGIS).
  • Normalization: We created new files that support integration (mapping POI to Outlet and Hex to Outlet) so that data is not redundant and useful files can be linked together in our analysis phase. We also transposed some of the files to remove duplicates.
  • Standardized Formatting: All the files with latitude and longitude coordinates were edited to show up to 8 decimal places so QGIS could map them accurately. We mapped the shapefiles to SVY21 (EPSG:4757) with an OpenStreetMap base map configured the same way. Additionally, we re-projected the coordinate layers of POIs and Outlets to SVY21 (EPSG:3414). This allows us to measure distances in meters when using the software to generate the Distance Matrices mentioned below.
  • Population Data Calculation: We calculated the population values for the months of October, November and December and then took the averages for each hexagon. To get the initial Resident, Transient and Worker values, we took the percentages for each of the hexagons and for each demographic to get the population breakdown for each month.
  • Distance Matrix Generation: We generated a Distance Matrix to retrieve the POIs within 500 m of each outlet, so that we could analyse the more significant types and quantities of POIs.
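
The distance-matrix step can be illustrated outside QGIS with a minimal Python sketch. It assumes the outlet and POI layers have already been re-projected to a CRS measured in meters (such as EPSG:3414), so a plain Euclidean distance is adequate at this scale. The outlet names, POI names, coordinates and the helper functions `distance_matrix` and `poi_counts_within` are all hypothetical and are not taken from Company PQR's data or from QGIS.

```python
import math

# Hypothetical outlets as (name, easting_m, northing_m) in a projected CRS
# whose units are meters (for example SVY21, EPSG:3414).
outlets = [("Outlet A", 30000.0, 30000.0), ("Outlet B", 32000.0, 31000.0)]

# Hypothetical POIs as (name, easting_m, northing_m, poi_type).
pois = [
    ("MRT 1", 30300.0, 30300.0, "MRT"),
    ("Cafe 1", 30100.0, 30000.0, "Food and Beverage"),
    ("Church 1", 32000.0, 31600.0, "Churches"),  # 600 m north of Outlet B
    ("Cafe 2", 32200.0, 31100.0, "Food and Beverage"),
]

def distance_matrix(outlets, pois):
    """Return {outlet_name: {poi_name: distance_in_meters}}."""
    return {
        o_name: {p_name: math.hypot(px - ox, py - oy) for p_name, px, py, _ in pois}
        for o_name, ox, oy in outlets
    }

def poi_counts_within(outlets, pois, radius_m=500.0):
    """Count POIs of each type lying within radius_m of each outlet."""
    counts = {}
    for o_name, ox, oy in outlets:
        per_type = {}
        for _, px, py, p_type in pois:
            if math.hypot(px - ox, py - oy) <= radius_m:
                per_type[p_type] = per_type.get(p_type, 0) + 1
        counts[o_name] = per_type
    return counts

if __name__ == "__main__":
    for outlet, per_type in poi_counts_within(outlets, pois).items():
        print(outlet, per_type)
```

Counting by POI type per outlet gives one row per outlet, which is a convenient shape for comparing against the financial data later on.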


Literature Review
Linear regression is used to explain a dependent variable using variables that have some relation to it. The method assumes that the independent variables are related to the dependent variable and that this relationship is linear [1]. With sufficient data, we can infer reliable relationships that should allow us to accurately explain the dependent variable, financial measures in this case. Because strongly correlated independent variables make these estimates unreliable, checking for multicollinearity through a collinearity analysis is vital before moving on to the creation of a model [2]. By performing multiple linear regressions, we can find the combination of independent variables that best explains our dependent variable [3]. In this approach, the dependent variable is a function of more than one independent variable, forming relationships among the variables and accommodating additional ones if necessary [3]. As it has been deemed an effective and reliable method of explanation, we have implemented it, keeping in mind the shortcomings mentioned.
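
As a rough illustration of the two ideas in this review, the sketch below first checks how correlated two independent variables are, then fits a multiple linear regression by solving the normal equations. It is a simplified stand-in, not the procedure run in JMP for this project: the data are made up, the variable names (`residents`, `poi_count`, `revenue`) are hypothetical, and a pairwise Pearson correlation is only the simplest form of collinearity check.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def fit_ols(columns, y):
    """Fit y = b0 + b1*x1 + ... + bk*xk via the normal equations (X'X) b = X'y."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Made-up illustrative data: revenue is constructed as an exact linear
# function of the two independent variables, so the fit should recover it.
residents = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
poi_count = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
revenue = [10 + 2 * r_ + 3 * p_ for r_, p_ in zip(residents, poi_count)]

r = pearson(residents, poi_count)                   # collinearity check
coeffs = fit_ols([residents, poi_count], revenue)   # [intercept, b1, b2]

if __name__ == "__main__":
    print("correlation between independent variables:", round(r, 3))
    print("fitted coefficients:", [round(c, 3) for c in coeffs])
```

If the correlation between two independent variables is close to 1 or -1, the normal equations become nearly singular and the fitted coefficients become unstable, which is why the collinearity check comes before the model is built.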
(Referenced: