ANLY482 AY2017-18T2 Group18/TeamDAcct Project Methodology

From Analytics Practicum
Revision as of 23:42, 25 February 2018 by Zqlow.2014 (talk | contribs)



 


Methodology


Data Cleaning:

We will first clean the data we receive. This step is especially important because the records prior to June 2016 were entered manually and stored as text, a format that most software cannot read directly. Cleaning brings the pre-June 2016 records into a format consistent with the rest of the data used in the analysis.
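
As an illustration only, converting text-stored values into typed values could look like the Python sketch below. The exact layout of the pre-June 2016 records is not described here, so the currency symbol, the thousands separator and the month formats handled by `parse_amount` and `parse_month` (both hypothetical helper names) are assumptions.

```python
from datetime import datetime
from typing import Optional


def parse_amount(raw: str) -> Optional[float]:
    """Convert a text amount such as '$1,234.50' to a float.

    Returns None for blanks or unparseable values so that bad records
    can be reviewed instead of being silently treated as zero.
    """
    # Assumed debris: surrounding spaces, a '$' sign, ',' separators.
    cleaned = raw.strip().replace("$", "").replace(",", "")
    if not cleaned:
        return None
    try:
        return float(cleaned)
    except ValueError:
        return None


def parse_month(raw: str) -> Optional[datetime]:
    """Parse a month label, trying a few assumed text layouts in turn."""
    for fmt in ("%b %Y", "%B %Y", "%m/%Y", "%Y-%m"):
        try:
            return datetime.strptime(raw.strip(), fmt)
        except ValueError:
            continue  # try the next assumed layout
    return None
```

Returning `None` rather than a default value keeps unreadable entries visible, so they can be checked against the original records before analysis.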


Data Preparation:

The client is a cleaning company that engages in a variety of cleaning activities (e.g. landscape care and maintenance services). We will therefore separate the project sites into the different categories of cleaning services, and conduct the exploratory data analysis and all subsequent analyses separately for each category. The rationale is that the main factor driving expenses in one category of cleaning service may differ from the main factor driving expenses in another. Analysing each category separately should therefore give the client more comprehensive insights.
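
As an illustration only, the separation by service category could be sketched in Python as follows. The record layout, the field names (`site`, `category`, `expense`) and the sample values are hypothetical placeholders, not project data.

```python
from collections import defaultdict


def split_by_category(sites):
    """Group project-site records by their cleaning-service category."""
    groups = defaultdict(list)
    for site in sites:
        groups[site["category"]].append(site)
    return dict(groups)


# Hypothetical records; names and figures are made up for illustration.
sites = [
    {"site": "A", "category": "landscape care", "expense": 1200.0},
    {"site": "B", "category": "category B", "expense": 800.0},
    {"site": "C", "category": "landscape care", "expense": 1500.0},
]

groups = split_by_category(sites)
for category, members in groups.items():
    # Each group would then be analysed on its own.
    print(category, [m["site"] for m in members])
```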


Exploratory Data Analysis (EDA):

Next, EDA (e.g. graphs or tables of summary measures) will be conducted to gain a better understanding of the data. The EDA will show how the explanatory variables (factors that drive expenses, such as wages) relate to one another, and will indicate the general direction and size of the relationship between the explanatory and outcome variables. We will use SAS JMP Pro 13 as our main tool for data cleaning, data preparation and EDA. We chose this software because it supports statistical analysis on large datasets and generates results that are easy for end users to understand. In addition, because it is widely used, tutorials are readily available on the web should we encounter any problems.
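
The project itself relies on JMP for this step. Purely as an illustration of what "summary measures" and "direction and size of relationship" mean, the Python sketch below computes basic summary statistics and a Pearson correlation coefficient. The function names `summarise` and `pearson_r` are hypothetical, and no project data is used.

```python
import statistics


def summarise(values):
    """Return basic summary measures for one numeric variable."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def pearson_r(xs, ys):
    """Pearson correlation: the sign gives direction, the magnitude gives strength."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5
```

A coefficient near +1 or -1 between an explanatory variable and expenses suggests a strong linear relationship, while a value near 0 suggests little linear association.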


Thereafter, we can formulate and suggest to the client a model that forecasts the expenses incurred for an anticipated project site with given characteristics. From a business perspective, the model can support whichever approach management wishes to take (such as minimising cost, bidding at a higher price, pegging to the industry price, or growing market share), both in shortlisting future project sites and in driving overall business strategy.


Regression Analysis:

Regression analysis is a modelling technique for analysing the relationship between a dependent variable (Y) and one or more independent variables (X1, X2, etc.). We will use it to predict the value of the dependent variable given specific values of the independent variable(s).
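
As a minimal sketch of the idea, assuming a single independent variable and an ordinary least squares fit (the final model form is not fixed here), the line Y = b0 + b1*X can be estimated as follows. The function names and the sample numbers are hypothetical.

```python
def fit_simple_ols(xs, ys):
    """Fit Y = b0 + b1 * X by ordinary least squares; return (b0, b1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of X and Y divided by variance of X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    """Predict Y for a given X using the fitted line."""
    return intercept + slope * x


# Made-up numbers that lie exactly on Y = 1 + 2X, for illustration only.
b0, b1 = fit_simple_ols([1, 2, 3, 4], [3, 5, 7, 9])
estimate = predict(b0, b1, 5)
```

With several independent variables the same principle applies, but the coefficients are estimated jointly rather than with these one-variable formulas.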


Based on the comments from our sponsor and the feedback from our project supervisor, we have identified two approaches to performing the above analysis. The two approaches differ in the target (explained) variable to be predicted. The first approach attempts to estimate the project cost (a single monetary value) from the input (explanatory) variables identified in the data integration and filtering phase. The second approach attempts to estimate the amount of resources needed for each of the key cost components that make up a cleaning project site. We will explore both approaches in our model development phase, with the goal of identifying a function that describes, as closely as possible, the relationship between the target variable and the input variables. Our group nevertheless feels that the second approach will offer greater insights to management, who will be able to use it as a budget forecasting tool for procurement.