Difference between revisions of "ANLY482 AY2017-18T2 Group18/TeamDAcct Project Methodology"

From Analytics Practicum
Line 52: Line 52:
  
 
<div style="border-style: solid; border-width:0; background: #0000cd; padding: 7px; font-weight: bold; text-align:left; line-height: wrap_content; text-indent: 20px; font-size:20px; font-family:Century Gothic;border-bottom:5px solid white; border-top:5px solid black"><font color= #ffffff>Methodology</font></div>   
 
<div style="border-style: solid; border-width:0; background: #0000cd; padding: 7px; font-weight: bold; text-align:left; line-height: wrap_content; text-indent: 20px; font-size:20px; font-family:Century Gothic;border-bottom:5px solid white; border-top:5px solid black"><font color= #ffffff>Methodology</font></div>   
 +
  
 
<br>
 
<br>
'''Data Cleaning:'''
+
'''Data Preparation:'''
 +
 
 +
All the files acquired have helped us deepen our understanding of the company’s business and ultimately the business problem. However, not all files were crucial to formulate recommendations to the business problem on hand.
 +
The following are the list of data-sets prepared:
  
From the data we receive, we will first perform data cleaning on it. This is especially important as the data prior to June 2016 are manually stored as text format and most software are unable to read it. Hence data cleaning is performed on it to ensure that the data prior to June 2016 is consistent with the rest of the data that we will be using.   
+
*Master Client Listing
  
<br>
+
*Facility Details
'''Data Preparation:'''
 
  
As the client is a cleaning company that engages in a variety of different cleaning activities (e.g. landscape care and maintenance services), we will be separating the project sites into the different categories of cleaning services and conducting exploratory data analysis as well as subsequent analyses separately on each of the categories. The rationale for this separation is that the main factor that drives expenses for a certain category of cleaning service might be different from the main factor that drives expenses for another category of cleaning service. Thus, by conducting analysis separately on the different categories of cleaning services it will provide the client with more comprehensive insights. 
+
*Manpower Details
  
 
<br>
 
<br>
 
'''Exploratory Data Analysis (EDA):'''
 
'''Exploratory Data Analysis (EDA):'''
  
Following which, EDA (e.g. through the usage of graphs or tables of summary measures) will be conducted to get a better understanding of the data. From the EDA, we will have a better understanding of the relationship amongst the explanatory variables (factors that drives the expenses such as wages) as well as provide us with a general direction and size of relationship between explanatory and outcome variables.
+
Following which, EDA (e.g. through the usage of graphs or tables of summary measures) was conducted to get a better understanding of the data. From the EDA, we will have a better understanding of the relationship amongst the explanatory variables (factors that drives the expenses such as wages) as well as provide us with a general direction and size of relationship between explanatory and outcome variables.
 
In the analysis of data, we will be using the software SAS JMP Pro 13 as our main tool for data cleaning, data preparation and EDA. Our choice of this software is that it allows us to conduct statistical analysis on big datasets and can generate results that are easy to understand for end users. In addition, due to its popularity of being widely used, tutorials are readily available on the web, should we encounter any problem.  
 
In the analysis of data, we will be using the software SAS JMP Pro 13 as our main tool for data cleaning, data preparation and EDA. Our choice of this software is that it allows us to conduct statistical analysis on big datasets and can generate results that are easy to understand for end users. In addition, due to its popularity of being widely used, tutorials are readily available on the web, should we encounter any problem.  
  
Line 73: Line 76:
  
 
<br>
 
<br>
'''Regression Analysis:'''
+
'''Analytical Methods:'''
 +
 
 +
After cleaning and preparing the data, we decided to work on our confirmatory analysis to assess the appropriateness of the explanatory variables to be used for our regression analysis. Thereafter, we performed multiple linear regression to build an explanatory model to analyse whether an increase in any of the explanatory variables (determined in our confirmatory analysis) has an impact on any of the various cost components incurred for a cleaning project type. The four major cost components that we have identified in a cleaning project are namely:
 +
 
 +
*Total Project Costs
 +
 
 +
*Manpower Costs
  
We will be attempting to conduct regression analysis to predict what value the dependent variable will be given specific values of the independent variable(s). Regression analysis is a modelling technique used for analysing the relationship between a dependent variable (Y) and one or more independent variable (X1, X2, etc).
+
*Chemical & Materials Costs
  
 +
*Equipment Costs
  
Based on the comments given by our sponsor and the feedback from our project supervisor, we have identified two approaches in performing the above analysis. The difference in both approaches however, would be the target(explained) variable to be predicted. The first approach will attempt to estimate the project costs (a single monetary value) given the input(explanatory) variables that we have identified in the data integration and filtering phase. The second approach on the other hand, will attempt to estimate the amount of resources needed for the different key cost components that comprises for a cleaning project site. We will be exploring both approaches in our model development phase, with the goal of such analysis to identify a function that describes, as closely as possible, the relationship between the target variable and input variables. Still, our group feels that the second approach will offer greater insights to management as they will be able to utilise that as their budget forecasting tool for procurement.
+
For this paper, we will be focusing on one specific cleaning project type - conservancy sites (zones).
  
  
 
</div><br>
 
</div><br>

Revision as of 21:23, 12 April 2018



 


Methodology



Data Preparation:

All the files acquired helped us deepen our understanding of the company’s business and, ultimately, of the business problem. However, not all files were crucial to formulating recommendations for the business problem at hand. The following data sets were prepared:

  • Master Client Listing
  • Facility Details
  • Manpower Details


Exploratory Data Analysis (EDA):

Following this, EDA (e.g. through graphs or tables of summary measures) was conducted to get a better understanding of the data. From the EDA, we gain a better understanding of the relationships among the explanatory variables (factors that drive expenses, such as wages), as well as a general sense of the direction and size of the relationship between the explanatory and outcome variables. For the data analysis, we use SAS JMP Pro 13 as our main tool for data cleaning, data preparation and EDA. We chose this software because it allows us to conduct statistical analysis on large datasets and generates results that are easy for end users to understand. In addition, because it is widely used, tutorials are readily available on the web should we encounter any problems.
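As an illustration only, the summary measures and the direction-and-size check described above could be sketched in Python as follows. This is not the JMP workflow we used; the variable names and all figures are made up for the example and are not taken from the client's data.

```python
import math
import statistics

# Hypothetical, made-up monthly figures for five project sites (illustration only).
wages = [1200.0, 1500.0, 1100.0, 1800.0, 1600.0]      # explanatory variable
expenses = [5000.0, 6100.0, 4700.0, 7400.0, 6500.0]   # outcome variable


def summarise(values):
    """Return a small table of summary measures for one variable."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def pearson_correlation(x, y):
    """Pearson correlation: the sign gives the direction of the relationship,
    the magnitude (between 0 and 1) gives its strength."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


wage_summary = summarise(wages)
r = pearson_correlation(wages, expenses)
print(wage_summary)
print(f"correlation(wages, expenses) = {r:.3f}")
```

A correlation close to +1 or -1 would suggest a strong linear relationship worth carrying into the regression stage, while a value near 0 would suggest the variable explains little on its own.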


Thereafter, we can formulate and suggest to the client a model that forecasts the expenses incurred for an anticipated project site with given characteristics. From a business perspective, this would point to the possible approaches management may wish to take (i.e. minimising cost, bidding at a higher price, pegging to the industry price, growing market share) when using our proposed model to help shortlist future project sites and to drive overall business strategy.


Analytical Methods:

After cleaning and preparing the data, we carried out confirmatory analysis to assess the appropriateness of the explanatory variables to be used in our regression analysis. Thereafter, we performed multiple linear regression to build an explanatory model and analyse whether an increase in any of the explanatory variables (determined in our confirmatory analysis) has an impact on any of the cost components incurred for a cleaning project type. The four major cost components that we have identified in a cleaning project are:

  • Total Project Costs
  • Manpower Costs
  • Chemical & Materials Costs
  • Equipment Costs

For this paper, we focus on one specific cleaning project type: conservancy sites (zones).
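
To make the regression step concrete, the following is a minimal Python sketch of multiple linear regression fitted by ordinary least squares through the normal equations. It is not our JMP model: the explanatory variables (headcount and floor area), the function names and every figure are hypothetical, and the manpower costs are constructed from the rule 500 + 1200 x headcount + 30 x floor area so that the fitted coefficients can be checked against known values.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] so the inputs are not modified.
    aug = [[float(v) for v in a[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda row: abs(aug[row][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: explanatory variables may be collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        # Eliminate this column from every other row.
        for row in range(n):
            if row != col:
                factor = aug[row][col]
                aug[row] = [rv - factor * cv for rv, cv in zip(aug[row], aug[col])]
    return [aug[i][n] for i in range(n)]


def fit_multiple_linear_regression(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y.

    Returns [intercept, coefficient_1, ..., coefficient_k].
    """
    # Prepend a column of ones so the first coefficient is the intercept.
    design = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical sites: (headcount, floor area in thousands of square metres).
site_features = [(4, 10), (6, 15), (5, 30), (8, 20), (10, 25), (7, 40)]
# Hypothetical manpower costs built from 500 + 1200 * headcount + 30 * area.
manpower_costs = [5600, 8150, 7400, 10700, 13250, 10100]

intercept, per_worker, per_area = fit_multiple_linear_regression(
    site_features, manpower_costs
)
print(f"intercept={intercept:.2f}, per_worker={per_worker:.2f}, per_area={per_area:.2f}")
```

In this sketch one model is fitted for a single cost component. The same function could be called once per cost component (total project, manpower, chemical and materials, equipment), each with its own outcome column. Real data would contain noise, so the fitted coefficients would only approximate any underlying relationship and would need the usual significance and residual checks.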