Difference between revisions of "ANLY482 AY2016-17 T2 Group12 : Project Overview / Methodology"

From Analytics Practicum
 
(6 intermediate revisions by 2 users not shown)
Line 29: Line 29:
 
* Email
 
* Feedback Form
 
 +
 +
TSK Transporters have also search for additional data source regarding public holiday as TSK Transporters maybe analysing how public holiday correlates with feedback volume. Listed below are three data sources corresponding to public holiday:
 +
* [http://www.mom.gov.sg/newsroom/press-releases/2013/singapore-public-holidays-2014%20 One]
 +
* [http://www.mom.gov.sg/newsroom/press-releases/2014/singapore-public-holidays-2015%20 Two]
 +
* [http://www.mom.gov.sg/employment-practices/public-holidays%20 Three]
  
 
==<div style="background: #34454c; padding: 15px; font-weight: bold; line-height: 0.3em; text-indent: 15px; font-size: 16px"><font color=#FFFFFF>Tools Used</font></div>==
 
*Microsoft Excel 2016
 
*JMP Pro 13
 
*Tableau 10.0
+
*D3.js
  
 
==<div style="background: #34454c; padding: 15px; font-weight: bold; line-height: 0.3em; text-indent: 15px; font-size: 16px"><font color=#FFFFFF>Methodology</font></div>==
 
 +
<b>Discovery</b><br/>
 
Our team will first understand what KST Bikers is all about through their website, annual reports, social media platforms and by asking our sponsor. Secondly, we will identify potential additional data sources that will help with our analysis. Lastly, we will research to find out what are some techniques or ideas on how to analyse feedback data. The following are some research that we have done and our key findings of each article:  
 
 +
 
{| class="wikitable"
 
{| class="wikitable"
 
|-
 
|-
Line 43: Line 50:
 
!!  style="background: #465d66; color: white; font-weight: bold;" |Summary of Key Findings
 
|-
 
|-
| 1 || [http://www.operationalsynergy.co.uk/why-analysing-feedback-is-essential/%20 Top tips on how to analyse feedback]||  
+
| 1  
 +
|| [http://www.operationalsynergy.co.uk/why-analysing-feedback-is-essential/%20 Top tips on how to analyse feedback]||  
 
Having a comprehension of how to use present and future state process mapping and the advantages of using data boxes, plus a visual workflow diagram are going to be essential in the most of the cases and will increase value to your data analysis. This provides a clear visual help in seeing where the bottlenecks are in your processing and areas where you have to made the improvements. <br>
 
  
Line 49: Line 57:
  
 
Data analysis in the form of a chart will bring up some important areas for discussion, revisit and future strategy. <br>
 
 
|-
 
|-
| 2|| [http://www.itl.nist.gov/div898/handbook/eda/section1/eda11.htm%20  What is EDA?]|| Exploratory data analysis (EDA) is not just a collection of techniques. It is a philosophy as to how we breakdown a data set; what to look out for; how we look; and how to interpret. Most EDA techniques are graphical with little quantitative techniques. There is heavy reliance on graphics as the main role of EDA is to open-mindedly explore.
+
| 2
 +
|| [http://www.itl.nist.gov/div898/handbook/eda/section1/eda11.htm%20  What is EDA?]|| Exploratory data analysis (EDA) is not just a collection of techniques. It is a philosophy as to how we breakdown a data set; what to look out for; how we look; and how to interpret. Most EDA techniques are graphical with little quantitative techniques. There is heavy reliance on graphics as the main role of EDA is to open-mindedly explore.
 
|-
 
|-
| 3|| [https://hbr.org/2016/12/why-youre-not-getting-value-from-your-data-science%20 Why You’re Not Getting Value from Your Data Science]|| Business users keep coming up with problems and data analysts cannot keep up as they take much time build sophisticated data models. The most common problem is that data scientists often do not build their work around the final objective which is to derive business value. The following are the best practices:
+
| 3
 +
|| [https://hbr.org/2016/12/why-youre-not-getting-value-from-your-data-science%20 Why You’re Not Getting Value from Your Data Science]|| Business users keep coming up with problems and data analysts cannot keep up as they take much time build sophisticated data models. The most common problem is that data scientists often do not build their work around the final objective which is to derive business value. The following are the best practices:
 
*Stick with simple models
 
*Explore more business problems: Instead of exploring one business problem with a sophisticated business models. Build a simple model for each problem and assess the value proposition
 
Line 63: Line 72:
 
The dataset is from KST Bikers’s internal database which is collected from a variety of sources such as email, SMS, mobile application, online feedback and call centre. Our team will also be using additional datasets such as weather and public holiday data. Having such data allows us to examine external factors which could impact the generation of feedbacks.
 
<br/><br/>
 
 +
 
<b>Data Cleaning</b><br/>
 
Outliers and missing values cause data inaccuracy. Hence, our team will remove missing values and outliers. However, if there are too many outliers, they will be treated as a separate group for analysis. In addition, data cleaning also includes ensuring the data for each variable is consistent in its format. For example, the variable, Location, consists of street names, geolocation coordinates and location of the feedback such as pavement, railings and lifts. Thus, it will affect how our team will analyse or process the data.
+
The dataset is from KST Biker's internal EFMS database which is collected from a variety of sources such as email, SMS, mobile application, online feedback and call centre. Our team had also included an additional dataset on public holiday data to aide us in our analysis.
 
<br/><br/>
 
<b>Data Normalization and Transformation</b><br/>
+
 
As the variables in the dataset have different forms of measurements, normalization could be conducted to provide equal weightage to each variable. Z-score normalization will be used. If the distribution of the variables is found to be skewed, natural log will be conducted to each involved variable to make the model more normally distributed.
+
<b>Data Cleaning and Transformation</b><br/>
 +
For this project, our team is conducting descriptive analysis and thus, there is not a need to remove any missing values, outliers or conduct any data normalization. However, a missing data pattern analysis will be done to find out if there are any missing values that could be filled up to aide our analysis. In addition, there is a need to ensure that the data for each variable is consistent and in a readable format.  
 
<br/><br/>
 
 +
 
<b>Data Exploration</b><br/>
 
Our team will first look into the summary statistics of each variable to get an overview of the dataset. From there, we will spot missing values, identify outliers and select necessary variables such as categories and subcategories for analysis. We will then identify trends such as which categories or subcategories has the highest feedbacks. This will allow us to figure out which are the top few most important problems that Singaporeans faced. In addition, with the additional data sources, we will examine how do the external factors impact the generation of the feedbacks.  
+
Our team first looked into the summary statistics of each variable to get an overview of the dataset. From there, we spot missing values and select key variables for analysis. We will then identify trends based on the top 10 categories of feedback. This will allow us to focus on the top few most important issues that Singaporeans faced. Furthermore, the team also did a control chart analysis to understand if there are any unusual data patterns occurring on a daily basis.
 
<br/><br/>
 
 +
 
<b>Dashboards</b><br/>
 
Two visual dashboards will be created for KST Bikers to visualize the analysis using softwares such as Tableau. The dashboards will provide a summary of the trends in the feedback data and the different external factors which generate these feedbacks. From there, our team will formulate insights and recommendations to KST Bikers.
+
Initially, our team proposed to have two dashboards using Tableau. One to provide a summary of the trends and the other to show the different external factors that generate feedbacks. With the change in objectives, a dashboard will be created for KST Bikers to visualize the analysis using D3.js. It will help KST Bikers to do some form of data cleaning when they upload the data, and provide an overview of the trends in the feedback data. KST Bikers would be then able to view the breakdown of feedback volume by group, category, sub-category and time such as year, quarter or month. Our team will be using D3.js, an open source software, as it is able to build an interactive dashboard and no software installation will be needed.

Latest revision as of 14:52, 21 April 2017



Data

The dataset provided by KST Bikers comes from a feedback system and consists of feedback lodged by:

  • SMS
  • Email
  • Feedback Form

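TSK Transporters has also searched for additional data sources on public holidays, as TSK Transporters may be analysing how public holidays correlate with feedback volume. Listed below are three data sources on public holidays:

  • One: http://www.mom.gov.sg/newsroom/press-releases/2013/singapore-public-holidays-2014%20
  • Two: http://www.mom.gov.sg/newsroom/press-releases/2014/singapore-public-holidays-2015%20
  • Three: http://www.mom.gov.sg/employment-practices/public-holidays%20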

Tools Used

  • Microsoft Excel 2016
  • JMP Pro 13
  • D3.js

Methodology

Discovery
Our team will first learn about KST Bikers through its website, annual reports and social media platforms, and by asking our sponsor. Secondly, we will identify additional data sources that could help with our analysis. Lastly, we will research techniques and ideas for analysing feedback data. The articles we have reviewed so far, and our key findings from each, are listed below:

S/N | Title of Article | Summary of Key Findings

1 | Top tips on how to analyse feedback |
Understanding how to use present- and future-state process mapping, the advantages of data boxes, and a visual workflow diagram is essential in most cases and adds value to the data analysis. Together they give a clear visual aid for seeing where the bottlenecks in a process are and which areas need improvement.

Other methods include cause-and-effect diagrams, such as the fishbone technique with the 5 Whys, which help identify root causes and point towards a resolution for the key critical areas.

Presenting the data analysis as a chart will bring up important areas for discussion, review and future strategy.

2 | What is EDA? | Exploratory data analysis (EDA) is not just a collection of techniques. It is a philosophy of how we break down a data set: what to look for, how to look, and how to interpret. Most EDA techniques are graphical, with few quantitative techniques. EDA relies heavily on graphics because its main role is to explore the data open-mindedly.

3 | Why You’re Not Getting Value from Your Data Science | Business users keep coming up with problems, and data analysts cannot keep up because they spend much of their time building sophisticated data models. The most common problem is that data scientists often do not build their work around the final objective, which is to derive business value. The best practices are:
  • Stick with simple models
  • Explore more business problems: instead of exploring one business problem with a sophisticated model, build a simple model for each problem and assess the value proposition
  • Learn from a sample of data, not all the data
  • Focus on automation: use algorithms and develop software systems to automate data processing techniques

Data Collection
The dataset is drawn from KST Bikers’s internal database, with feedback collected from a variety of sources such as email, SMS, mobile application, online feedback and call centre. Our team will also be using additional datasets such as weather and public holiday data. Having such data allows us to examine external factors that could affect the volume of feedback generated.

Data Cleaning
The dataset is drawn from KST Bikers’s internal EFMS database, with feedback collected from a variety of sources such as email, SMS, mobile application, online feedback and call centre. Our team has also included an additional dataset on public holidays to aid our analysis.
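
As a rough illustration of how the public holiday dataset could be used, the Python sketch below compares the average daily feedback count on public holidays with that on other days. The function name and the sample dates are hypothetical, and the team's actual analysis (done in Excel and JMP Pro) may have differed.

```python
from collections import Counter
from datetime import date
from statistics import mean

def daily_volume_by_holiday(feedback_dates, holidays):
    """Return (mean daily count on holidays, mean daily count on other days).

    Only days that appear in feedback_dates are counted, so days with zero
    feedback are ignored in this simplified sketch.
    """
    per_day = Counter(feedback_dates)
    holiday_counts = [n for d, n in per_day.items() if d in holidays]
    other_counts = [n for d, n in per_day.items() if d not in holidays]
    return (mean(holiday_counts) if holiday_counts else 0.0,
            mean(other_counts) if other_counts else 0.0)

# Hypothetical sample data: one feedback date per feedback record.
holidays = {date(2015, 1, 1)}  # New Year's Day
feedback_dates = (
    [date(2015, 1, 1)] * 2 + [date(2015, 1, 2)] * 5 + [date(2015, 1, 5)] * 3
)
holiday_avg, other_avg = daily_volume_by_holiday(feedback_dates, holidays)
```

A large gap between the two averages would suggest that public holidays are worth examining further as a factor in feedback volume.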

Data Cleaning and Transformation
For this project, our team is conducting descriptive analysis, so there is no need to remove missing values or outliers, or to normalize the data. However, a missing data pattern analysis will be done to find out whether any missing values can be filled in to aid our analysis. In addition, the data for each variable must be consistent and in a readable format.
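
A missing data pattern analysis can be sketched in a few lines of Python: for each record, note which fields are empty, then count how often each combination occurs. The field names and sample records here are hypothetical and are not taken from the actual dataset.

```python
from collections import Counter

def missing_patterns(records, fields):
    """Count how often each combination of missing fields occurs."""
    patterns = Counter()
    for rec in records:
        # A field counts as missing when it is absent, None, or an empty string.
        missing = tuple(f for f in fields if rec.get(f) in (None, ""))
        patterns[missing] += 1
    return patterns

# Hypothetical feedback records.
records = [
    {"category": "Lifts", "location": "", "channel": "SMS"},
    {"category": "Lifts", "location": None, "channel": "Email"},
    {"category": "", "location": "", "channel": "Email"},
    {"category": "Railings", "location": "Street A", "channel": "SMS"},
]
patterns = missing_patterns(records, ["category", "location", "channel"])
```

Patterns that occur often and involve a single field (here, `location` alone) are the most likely candidates for filling in from other information in the record.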

Data Exploration
Our team first looked at the summary statistics of each variable to get an overview of the dataset. From there, we spotted missing values and selected key variables for analysis. We will then identify trends based on the top 10 categories of feedback, which allows us to focus on the most important issues that Singaporeans face. The team also ran a control chart analysis to check for unusual patterns in the daily data.
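
The text does not say which type of control chart was used. As one possible reading, the sketch below computes the limits of an individuals chart on daily feedback counts, using the average moving range, and flags the days that fall outside them. The sample counts are made up for illustration.

```python
from statistics import mean

def individuals_chart_limits(daily_counts):
    """Return (lower limit, centre line, upper limit) for an individuals chart."""
    centre = mean(daily_counts)
    # Moving range: absolute difference between consecutive days.
    moving_ranges = [abs(b - a) for a, b in zip(daily_counts, daily_counts[1:])]
    mr_bar = mean(moving_ranges)
    # 2.66 = 3 / d2, with d2 = 1.128 for moving ranges of span 2.
    half_width = 2.66 * mr_bar
    # Counts cannot be negative, so the lower limit is floored at zero.
    return max(0.0, centre - half_width), centre, centre + half_width

def unusual_days(daily_counts):
    """Return the indices of days whose count falls outside the control limits."""
    lower, _, upper = individuals_chart_limits(daily_counts)
    return [i for i, n in enumerate(daily_counts) if n < lower or n > upper]

# Hypothetical daily feedback counts with one spike.
daily_counts = [20, 22, 19, 21, 20, 23, 21, 60, 20, 22]
flagged = unusual_days(daily_counts)
```

With these sample counts, only the spike on the eighth day (index 7) lies above the upper limit.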

Dashboards
Initially, our team proposed two Tableau dashboards: one to summarise the trends and the other to show the external factors that generate feedback. With the change in objectives, a single dashboard will instead be built with D3.js for KST Bikers to visualize the analysis. It will help KST Bikers perform some data cleaning when they upload the data, and provide an overview of the trends in the feedback data. KST Bikers will then be able to view the breakdown of feedback volume by group, category, sub-category and time period (year, quarter or month). Our team chose D3.js, an open-source library, because it supports interactive dashboards and requires no software installation.
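
The dashboard itself is to be built in D3.js, but the aggregation behind a breakdown by category and time period can be illustrated in Python. In the sketch below, the function names, the `(category, date)` row layout and the sample rows are all assumptions made for illustration.

```python
from collections import Counter
from datetime import date

def period_key(d, level):
    """Map a date to a year, quarter or month label."""
    if level == "year":
        return str(d.year)
    if level == "quarter":
        return f"{d.year}-Q{(d.month - 1) // 3 + 1}"
    if level == "month":
        return f"{d.year}-{d.month:02d}"
    raise ValueError(f"unknown level: {level}")

def volume_breakdown(rows, level):
    """Count feedback per (category, period); rows are (category, date) pairs."""
    return Counter((cat, period_key(d, level)) for cat, d in rows)

# Hypothetical feedback rows.
rows = [
    ("Lifts", date(2016, 1, 15)),
    ("Lifts", date(2016, 3, 2)),
    ("Lifts", date(2016, 4, 9)),
    ("Railings", date(2016, 4, 20)),
]
by_quarter = volume_breakdown(rows, "quarter")
```

The same function could be applied to group or sub-category labels by passing those in place of the category in each row.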