Difference between revisions of "ANLY482 AY2016-17 T2 Group12 : Project Overview / Methodology"

Line 73: Line 73:
  
 
<b>Data Cleaning</b><br/>
 
Outliers and missing values reduce data accuracy, so our team will remove them. However, if there are too many outliers, they will be treated as a separate group for analysis. Data cleaning also includes ensuring that the data for each variable is consistent in format. For example, the variable Location mixes street names, geolocation coordinates and descriptions of where the feedback applies, such as a pavement. This inconsistency will affect how our team analyses or processes the data.
+
The dataset is from KST Bikers' internal EFMS database, which collects feedback from a variety of sources such as email, SMS, mobile application, online feedback and call centre. Our team also included an additional dataset on public holidays to aid our analysis.
 
<br/><br/>
 
  
<b>Data Normalization and Transformation</b><br/>
+
<b>Data Cleaning and Transformation</b><br/>
As the variables in the dataset are measured on different scales, normalization could be conducted to give equal weightage to each variable. Z-score normalization will be used. If the distribution of a variable is found to be skewed, a natural log transformation will be applied to that variable to bring its distribution closer to normal.
+
For this project, our team is conducting descriptive analysis, so there is no need to remove missing values or outliers, or to normalize the data. However, a missing data pattern analysis will be done to find out whether any missing values could be filled in to aid our analysis. In addition, the data for each variable must be consistent and in a readable format.
 
<br/><br/>
 
  
 
<b>Data Exploration</b><br/>
 
Our team will first look into the summary statistics of each variable to get an overview of the dataset. From there, we will spot missing values, identify outliers and select the variables needed for analysis, such as categories and subcategories. We will then identify trends such as which categories or subcategories receive the most feedback. This will allow us to identify the most important problems. In addition, with the additional data sources, we will examine how external factors affect the volume of feedback.
+
Our team first looked into the summary statistics of each variable to get an overview of the dataset. From there, we spotted missing values and selected key variables for analysis. We will then identify trends based on the top 10 categories of feedback, which will allow us to focus on the most important issues that Singaporeans face. Furthermore, the team ran a control chart analysis to check for unusual patterns in the daily data.
 
<br/><br/>
 
  
 
<b>Dashboards</b><br/>
 
Two visual dashboards will be created for KST Bikers to visualize the analysis using software such as Tableau. The dashboards will provide a summary of the trends in the feedback data and of the external factors that drive the feedback. From there, our team will formulate insights and recommendations for KST Bikers.
+
Initially, our team proposed two dashboards built in Tableau: one to summarize the trends and the other to show the external factors that generate feedback. With the change in objectives, a single dashboard will instead be created in R-Shiny for KST Bikers to visualize the analysis. It will help KST Bikers perform some data cleaning when they upload the data, and it will provide an overview of the trends in the feedback data. KST Bikers will then be able to view the breakdown of feedback volume by group, category and time period such as year, quarter or month. Our team chose R-Shiny, an open-source tool, because it can build an interactive dashboard and no software installation will be needed. From there, our team will formulate insights and recommendations for KST Bikers.

Revision as of 23:25, 19 February 2017



Data

The dataset provided by KST Bikers comes from a feedback system and consists of feedback lodged via:

  • SMS
  • Email
  • Feedback Form

TSK Transporters have also searched for additional data sources on public holidays, as TSK Transporters may be analysing how public holidays correlate with feedback volume. Listed below are three data sources corresponding to public holidays:

Tools Used

  • Microsoft Excel 2016
  • JMP Pro 13

Methodology

Discovery
Our team will first learn about KST Bikers through its website, annual reports and social media platforms, and by asking our sponsor. Secondly, we will identify potential additional data sources that will help with our analysis. Lastly, we will research techniques and ideas for analysing feedback data. The following are the articles we have reviewed and the key findings of each:

S/N Title of Article Summary of Key Findings
1 Top tips on how to analyse feedback

Understanding how to use present-state and future-state process mapping, the advantages of data boxes, and a visual workflow diagram is essential in most cases and adds value to your data analysis. Together they provide a clear visual aid for seeing where the bottlenecks are in your process and where improvements are needed.

Other methods include cause and effect diagrams, like the fishbone technique with the 5 whys, which help you identify root causes and point you toward resolving your key critical areas.

Data analysis in the form of a chart will bring up some important areas for discussion, review and future strategy.

2 What is EDA?

Exploratory data analysis (EDA) is not just a collection of techniques. It is a philosophy of how we break down a data set: what to look for, how to look, and how to interpret. Most EDA techniques are graphical, with few quantitative techniques. There is heavy reliance on graphics because the main role of EDA is to explore open-mindedly.

3 Why You’re Not Getting Value from Your Data Science

Business users keep coming up with problems, and data analysts cannot keep up because they take too much time building sophisticated data models. The most common problem is that data scientists often do not build their work around the final objective, which is to derive business value. The following are the best practices:
  • Stick with simple models
  • Explore more business problems: instead of exploring one business problem with a sophisticated model, build a simple model for each problem and assess the value proposition
  • Learn from a sample of data – not all the data
  • Focus on automation: Use algorithms and develop software systems to automate data processing techniques

Data Collection
The dataset is from KST Bikers' internal database, which collects feedback from a variety of sources such as email, SMS, mobile application, online feedback and call centre. Our team will also be using additional datasets such as weather and public holiday data. Such data allows us to examine external factors that could affect the volume of feedback.

Data Cleaning
The dataset is from KST Bikers' internal EFMS database, which collects feedback from a variety of sources such as email, SMS, mobile application, online feedback and call centre. Our team also included an additional dataset on public holidays to aid our analysis.

Data Cleaning and Transformation
For this project, our team is conducting descriptive analysis, so there is no need to remove missing values or outliers, or to normalize the data. However, a missing data pattern analysis will be done to find out whether any missing values could be filled in to aid our analysis. In addition, the data for each variable must be consistent and in a readable format.
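
As an illustration of what a missing data pattern analysis involves, the sketch below counts how often each combination of empty fields occurs across feedback records. It is a minimal Python example rather than the team's actual workflow (the tools listed are Excel and JMP Pro); the field names and sample records are hypothetical.

```python
from collections import Counter

def missing_patterns(records, fields):
    """Count how often each combination of missing fields occurs."""
    patterns = Counter()
    for row in records:
        # A field counts as missing when it is absent, None, or blank text.
        missing = tuple(
            f for f in fields if str(row.get(f) or "").strip() == ""
        )
        patterns[missing] += 1
    return patterns

# Hypothetical feedback records; field names are illustrative only.
records = [
    {"category": "Roads", "location": "Main St", "channel": "SMS"},
    {"category": "Roads", "location": "", "channel": "Email"},
    {"category": None, "location": "", "channel": "SMS"},
    {"category": "Parks", "location": None, "channel": "Email"},
]
fields = ["category", "location", "channel"]

patterns = missing_patterns(records, fields)
for pattern, count in patterns.most_common():
    # An empty pattern means the record had no missing fields.
    print(pattern or "(complete)", count)
```

Patterns that recur often, such as a single field that is frequently blank on its own, are the natural candidates for checking whether the value can be filled in from other fields.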

Data Exploration
Our team first looked into the summary statistics of each variable to get an overview of the dataset. From there, we spotted missing values and selected key variables for analysis. We will then identify trends based on the top 10 categories of feedback, which will allow us to focus on the most important issues that Singaporeans face. Furthermore, the team ran a control chart analysis to check for unusual patterns in the daily data.
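
The page does not state which type of control chart was used. As one possible reading, the sketch below applies a c-chart, a common choice for daily counts, where the centre line is the mean count and the limits sit three standard deviations (the square root of the mean) on either side. The dates and counts are hypothetical, and Python is used only for illustration.

```python
import math

def c_chart_limits(counts):
    """Return (lower, center, upper) 3-sigma limits for a c-chart of counts."""
    counts = list(counts)
    center = sum(counts) / len(counts)
    # For count data the c-chart estimates sigma as sqrt(mean count).
    spread = 3 * math.sqrt(center)
    # A count cannot be negative, so the lower limit is floored at zero.
    return max(0.0, center - spread), center, center + spread

def unusual_days(daily_counts):
    """List the days whose feedback count falls outside the control limits."""
    lower, _, upper = c_chart_limits(daily_counts.values())
    return [day for day, n in daily_counts.items() if n < lower or n > upper]

# Hypothetical daily feedback volumes.
daily_counts = {
    "2016-01-01": 20,
    "2016-01-02": 22,
    "2016-01-03": 19,
    "2016-01-04": 21,
    "2016-01-05": 45,
    "2016-01-06": 18,
    "2016-01-07": 23,
}

lower, center, upper = c_chart_limits(daily_counts.values())
flagged = unusual_days(daily_counts)
print(f"center={center:.1f} limits=({lower:.1f}, {upper:.1f}) flagged={flagged}")
```

Days that fall outside the limits can then be compared against the public holiday dataset to see whether a holiday explains the spike or dip.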

Dashboards
Initially, our team proposed two dashboards built in Tableau: one to summarize the trends and the other to show the external factors that generate feedback. With the change in objectives, a single dashboard will instead be created in R-Shiny for KST Bikers to visualize the analysis. It will help KST Bikers perform some data cleaning when they upload the data, and it will provide an overview of the trends in the feedback data. KST Bikers will then be able to view the breakdown of feedback volume by group, category and time period such as year, quarter or month. Our team chose R-Shiny, an open-source tool, because it can build an interactive dashboard and no software installation will be needed. From there, our team will formulate insights and recommendations for KST Bikers.
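
The dashboard itself is to be built in R-Shiny. The sketch below uses Python only to illustrate the aggregation behind a breakdown of feedback volume by category and time period. The function names, categories and dates are hypothetical and do not come from the project.

```python
from collections import Counter
from datetime import date

def period_key(day, granularity):
    """Map a date to a year, quarter, or month label."""
    if granularity == "year":
        return str(day.year)
    if granularity == "quarter":
        return f"{day.year}-Q{(day.month - 1) // 3 + 1}"
    if granularity == "month":
        return f"{day.year}-{day.month:02d}"
    raise ValueError(f"unknown granularity: {granularity}")

def volume_breakdown(feedback, granularity):
    """Count feedback items per (period, category) pair."""
    counts = Counter()
    for day, category in feedback:
        counts[(period_key(day, granularity), category)] += 1
    return counts

# Hypothetical feedback items as (date lodged, category) pairs.
feedback = [
    (date(2016, 1, 15), "Roads"),
    (date(2016, 2, 3), "Roads"),
    (date(2016, 4, 20), "Parks"),
    (date(2016, 5, 9), "Roads"),
    (date(2017, 1, 2), "Parks"),
]

by_quarter = volume_breakdown(feedback, "quarter")
for (period, category), n in sorted(by_quarter.items()):
    print(period, category, n)
```

Switching the `granularity` argument between "year", "quarter" and "month" corresponds to the time selector described for the dashboard.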