ANLY482 AY2016-17 T2 Group12 : Project Overview / Methodology

From Analytics Practicum
Revision as of 20:30, 15 January 2017 by Tingzhi.lim.2013 (talk | contribs)



Data

The dataset provided by KST Bikers comes from its Feedback System and consists of feedback lodged via:

  • SMS
  • Email
  • Feedback Form

Tools Used

  • Microsoft Excel 2016
  • JMP Pro 13
  • Tableau 10.0

Methodology

Our team will first learn about KST Bikers through its website, annual reports and social media platforms, and by consulting our sponsor. Secondly, we will identify additional data sources that could support our analysis. Lastly, we will research techniques and ideas for analysing feedback data. The following are the articles we have reviewed so far and our key findings from each:

S/N Title of Article Summary of Key Findings
1 Top tips on how to analyse feedback

An understanding of present-state and future-state process mapping, the advantages of data boxes and a visual workflow diagram will be essential in most cases and will add value to the data analysis. These tools give a clear visual aid for seeing where the bottlenecks are in a process and which areas need improvement.

Other methods include cause-and-effect diagrams, such as the fishbone technique combined with the 5 Whys, which help to identify root causes and point towards ways of resolving the key critical areas.

Presenting the data analysis as charts will bring up important areas for discussion, review and future strategy.

2 What is EDA?

Exploratory data analysis (EDA) is not just a collection of techniques. It is a philosophy of how we break down a data set: what to look for, how we look, and how we interpret what we see. Most EDA techniques are graphical, with only a few quantitative techniques. The heavy reliance on graphics reflects the main role of EDA, which is to explore the data open-mindedly.

3 Why You’re Not Getting Value from Your Data Science

Business users keep coming up with problems, and data analysts cannot keep up because they spend too much time building sophisticated data models. The most common problem is that data scientists often do not build their work around the final objective, which is to derive business value. The following are the best practices:
  • Stick with simple models
  • Explore more business problems: instead of exploring one business problem with a sophisticated model, build a simple model for each problem and assess its value proposition
  • Learn from a sample of data, not all the data
  • Focus on automation: use algorithms and develop software systems to automate data processing techniques

Data Collection
The dataset comes from KST Bikers' internal database, which collects feedback from a variety of sources such as email, SMS, the mobile application, the online feedback form and the call centre. Our team will also use additional datasets such as weather and public holiday data. These allow us to examine external factors that could affect the volume of feedback.
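
As a rough illustration of how the external datasets could be attached to the feedback records, the sketch below joins them by calendar date. All field names, dates and rainfall values are hypothetical placeholders and are not taken from the KST Bikers dataset.

```python
from datetime import date

# Hypothetical feedback records; field names are illustrative only.
feedback = [
    {"id": 1, "date": date(2016, 1, 1), "channel": "SMS"},
    {"id": 2, "date": date(2016, 1, 4), "channel": "Email"},
    {"id": 3, "date": date(2016, 1, 5), "channel": "Feedback Form"},
]

# Hypothetical external data keyed by calendar date.
public_holidays = {date(2016, 1, 1)}
daily_rainfall_mm = {date(2016, 1, 1): 0.0, date(2016, 1, 4): 12.5}


def enrich(record):
    """Return a copy of the record with holiday and rainfall fields added."""
    day = record["date"]
    return {
        **record,
        "is_public_holiday": day in public_holidays,
        # None marks a day with no weather reading instead of guessing a value.
        "rainfall_mm": daily_rainfall_mm.get(day),
    }


enriched = [enrich(r) for r in feedback]
```

Keeping a missing reading as None, rather than zero, makes the gap visible later during data cleaning.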

Data Cleaning
Outliers and missing values reduce the accuracy of the analysis. Hence, our team will remove records with missing values and outliers. However, if there are too many outliers, they will be treated as a separate group for analysis. Data cleaning also includes ensuring that the data for each variable is in a consistent format. For example, the variable Location contains a mix of street names, geolocation coordinates and descriptions of where the issue occurred, such as pavements, railings and lifts. This inconsistency affects how our team can analyse or process the data.
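
The cleaning steps above can be sketched as follows. This is a minimal sketch only: the methodology does not specify an outlier rule, so the 1.5 × IQR fences used here are an assumption, and the field name days_to_resolve and its values are hypothetical.

```python
import statistics


def drop_missing(records, field):
    """Keep only records whose `field` is present and non-empty."""
    return [r for r in records if r.get(field) not in (None, "")]


def split_outliers(values, k=1.5):
    """Split values into (inliers, outliers) using IQR fences.

    The 1.5 * IQR rule is an assumed convention, not the team's stated method.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    inliers = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    return inliers, outliers


# Hypothetical records: days taken to resolve each feedback case.
raw = [2, 3, None, 3, 4, 4, 5, "", 5, 6, 90]
records = [{"id": i, "days_to_resolve": v} for i, v in enumerate(raw)]

complete = drop_missing(records, "days_to_resolve")
values = [r["days_to_resolve"] for r in complete]
inliers, outliers = split_outliers(values)
```

Returning the outliers as their own list, instead of discarding them, leaves room to analyse them as a separate group if there turn out to be many.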

Data Normalization and Transformation
As the variables in the dataset are measured on different scales, normalization could be conducted so that each variable carries equal weight. Z-score normalization will be used. If the distribution of a variable is found to be skewed, a natural log transformation will be applied to that variable to bring its distribution closer to normal.
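
A minimal sketch of the two transformations, assuming the population standard deviation for the z-score and strictly positive values for the natural log (the log of zero or a negative number is undefined). The sample values are hypothetical.

```python
import math
import statistics


def z_scores(values):
    """Rescale values to mean 0 and standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population SD; an assumed choice
    if sd == 0:
        raise ValueError("z-scores are undefined when all values are equal")
    return [(v - mean) / sd for v in values]


def log_transform(values):
    """Apply the natural log to reduce right skew; values must be > 0."""
    if any(v <= 0 for v in values):
        raise ValueError("natural log requires strictly positive values")
    return [math.log(v) for v in values]


# Hypothetical variable with mean 5 and population SD 2.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
standardised = z_scores(sample)

# Hypothetical right-skewed variable: the log pulls the long tail in.
skewed = [1, 10, 100, 1000]
logged = log_transform(skewed)
```

After standardisation, each value expresses how many standard deviations it lies from the mean, so variables on different scales become directly comparable.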

Data Exploration
Our team will first look at the summary statistics of each variable to get an overview of the dataset. From there, we will spot missing values, identify outliers and select the variables needed for analysis, such as categories and subcategories. We will then identify trends, such as which categories or subcategories receive the most feedback. This will allow us to determine the most important problems that Singaporeans face. In addition, using the additional data sources, we will examine how external factors affect the volume of feedback.
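
As an illustration of the category ranking described above, the sketch below counts feedback per category and per category/subcategory pair. The category and subcategory labels are hypothetical placeholders, not values from the actual dataset.

```python
from collections import Counter

# Hypothetical feedback records; labels are placeholders only.
feedback = [
    {"category": "Category A", "subcategory": "A1"},
    {"category": "Category A", "subcategory": "A2"},
    {"category": "Category B", "subcategory": "B1"},
    {"category": "Category A", "subcategory": "A1"},
    {"category": "Category C", "subcategory": "C1"},
    {"category": "Category B", "subcategory": "B1"},
]

# Number of feedback records per category.
by_category = Counter(r["category"] for r in feedback)

# Number of feedback records per (category, subcategory) pair.
by_subcategory = Counter((r["category"], r["subcategory"]) for r in feedback)

# The two categories with the most feedback, highest first.
top_categories = by_category.most_common(2)
```

The same counts could feed a bar chart in Tableau or JMP; the point here is only to show the grouping logic.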

Dashboards
Two visual dashboards will be created for KST Bikers to visualize the analysis using software such as Tableau. The dashboards will provide a summary of the trends in the feedback data and of the external factors associated with the feedback. From there, our team will formulate insights and recommendations for KST Bikers.