Difference between revisions of "IS428 2016-17 Term1 Assign3 Ong Ming Hao"

From Visual Analytics for Business Intelligence

Revision as of 03:02, 22 October 2016

Understanding the problem

Introduction

Before embarking on this analysis, I decided to properly understand the problem and plan how I intend to visualise the data. I came up with a few steps that help me better understand the problem. These are the steps:

  1. Understanding the problem
  2. Select appropriate Data Sources
  3. Select the appropriate visualisation tool
  4. Layout of the visualisation

By doing so, I was able to gather better insights and build better visualisations, since I could identify the root cause of the problem. The goal of this analysis is for me to learn various visual analytics tools and evaluate their effectiveness. In addition, I would like to try out various data cleaning software and evaluate its effectiveness as well.

Tools Used

For this assignment, I plan to use the following tools to analyse, identify patterns in and gather insights from GAStech’s Abila. I’ve split them into 2 main categories: Visual Analytics Tools and Data Cleaning Software.

Visual Analytics Tools

  • Tableau 10.0.0
  • Power BI

Data Cleaning Software

  • Microsoft Excel
  • JMP
  • OpenRefine (formerly known as Google Refine)

Understanding Question 1

What are the typical patterns in the prox card data? What does a typical day look like for GAStech employees? To answer this question, we must look at the data provided to us. I found that we need the following files to gain an in-depth understanding of the data.

  • Vast Prov Zone F1, F2, F3 – To understand where the various zones are.
  • Employees List – To understand the GAStech employees, their job roles, etc.
  • Data Format – To understand the data in even greater detail
  • proxOut-MC2.csv – Raw data of movement by zones
  • proxMobileOut-MC2.csv – Raw data of movement by X and Y coordinates

Upon initial inspection of the data, I realised a few things. Firstly, in “proxOut-MC2.csv” and “proxMobileOut-MC2.csv”, we had to link each prox-id to the corresponding employee in “Employee List.xlsx”. Next, we had to take note of the various employees with last names. Lastly, each prox-id has numbers trailing at the end. After much consideration and investigation, I assumed that these numbers can be removed without any implications, which resulted in the following.

Michael Dataclean1.jpg
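
The cleaning step described above (dropping the trailing numbers from each prox-id, then linking it to an employee) can also be sketched in a few lines of Python. This is only a minimal illustration and not the procedure used in the tools listed earlier. The sample ids, the lookup table keyed by cleaned id, and the function names strip_trailing_digits and link_to_employees are all hypothetical; the real values come from the CSV files and “Employee List.xlsx”.

```python
import re

# Matches the run of digits at the very end of a prox-id.
TRAILING_DIGITS = re.compile(r"\d+$")


def strip_trailing_digits(prox_id: str) -> str:
    """Return the prox-id without its trailing numbers (hypothetical helper)."""
    return TRAILING_DIGITS.sub("", prox_id.strip())


def link_to_employees(prox_ids, employees):
    """Map each raw prox-id to an employee via its cleaned form.

    `employees` is assumed to be a dict keyed by the cleaned, lower-case id.
    Ids without a match map to None so they can be inspected by hand.
    """
    return {
        pid: employees.get(strip_trailing_digits(pid).lower())
        for pid in prox_ids
    }


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    employees = {"jdoe": "Jane Doe"}
    raw_ids = ["jdoe001", "jdoe002", "xyz9"]
    for raw, name in link_to_employees(raw_ids, employees).items():
        print(raw, "->", name)
```

Keeping unmatched ids as None, rather than dropping them, makes it easier to check whether the assumption that the trailing numbers carry no meaning actually holds for every card.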

Effectiveness of the different tools for Data Cleaning

First Data Clean

Microsoft Word

JMP

OpenRefine

Summary and Findings

Second Data Clean

Microsoft Word

JMP

OpenRefine

Summary and Findings

Question 1

Question 2

Question 3

Question 4

Conclusion