IS428 AY2019-20T1 Assign Harvey Kristanto Lauw: Dataset

From Visual Analytics for Business Intelligence
Jump to navigation Jump to search

Title.jpg VAST Challenge 2019: Mini-Challenge 1

Background

Dataset Analysis & Transformation

Interactive Visualization

Others

 


Datasets

Before we start with the data analysis, there is a need to understand and read the raw data. For this VAST Challenge 2019: Mini-Challenge 1, I am given 1 dataset which is:

  • mc1-reports-data.csv [Data columns:Time | Sewer & Water | Power | Roads & Bridges | Medical | Buildings | Shake intensity | Location ]

For this Challenge, I will be leveraging on the custom polygon maps for St. Himark in order to create map visualizations with the 2 datasets listed below:

  • StHimark_Features.csv [Data columns:Id | Nbrhood | StHimark_ID | Latitude | Longitude ]
  • StHimark_Points.csv [Data columns:StHimark_ID | sub_polygon_id | point_order | Latitude | Longitude ]

Exploratory Data Analysis

Moving on to the data analysis, I will be performing Exploratory Data Analysis using Tableau Prep Builder. There are some issues with the original dataset listed below:

Columns with Null Values

  • Sewer & Water: 171 counts of Null values
  • Medical: 47441 counts of Null values
  • Building: 170 counts of Null values
  • Shake Intensity: 12144 counts of Null values

We cannot simply replace the Null values as columns like “Medical” and “Shake Intensity” contains a significant number of records with Null value to replace. If chosen to replace with mean or median, it will drastically change the variation of the distribution. Hence, the decision is to remove these values. Removed values will be in a form of records with the sole category and the rating itself, as pivoting will be done during the dataset transformation.

Dataset Transformation


Description Reference

Overview

Dataset Transformation Overview.jpg
Figure 0.1

Step 1: Creation of Calculated Field

Using “location” as an ID to create a calculated field (Refer to Figure 0.2) of “Location Name” based on the word document reference for the background of St. Himark, VAST 2019 - St. Himark - About Our City.docx.

Dataset step 1.1.jpg
Figure 0.2

Step 2: Pivot and Clean

Pivot 6 columns: Sewer & Water, Power, Roads & Bridges, Medical, Buildings & Shake intensity. Cleaning of data source: Proper Renaming of header row (Refer to Figure 0.3) and Removal of Null Values (Refer to Figure 0.4).

  • Pivot category -> Category
  • Pivot value -> Rating

Result in Number of rows: 83072 -> 438494

Dataset step 2.1.jpg
Figure 0.3

Dataset step 2.2.jpg
Figure 0.4

Step 3: Join and Clean Part 1

Joining with mc1-reports-data.csv & StHimark_Features.csv with these 2 identifiers (Refer to Figure 0.5) respectively: “location” & “id” to populate the data column, “StHimark_ID”. And Removal of unwanted columns: Latitude & Longitude (These points are not accurate and does not create the shape of the map, hence the decision to remove).

  • Final product: MC1 Report.hyper (Refer to Figure 0.6)

Dataset step 3.1.jpg
Figure 0.5

Dataset step 3.2.jpg
Figure 0.6

Step 4: Join and Clean Part 2

Joining with StHimark_Features.csv & StHimark_Points.csv with these 2 identifiers (Refer to Figure 0.7) respectively: “StHimark_ID” & “StHimark_ID”. And Removal of unwated columns: Latitude (StHimark_Features.csv) & Longitude (StHimark_Features.csv), Nbrhood, sub_polygon_id and StHimark_ID (StHimark_Features.csv).

  • Final product: MC1 Map.hyper (Refer to Figure 0.8)

Dataset step 4.1.jpg
Figure 0.7

Dataset step 4.2.jpg
Figure 0.8