VAST Challenge 2019: Mini-Challenge 1

Datasets

Before we start with the data analysis, there is a need to understand and read the raw data. For this VAST Challenge 2019: Mini-Challenge 1, I am given 1 dataset which is:

mc1-reports-data.csv [Data columns:Time | Sewer & Water | Power | Roads & Bridges | Medical | Buildings | Shake intensity | Location ]

For this Challenge, I will be leveraging on the custom polygon maps for St. Himark in order to create map visualizations with the 2 datasets listed below:

StHimark_Features.csv [Data columns:Id | Nbrhood | StHimark_ID | Latitude | Longitude ]
StHimark_Points.csv [Data columns:StHimark_ID | sub_polygon_id | point_order | Latitude | Longitude ]

Exploratory Data Analysis

Moving on to the data analysis, I will be performing Exploratory Data Analysis using Tableau Prep Builder. There are some issues with the original dataset listed below:

Columns with Null Values

Sewer & Water: 171 counts of Null values
Medical: 47441 counts of Null values
Building: 170 counts of Null values
Shake Intensity: 12144 counts of Null values

We cannot simply replace the Null values as columns like “Medical” and “Shake Intensity” contains a significant number of records with Null value to replace. If chosen to replace with mean or median, it will drastically change the variation of the distribution. Hence, the decision is to remove these values. Removed values will be in a form of records with the sole category and the rating itself, as pivoting will be done during the dataset transformation.

Dataset Transformation

Description	Reference
Overview	Figure 0.1
Step 1: Creation of Calculated Field Using “location” as an ID to create a calculated field (Refer to Figure 0.2) of “Location Name” based on the word document reference for the background of St. Himark, VAST 2019 - St. Himark - About Our City.docx.	Figure 0.2
Step 2: Pivot and Clean Pivot 6 columns: Sewer & Water, Power, Roads & Bridges, Medical, Buildings & Shake intensity. Cleaning of data source: Proper Renaming of header row (Refer to Figure 0.3) and Removal of Null Values (Refer to Figure 0.4). Pivot category -> Category Pivot value -> Rating Result in Number of rows: 83072 -> 438494	Figure 0.3 Figure 0.4
Step 3: Join and Clean Part 1 Joining with mc1-reports-data.csv & StHimark_Features.csv with these 2 identifiers (Refer to Figure 0.5) respectively: “location” & “id” to populate the data column, “StHimark_ID”. And Removal of unwanted columns: Latitude & Longitude (These points are not accurate and does not create the shape of the map, hence the decision to remove). Final product: MC1 Report.hyper (Refer to Figure 0.6)	Figure 0.5 Figure 0.6
Step 4: Join and Clean Part 2 Joining with StHimark_Features.csv & StHimark_Points.csv with these 2 identifiers (Refer to Figure 0.7) respectively: “StHimark_ID” & “StHimark_ID”. And Removal of unwated columns: Latitude (StHimark_Features.csv) & Longitude (StHimark_Features.csv), Nbrhood, sub_polygon_id and StHimark_ID (StHimark_Features.csv). Final product: MC1 Map.hyper (Refer to Figure 0.8)	Figure 0.7 Figure 0.8

IS428 AY2019-20T1 Assign Harvey Kristanto Lauw: Dataset

Exploratory Data Analysis

Dataset Transformation

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

Tools