Difference between revisions of "IS428 AY2019-20T1 Assign Wei Ming DataTransformationAnalysis"

From Visual Analytics for Business Intelligence
Latest revision as of 17:49, 13 October 2019

Mini-Challenge 1: Crowdsourcing for Situational Awareness

 


Preliminary Data Analysis

Geographic Information

Figure1.1 St. Himark Labeled Map

St. Himark is subdivided into 19 neighborhoods. Infrastructure provided includes sewer & water, roads & bridges, gas, garbage, healthcare and power. 72% of the power is supplied by the Always Safe Nuclear Power Plant in SAFE TOWN (Neighborhood 4).

St. Himark has an exceptional network of 8 hospitals: one each in PALACE HILLS (Neighborhood 1), OLD TOWN (Neighborhood 3), SOUTHWEST (Neighborhood 5), BROADVIEW (Neighborhood 9), TERRAPIN SPRINGS (Neighborhood 11) and SOUTHTON (Neighborhood 16), and two in DOWNTOWN (Neighborhood 6).

Report Data

Data Description

The data for MC1 consists of one CSV file spanning the entire length of the event. It contains individual (categorical) reports of shaking and damage by neighborhood over time. Citizens can submit reports at any time; however, the reports are only recorded in 5-minute batches because of the server configuration. Furthermore, reports may arrive late during power outages.

mc1-reports-data.csv fields:
- time: timestamp of incoming report/record, in the format YYYY-MM-DD hh:mm:ss
- location: id of neighborhood where person reporting is feeling the shaking and/or seeing the damage
- {shake_intensity, sewer_and_water, power, roads_and_bridges, medical, buildings}: reported categorical value of how violent the shaking was/how bad the damage was (0 - lowest, 10 - highest; missing data allowed)
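The field description above can be turned into a small loader. The following is a minimal sketch using only the Python standard library; the function name parse_reports and the sample rows (including the year) are hypothetical and are not taken from the real file. It assumes blank cells mark missing values.

```python
import csv
import io
from datetime import datetime

# The six value fields listed in the data description.
VALUE_FIELDS = ["shake_intensity", "sewer_and_water", "power",
                "roads_and_bridges", "medical", "buildings"]

def parse_reports(csv_text):
    """Parse report rows; empty cells become None (missing data is allowed)."""
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        row = {
            "time": datetime.strptime(raw["time"], "%Y-%m-%d %H:%M:%S"),
            "location": int(raw["location"]),
        }
        for field in VALUE_FIELDS:
            cell = (raw.get(field) or "").strip()
            # Accept "3" as well as "3.0"; treat blanks as missing.
            row[field] = int(float(cell)) if cell else None
        rows.append(row)
    return rows

# Hypothetical sample rows for illustration only.
SAMPLE = """time,location,shake_intensity,sewer_and_water,power,roads_and_bridges,medical,buildings
2020-04-08 08:35:00,3,5,7,6,8,,9
2020-04-08 08:35:00,7,,2,1,3,,4
"""

reports = parse_reports(SAMPLE)
```

For the real file, the same function could be fed the text read from mc1-reports-data.csv.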

Exploratory Data Analysis

There are 83070 report records in this file, covering five days from April 6 to April 10. For any point in time at a given location, there can be zero, one or multiple records.

1. Number of Records by Location

Figure1.2 Number of Records by Location

Finding 1: According to the stacked bar chart, Locations 3 (OLD TOWN), 8 (SCENIC VISTA) and 9 (BROADVIEW) received the most reports, while Location 7 (WILSON FOREST) received very few. According to the shake map, Location 3 is one of the most affected areas, whereas Location 7 is a developing area where few people live.

Finding 2: Broken down by damage category, the proportions are fairly even across locations, except for "medical damage" (the red part of the chart). Locations 2, 4, 7, 8, 10, 12, 13, 14, 15, 17, 18 and 19 have very few reports about medical damage. This is because there are no hospitals in those areas.
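The counts behind a stacked bar chart like this one can be sketched as follows. This is only an illustration, assuming each report is a dictionary with a location and one value per damage field, with None for missing values; the function name count_by_location and the sample rows are hypothetical.

```python
from collections import Counter

def count_by_location(rows, fields):
    """Count non-missing values per (location, damage field) pair."""
    counts = Counter()
    for row in rows:
        for field in fields:
            if row.get(field) is not None:
                counts[(row["location"], field)] += 1
    return counts

# Hypothetical reports: location 7 has no medical value at all.
rows = [
    {"location": 3, "power": 6, "medical": 4},
    {"location": 3, "power": 7, "medical": None},
    {"location": 7, "power": 2, "medical": None},
]

counts = count_by_location(rows, ["power", "medical"])
```

A location without any medical reports simply has a count of zero for that category, which is the pattern described in Finding 2.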

2. Number of Records by Time

Figure1.3 Number of Records by Time

Finding 1: There are very few records from April 6 until around 7am on April 8, followed by a sharp increase after 7am. This suggests that a major quake occurred on April 8.

Finding 2: There are several peaks on April 9 and 10, indicating that there might have been several aftershocks on these two days.
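A time series like the one in this chart can be approximated by grouping report timestamps into hourly buckets. The sketch below assumes timestamps in the YYYY-MM-DD hh:mm:ss format given in the data description; the function name records_per_hour, the hourly bucket size and the sample timestamps are assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime

def records_per_hour(timestamps):
    """Count records per hour, returned in chronological order."""
    counts = Counter()
    for text in timestamps:
        moment = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
        # Truncate to the start of the hour to form the bucket key.
        counts[moment.replace(minute=0, second=0)] += 1
    return dict(sorted(counts.items()))

# Hypothetical timestamps around the morning of April 8.
sample_times = [
    "2020-04-08 06:55:00",
    "2020-04-08 07:05:00",
    "2020-04-08 07:10:00",
    "2020-04-08 07:55:00",
]

hourly = records_per_hour(sample_times)
```

A sudden jump between consecutive buckets is the kind of signal that Finding 1 reads as a major quake.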


Data Preparation

Report Data Transformation

The original dataset is not tidy enough for visualization. Therefore, we need to tidy it up first so that every record is an observation and every field is a variable.

Pivot data using Tableau Prep

In the original dataset, each type of damage has its own field. So we need to merge these fields into one field, named “damage”, and map their values to another field, “severity”.

Issue: Should I define “shake_intensity” as one type of damage here?

Trade-off:

“shake_intensity” is not a kind of infrastructure damage. However, if it is kept as a separate field, the tidy data becomes more redundant, because the single “shake_intensity” value of a report has to be repeated on every record generated from that report. Likewise, if “shake_intensity” is missing from a report in the original data, that missing value is repeated on each of those records.

Conclusion: Put “shake_intensity” under “damage”.

Figure2.1 Tableau Prep Pivot
Figure2.2 Data Before and After
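The pivot was done in Tableau Prep, but the same reshaping can be sketched in plain Python to make the trade-off concrete. This is not the Tableau Prep flow itself; the function name pivot_report, the keep_fields parameter and the sample report are hypothetical. The field names “damage” and “severity” follow the text above.

```python
DAMAGE_FIELDS = ["shake_intensity", "sewer_and_water", "power",
                 "roads_and_bridges", "medical", "buildings"]

def pivot_report(report, damage_fields, keep_fields=()):
    """Turn one wide report into one long record per damage field."""
    long_rows = []
    for field in damage_fields:
        row = {
            "time": report["time"],
            "location": report["location"],
            "damage": field,
            "severity": report.get(field),  # None stays None (missing)
        }
        # Fields kept outside the pivot are copied onto every record.
        for kept in keep_fields:
            row[kept] = report.get(kept)
        long_rows.append(row)
    return long_rows

# Hypothetical report with a missing shake_intensity and medical value.
report = {"time": "2020-04-08 08:35:00", "location": 3,
          "shake_intensity": None, "sewer_and_water": 7, "power": 6,
          "roads_and_bridges": 8, "medical": None, "buildings": 9}

# Chosen design: shake_intensity is one of the pivoted categories.
chosen = pivot_report(report, DAMAGE_FIELDS)

# Alternative: shake_intensity stays a separate, repeated field.
infrastructure = [f for f in DAMAGE_FIELDS if f != "shake_intensity"]
alternative = pivot_report(report, infrastructure,
                           keep_fields=("shake_intensity",))
```

With the chosen design, the missing “shake_intensity” appears once among six records. With the alternative, it is repeated on all five records of the report.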

Map Setup

Now that the report data is ready for use, we need a map to serve as the base of the visualization. To place the neighborhood shapes on the blank background map, we need to break each shape down into its border points and position those points on the background. To achieve that, two generated files are needed:

Figure2.3 Shape Details

In Tableau, we import these two sets of data and join them by "StHimark_ID". We then set "Marks" to "Polygon", put "Point Order" into "Path", put "StHimark_ID" and "Sub Polygon_Id" into "Detail", and finally put "longitude" and "latitude" in. The result is a map like this:

Figure2.4 Map Setup
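What the "Path" and "Detail" settings do can be sketched outside Tableau: the detail fields decide which points belong to the same shape, and the path field decides the order in which the border points are connected. The sketch below assumes the two files are already joined into one list of point records that use the field names from the text. The function name build_polygons and the sample coordinates are hypothetical.

```python
from collections import defaultdict

def build_polygons(points):
    """Group border points per shape and order them along the path."""
    grouped = defaultdict(list)
    for point in points:
        # "Detail": one polygon per neighborhood and sub polygon.
        key = (point["StHimark_ID"], point["Sub Polygon_Id"])
        grouped[key].append(point)
    polygons = {}
    for key, members in grouped.items():
        # "Path": connect the points in ascending "Point Order".
        ordered = sorted(members, key=lambda p: p["Point Order"])
        polygons[key] = [(p["longitude"], p["latitude"]) for p in ordered]
    return polygons

# Hypothetical, deliberately unordered border points for one shape.
sample_points = [
    {"StHimark_ID": 1, "Sub Polygon_Id": 0, "Point Order": 3,
     "longitude": 1.0, "latitude": 1.0},
    {"StHimark_ID": 1, "Sub Polygon_Id": 0, "Point Order": 1,
     "longitude": 0.0, "latitude": 0.0},
    {"StHimark_ID": 1, "Sub Polygon_Id": 0, "Point Order": 2,
     "longitude": 1.0, "latitude": 0.0},
]

polygons = build_polygons(sample_points)
```

Each resulting coordinate list corresponds to one polygon mark on the map.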