IS428 AY2019-20T1 Assign Wong Kuan Wai Gordon

From Visual Analytics for Business Intelligence
Revision as of 21:33, 12 October 2019 by Gordon.wong.2016 (talk | contribs)
Jump to navigation Jump to search
Visual Detective - Optimizing Emergency Response Through Crowdsourcing & Community Engagement


Overview

St. Himark has been hit by an earthquake, leaving officials scrambling to determine the extent of the damage and dispatch limited resources to the areas in most need. They quickly receive seismic readings and use those for an initial deployment but realize they need more information to make sure they have a realistic understanding of the true conditions throughout the city.

In a prescient move of community engagement, the city had released a new damage reporting mobile application shortly before the earthquake. This app allows citizens to provide more timely information to the city to help them understand damage and prioritize their response. In this mini-challenge, use app responses in conjunction with shake maps of the earthquake strength to identify areas of concern and advise emergency planners. Note: the shake maps are from April 6 and April 8 respectively. With emergency services stretched thin, officials are relying on citizens to provide them with much needed information about the effects of the quake to help focus recovery efforts. By combining seismic readings of the quake, responses from the app, and background knowledge of the city, help the city triage their efforts for rescue and recovery.

Objective

My task as a visual analytics expert is to analyze and provide interactive visualizations through seismic readings of the earthquake, damage reports from the mobile application and background knowledge of the city. These visualizations will enable citizens and emergency planners with situational awareness of the disaster. Ultimately, this will further optimize the response of emergency services by enabling them to dispatch their limited resources to the right place at the right time with the right emergency tools, prioritizing their efforts for rescue and recovery. Therefore, my task as a visual analytics expert will explore the following:

1) Geometric damage patterns of St. Himark and its neighborhood
2) Earthquake scale and classes based on seismic readings
3) Sending the correct emergency response based on infrastructure damages
4) Damage inflicted to St. Himark infrastructures in each neighborhood
5) Uncertainty of damage reports

Task #1: Emergency responders will base their initial response on the earthquake shake map. Use visual analytics to determine how their response should change based on damage reports from citizens on the ground. How would you prioritize neighborhoods for response? Which parts of the city are hardest hit?

Task #2: Use visual analytics to show uncertainty in the data. Compare the reliability of neighborhood reports. Which neighborhoods are providing reliable reports? Provide a rationale for your response.

Task #3: How do conditions change over time? How does uncertainty in data change over time? Describe the key changes you see.


About the Data

The data for Mini-Challenge 1 includes one (CSV) file spanning the entire length of the event, containing (categorical) individual reports of shaking/damage by neighborhood over time. Reports are made by citizens at any time, however, they are only recorded in 5-minute batches/increments due to the server configuration. The fields of the data and its relevance are as follows:

1) time: timestamp of incoming report/record, in the format YYYY-MM-DD hh:mm:ss
2) sewer_and_water: Value of how bad the damage was to St. Himark's sewer and water infrastructure (0 - lowest, 10 - highest; missing data allowed)
3) power: Value of how bad the damage was to St. Himark's power infrastructure (0 - lowest, 10 - highest; missing data allowed)
4) roads_and_bridges: Value of how bad the damage was to St. Himark's roads and bridges infrastructure (0 - lowest, 10 - highest; missing data allowed)
5) medical: Value of how bad the damage was to St. Himark's medical infrastructure (0 - lowest, 10 - highest; missing data allowed)
6) buildings: Value of how bad the damage was to St. Himark's buildings infrastructure (0 - lowest, 10 - highest; missing data allowed)
7) shake_intensity: Value of how violent the shaking was (0 - lowest, 10 - highest; missing data allowed)
8) location: ID of neighborhood where person reporting is feeling the shaking and/or seeing the damage

The Shapefile "StHimark.shp" and image "StHimarkMapBlank.png" was taken from Mini-Challenge 2 to be used to map and display the St. Himark map

Data Preparation

Tools Used:

Tableau icon.jpg
Microsoft Excel 2013 logo.svg .png
Macros-vba-512.png
Tableau Prep Builder
Microsoft Excel
Microsoft Excel VBA

Data Preparation for St. Himark Map

The aim for this step is to be able to display the background overlay "StHimarkBlankMap.png" along with the given the Shapefile "StHimark.shp" from Mini-Challenge 2.
However, performing these steps alone produced the following undesired results:

Small.png

With reference to Kelvin Chia's discussion in the class forum and research in the Tableau community, this undesired result was produced as Tableau is unable to identify and determine how the polygons are scaled by the Geometry which then defaults the polygons to be small in size and placed as centroids.

Therefore, a solution to this problem is to create our own custom filled map. This is done through research and an online tool by the Tableau community. As a results, 2 CSV files were generated: StHimark_Features.csv and StHimark_Points.csv.

StHimark_Features.csv will contain a row for each row in the shape file with all the feature details (including the calculated coordinates of the center of the shapes), StHimark_Points.csv will contain a row per boundary point, structured as needed to draw polygons in Tableau. Used like that, with no optional parameters, both files will include a generated unique ID column, which can be used to join the two files, so classifying details from the shapefiles can be used as dimensions in the Tableau workbook. The sample screenshot for StHimark_Features.csv and StHimark_Points.csv are illustrated respectively:

SthimarkFeatures.png
SthimarkPoint.png
StHimark_Features.csv
StHimark_Points.csv

We then perform an inner join between StHimark_Features.csv and StHimark_Points.csv by StHimark ID and the desired result is achieved:

InnerjoinMap1.png
DesiredMap.png

Neighborhoods with No Reports at a Specified Date and Time

As I look into the raw data in "mc1-reports-data.csv", I realized that if we filter the damage by date (Eg. 06/04/2020 00:00:00), the map will only display neighborhoods that have received damage reports. Neighborhoods that DID NOT receive any damage reports during this time will not be displayed in the map. This is not what I want the visualization to look like as I want all neighborhoods to be displayed regardless of whether they received any damage reports or not. The undesired sample display of the map is as follows:

Map3.png

" Therefore the solution to this issue is to create dummy data. For example, at "06/04/2020 00:00:00" only locations 2, 3, 4, 8, 15, 17 and 18 received AT LEAST 1 damage report. As such, I will create and insert rows for locations 1, 5, 6, 7, 9, 10, 11, 12, 13, 14, 16 and 19. However, since there is technically no damage reports made for these locations, I will assign NULL values for all the infrastructure damages and shake intensity. Lastly, I assign a value of "-1" instead of a NULL value for the average damage. To perform this step, I utilized Microsoft Excel's Visual Basic for Application (VBA) to help me automate this process. To allow my program for continuous usage , I have created a new Excel workbook with the VBA that is able to upload the raw data "mc1-reports-data.csv" and perform the automation. My VBA source codes are as follows:

   Dim x As Long
   Dim dict As New Scripting.Dictionary
   Dim uniqueId As String
   Dim location As String
   Dim datetimeString As String
   Dim nextDatetimeString As String
   Dim NumRows As Long
   Dim lastRow As Long
   
   Application.ScreenUpdating = False
   
   'Set numrows = number of rows of data.
   NumRows = Range("A1", Range("A1").End(xlDown)).Rows.Count
   lastRow = NumRows
   
   'Select cell A2.
   Range("A2").Select
     
   'Establish "For" loop to loop "NumRows" number of times.
   'Using a dictionary
   'Loop through each row and add the location ID in to the dictionary
   'For loop to loop through 1 to 19 to check if dictionary contains the location IDs
   'If no, insert dummy data with datetime, locationId and "-1" average damage into last row
   For x = 2 To NumRows
       datetimeString = Format(Range("A" + CStr(x)).Value, "DD/MM/YYYY HH:MM:SS")
       location = Range("I" + CStr(x)).Value
       
       If x = 2 Then
           dict.Add Key:=location, Item:=location
       End If
       
       nextDatetimeString = Format(Range("A" + CStr(x + 1)).Value, "DD/MM/YYYY HH:MM:SS")
           
       If datetimeString = nextDatetimeString And dict.Exists(location) = False Then
           dict.Add Key:=location, Item:=location
       ElseIf datetimeString <> nextDatetimeString And dict.Exists(location) = False Then
           dict.Add Key:=location, Item:=location
           
           For y = 1 To 19
               If dict.Exists(CStr(y)) = False Then
                   lastRow = lastRow + 1
                   Range("A" + CStr(lastRow)).Value = datetimeString
                   Range("B" + CStr(lastRow)).Value = -1
                   Range("I" + CStr(lastRow)).Value = CStr(y)
               End If
           Next
           
           dict.RemoveAll
       ElseIf datetimeString <> nextDatetimeString And dict.Exists(location) = True Then
           
           For y = 1 To 19
               If dict.Exists(CStr(y)) = False Then
                   lastRow = lastRow + 1
                   Range("A" + CStr(lastRow)).Value = datetimeString
                   Range("B" + CStr(lastRow)).Value = -1
                   Range("I" + CStr(lastRow)).Value = CStr(y)
               End If
           Next
           dict.RemoveAll
       End If
   Next
   Application.ScreenUpdating = True

With that, I am now able to display neighborhoods which do not have any damage reports at the specified time. As illustration of the desired result is as follows:

Map41.png

Data Preparation for Damage Reports

This step aims to clean and transform the raw data in "mc1-reports-data.csv" and produce one with the following fields: 1) damageReportId: Unique ID of the damage report 2) time: Date and time the report was made (DD/MM/YYYY HH:MM:SS) 3) location: Location ID of the neighborhood 4) Nbrhood: Name of the neighborhood 5) Infrastructure: Any of the 5 types of infrastructure of St. Himark (Sewer and Water, Power, Medical, Roads and Bridges, Buildings) 6) Damage: Value of how bad the damage is (0 - lowest, 10 - highest; missing data allowed)

To achieve the desired data, I used Tableau Prep Builder to perform the cleaning and transformation. The illustration of the steps taken are as follows:

DataPrep1.png

In the first cleaning step, I removed the field "shake_intensity" as this is not a field that I want in this data. I then renamed the fields to be easily understood (Eg. "Sewer and Water" instead of "sewer_and_water". The next step is the pivoting of the fields; Sewer and Water, Power, Medical, Roads and Bridges and Buildings. This step is illustrated as follows:

DataPrep2.png

Next, as my current output only contains the neighborhood ID, I want the neighborhood Name to be specified in my data as well. To do this, I imported the StHimark_Features.csv file obtained from "Data Preparation for St. Himark Map". First, I remove the unwanted fields; StHimark_Id, latitude and longitude. Next, I then performed an inner join of "location" field with StHimark_Features.csv's "Id" field. This step is illustrated as follows:

DataPrep3.png
DataPrep4.png

My final output is as follows: (Note that I did not remove damage reports with NULL values for damage)

DataPrep5.png

Data Preparation for DateTime Formatting & Average Damage

This step aims to format the time field to the correct format and to create a new column from the raw data in "mc1-reports-data.csv" which computes the average damage dealt to the specified location at the specified time. To achieve the desired data, I used Tableau Prep Builder to perform the cleaning and transformation. The illustration of the steps taken are as follows:

DataPrep6.png

The calculated field for formatting the date and time are as follows:

DataPrep7.png

The calculated field for calculating the average damage are as follows:

DataPrep8.png

The sample screenshot of the desired final output of the data is as follows:

DataPrep9.png

Interactive Visualization

The interactive visualization can be accessed here:

Throughout all the different dashboards, useful guides/tips are provided to help users navigate through the different filters and actions so that their analysis can be performed smoothly. The detailed explanation of my dashboards are as follows:

Home Dashboard

The aim of the home dashboard is to provide viewers a quick overview of the problem by reading the title of the dashboard. The "Introduction" allows viewers to understand the scope and the problems that I will be exploring. On the right side of the dashboard, interactive buttons are placed to allow viewers to easily navigate to the different dashboards with just a click of a button.

DataPrep10.png

St. Himark Overview Dashboard

The aim of the St. Himark Overview dashboard is to allow viewers to explore the analysis behind the disaster throughout the entire period from 06 APR 2020, 00:00:00 to 11 APR 2020, 00:00:00. To zoom in on a specific date and time, a drop down bar is in place to perform that function as well. Guiding messages and tips are also placed around the dashboard for users to perform to assist in their visualization experience. The detailed explanation of each interactive function and charts are as follows: (This screenshot is reformatted to fit into .PNG file and hence, alignment may seem out of place. Visit my Tableau Public to assess the main dashboard)

Overview.png
Interactive Visualization Rationale Screenshot Example
Side Navigation Menu
The side navigation menu contains interactive buttons to allow viewers to easily navigate to the different dashboards with just a click of a button.
DataPrep11.png
Synchronized Date & Time Drop Down Menu
The synchronized date and time drop down menu enables viewers to select a specific date and time. Once selected, all the charts in the dashboard will be automatically rendered to display the correct data for visualization.
DataPrep12.png
St. Himark Damage Map
The St. Himark damage map displays the average damage for each and every neighborhood. Depending on the date and time selected in the drop down menu, the St. Himark map will automatically render to display the correct data for visualization. At a single glance, viewers can then be able to observe which neighborhood sustained the most/least/zero damage.

The legend at the bottom left of the map displays the color codes in which the most damage will be in the darkest tone. Additionally, viewers can hover over each neighborhood to determine the exact value of the average damage sustained by that specific neighborhood.

DataPrep15.png
Average Shake Intensity With Shake Status Synchronized Chart
The average shake intensity with shake status synchronized chart shows how violent the shake intensity is at any particular date and time. Similar to the Himark Damage Map, depending on the date and time selected in the drop down menu, the chart will automatically render to display the correct data for visualization.

Being a synchronized chart, there is also can additional function whereby viewers can click on any point of the chart and automatically render the other charts in the dashboard. This allows viewers to examine and analyse the relationships between shake intensity with the average damage on the St. Himark map as well as with the Damage Reports Certainty Status chart.

Additionally, viewers can hover over the chart at various points to determine the average shake intensity at the specified date and time. The Shake Status is also displayed in the tool tip which provides information of the different classes (Minor, Moderate, Major and Critical) based on the average shake intensity value.

DataPrep16.png
Damage Reports Certainty/Uncertainty Status Chart
The damage reports certainty/uncertainty status chart provides viewers an overview of the certainty of the reports at specific dates and time. In the top right of the chart, the legend is displayed whereby green color denotes that the damage reports were certain and red otherwise.

The certainty of the damage reports is determined by the number of reports for a specific date and time and compared against the measured shake intensity. The conditions to measure certainty are 1) If the number of reports is less than or equal to 2 for a particular neighborhood and the average shake intensity is less than or equal to 1, the damage reports are deemed "Uncertain". 2) If the number of reports is more than 2 for a particular neighborhood and the average shake intensity is more than 1, the damage reports are deemed "Certain".

Additionally, by selecting a specific date and time in the drop down menu or Average Shake Intensity with Shake Status chart, viewers will be able to determine the damage report certainty status for that specific date and time. Similarly, viewers can hover over the Damage Reports Certainty/Uncertainty Status chart to activate the tool tip containing the information.

DataPrep17.png

St. Himark Infrastructure Analysis Dashboard

The aim of the St. Himark Infrastructure Analysis Dashboard is to enable viewers to determine the extent of the damage dealt to the 5 different infrastructures in each neighborhood. Guiding messages and tips are also placed around the dashboard for users to perform to assist in their visualization experience. The detailed explanation of each interactive function and charts are as follows: (This screenshot is reformatted to fit into .PNG file and hence, alignment may seem out of place. Visit my Tableau Public to assess the main dashboard)

Infrastructure Analysis.png
Interactive Visualization Rationale Screenshot Example
Side Navigation Menu
The side navigation menu contains interactive buttons to allow viewers to easily navigate to the different dashboards with just a click of a button.
DataPrep11.png
Synchronized Selection Time Graph
The Synchronized Selection Time Graph enables viewers to select a specific date and time similar to that of a drop down menu. However, this graph was created with viewer interaction in mind. Viewers are able to hover over the Synchronized Selection Time Graph and the various charts in the dashboard will automatically render and display the correct data for visualization right upon hover. This also allows viewers with an immediate overview of the different charts without the need to click on a specific date and time. The x-axis of the graph represents the minutes while the y-axis represents the hours of the period from 06 APR 2020, 12AM to 11 APR 2020, 12AM. In other words, moving horizontally across the graph changes the minute while moving vertically across the graph changes the date and hour time.
DataPrep18.png
Infrastructure Damage Chart
The Infrastructure Damage Chart enables viewers to view, at a single glance, the damage dealt to each infrastructure (Buildings, Medical, Power, Roads and Bridges and Sewer and Water) for each neighborhood. At the top right of the chart, the legend specifies the color coding for the different values of damage, the higher the damage, the darker the tone. Additionally, the key function of this chart is the automatic sort whereby the neighborhoods are sorted in descending order based on the overall damage for every render of the date and time using the selection time graph.

For instance, in the screenshot example on the right, selected date and time is 08 APR 2020, 8.35AM and neighborhood "3" (Old Town) sustained the highest overall damage and its power infrastructure sustained the most damage. On the other hand, neighborhood "7" (Wilson Forest) sustained the least overall damage. Viewers can also hover each bar to view the tool tip which contains all the information, including the value of the damage.

DataPrep19.png
Reporting Trends Chart
The Reporting Trends Chart enables viewers to view, at a single glance, the damage reporting patterns for each infrastructure for each neighborhood. At the top right of the chart, the legend specifies the color coding for the different values of damage reports made, the higher the damage reports made, the darker the red tone. Green tone denotes a low or zero damage reports made.

With this visualization, we can then determine the relationships between the number of damage reports made with infrastructural damages as well as the relationships with shake intensity in the "St. Himark Overview" dashboard. Viewers can also hover each data to view the tool tip which contains all the information, including the number of reports made.

DataPrep20.png
DataPrep21.png

Interesting and Anomalous Observations

Task #1 Use visual analytics to determine how their response should change based on damage reports from citizens on the ground.

Emergency responders will base their initial response on the earthquake shake map. Use visual analytics to determine how their response should change based on damage reports from citizens on the ground. How would you prioritize neighborhoods for response? Which parts of the city are hardest hit?

Let us first take a look at the earthquake shake maps provided for Mini-Challenge 1. The earthquake shake maps for pre-quake and major quake are as follows:

Mc1-prequake-shakemap.png
Mc1-majorquake-shakemap.png
Pre-Quake Shake Map
Major-Quake Shake Map

Task #2 Use visual analytics to show uncertainty in the data.

Task #3 How do conditions change over time? How does uncertainty in data change over time?

References

Comments

Feel free to provide feedback!