Difference between revisions of "IS428 AY2019-20T1 Assign Abhyuday Samadder"

From Visual Analytics for Business Intelligence
== Problem and Motivation ==
<p>St. Himark has been hit by an earthquake, leaving officials scrambling to determine the extent of the damage and dispatch limited resources to the areas in most need. They quickly receive seismic readings and use those for an initial deployment but realize they need more information to make sure they have a realistic understanding of the true conditions throughout the city.

<p>In a prescient move of community engagement, the city had released a new damage reporting mobile application shortly before the earthquake. This app allows citizens to provide more timely information to the city to help them understand damage and prioritize their response. In this mini-challenge, use app responses in conjunction with shake maps of the earthquake strength to identify areas of concern and advise emergency planners. Note: the shake maps are from April 6 and April 8 respectively.

<p>With emergency services stretched thin, officials are relying on citizens to provide them with much-needed information about the effects of the quake to help focus recovery efforts.

<p>By combining seismic readings of the quake, responses from the app, and background knowledge of the city, help the city triage their efforts for rescue and recovery.

<p>Tasks and Questions:
# Emergency responders will base their initial response on the earthquake shake map. Use visual analytics to determine how their response should change based on damage reports from citizens on the ground. How would you prioritize neighborhoods for response? Which parts of the city are hardest hit? Limit your response to 1000 words and 10 images.
# Use visual analytics to show uncertainty in the data. Compare the reliability of neighborhood reports. Which neighborhoods are providing reliable reports? Provide a rationale for your response. Limit your response to 1000 words and 10 images.
# How do conditions change over time? How does uncertainty in the data change over time? Describe the key changes you see. Limit your response to 500 words and 8 images.

== Transforming and analysing the dataset ==
<p>The first step is to analyse the given dataset, which is a single CSV file with the following fields:
* time: timestamp of incoming report/record, in the format YYYY-MM-DD hh:mm:ss
* location: id of neighborhood where person reporting is feeling the shaking and/or seeing the damage
* {shake_intensity, sewer_and_water, power, roads_and_bridges, medical, buildings}: reported categorical value of how violent the shaking was/how bad the damage was (0 - lowest, 10 - highest; missing data allowed)

Since the provided data is not in a suitable form for visual analysis in Tableau, I used Tableau Prep Builder to pre-process it.

=== Data Cleaning ===
{| class="wikitable"
|-
! Problem 1 || Too many columns in the dataset.
|-
| Solution || The different sectors (sewer_and_water, power, roads_and_bridges, medical, buildings) are each stored as a separate column. For easier analysis, we can pivot these columns into two new columns, Sector and Sector Damage. This can be seen in the screenshot below.
|-
| Image || [[File:Image1.png|800px|center]]
|}
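
The pivot in Problem 1 was done in Tableau Prep Builder. For readers without that tool, the same reshaping can be sketched in plain Python. This is only an illustration under stated assumptions: the function name <code>pivot_sectors</code> and the sample report are hypothetical and do not come from the original flow.

```python
# Sketch only: a plain-Python stand-in for the Tableau Prep pivot step.
SECTOR_COLUMNS = ["sewer_and_water", "power", "roads_and_bridges", "medical", "buildings"]


def pivot_sectors(rows):
    """Return one long-format row per (report, sector) pair."""
    long_rows = []
    for row in rows:
        for sector in SECTOR_COLUMNS:
            long_rows.append({
                "time": row["time"],
                "location": row["location"],
                "shake_intensity": row.get("shake_intensity"),
                "Sector": sector,
                # Missing readings stay as None rather than being guessed.
                "Sector Damage": row.get(sector),
            })
    return long_rows


# Invented sample report, not taken from the challenge data.
sample_report = {
    "time": "2020-04-08 09:00:00",
    "location": "3",
    "shake_intensity": "6",
    "sewer_and_water": "7",
    "power": "5",
    "roads_and_bridges": "8",
    "medical": None,
    "buildings": "4",
}
long_rows = pivot_sectors([sample_report])
print(len(long_rows))  # one wide row becomes five long rows
```

Keeping shake_intensity as its own column, rather than pivoting it with the sectors, makes it possible to compare it against Sector Damage later.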

{| class="wikitable"
|-
! Problem 2 || The shake intensity is represented as numbers.
|-
| Solution || Bin the shake intensity into categories using a calculated field, as represented in the following pictures.
|-
| Image || [[File:Image2.png|400px|center]] [[File:Image3.png|800px|center]]
|}
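
The actual bin boundaries used in the calculated field are only visible in the screenshots. The sketch below shows the general idea of such a binning in Python, with cut-offs (0 to 3, 4 to 6, 7 to 10) and labels that are purely assumed for illustration.

```python
def bin_shake_intensity(value):
    """Map a 0-10 shake intensity to a category label.

    The thresholds and labels here are hypothetical; the real ones are
    defined in the Tableau calculated field shown in the screenshots.
    """
    if value is None or value == "":
        return "Unknown"  # the data set allows missing values
    intensity = float(value)
    if intensity <= 3:
        return "Low"
    if intensity <= 6:
        return "Moderate"
    return "High"


labels = [bin_shake_intensity(v) for v in ["0", "5", "9", ""]]
print(labels)
```

Missing values get their own label so that they are not silently counted as low intensity.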

{| class="wikitable"
|-
! Problem 3 || Unable to fit the image polygons on the map from MC2.
|-
| Solution || I followed the steps provided by one of our classmates on the VA discussion forum to fit the map and the shape file together. After downloading the STHimark_Points.csv and STHimark_Features.csv files from the discussion forum, the map is created in Tableau with the following steps:

* Import the two data sets.
* Join them on "StHimark_ID".
* Set "Marks" to "Polygon".
* Put "Point Order" into "Path", and "StHimark_ID" and "Sub Polygon_Id" into "Detail".
* Put "longitude" and "latitude" on Columns and Rows respectively.

The resulting map looks like this.
|-
| Image || [[File:Image4.png|800px|center]]
|}
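
The Tableau steps above amount to grouping the boundary points by neighbourhood and sub-polygon and drawing them in "Point Order". A minimal Python sketch of that grouping follows. It assumes each point row carries the columns named in the steps; the function name <code>build_polygons</code> and the sample coordinates are invented.

```python
from collections import defaultdict


def build_polygons(points):
    """Group point rows into ordered vertex lists.

    Keys are (StHimark_ID, Sub Polygon_Id); values are (longitude, latitude)
    tuples sorted by Point Order, mirroring Detail and Path in Tableau.
    """
    grouped = defaultdict(list)
    for point in points:
        key = (point["StHimark_ID"], point["Sub Polygon_Id"])
        grouped[key].append(
            (int(point["Point Order"]), float(point["longitude"]), float(point["latitude"]))
        )
    return {
        key: [(lon, lat) for _, lon, lat in sorted(vertices)]
        for key, vertices in grouped.items()
    }


# Invented points, deliberately out of order.
sample_points = [
    {"StHimark_ID": "1", "Sub Polygon_Id": "0", "Point Order": "2", "longitude": "1.0", "latitude": "0.0"},
    {"StHimark_ID": "1", "Sub Polygon_Id": "0", "Point Order": "1", "longitude": "0.0", "latitude": "0.0"},
    {"StHimark_ID": "1", "Sub Polygon_Id": "0", "Point Order": "3", "longitude": "1.0", "latitude": "1.0"},
]
polygons = build_polygons(sample_points)
print(polygons[("1", "0")])
```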

<br/>

== Dataset Import Structure & Process ==
<p> After cleaning the data, the final .csv file looks like this.

[[File:Image5.png|600px|center]]

<p>This file is then joined with the two map files using an inner join. The resulting data join flow looks like this.

[[File:Image6.png|400px|center]]

<p> Thus, the data is processed and ready for visual analysis.

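An inner join keeps only the rows whose key appears in both inputs. The sketch below illustrates this in Python for the report file and a map file. It assumes that the report's location id matches "StHimark_ID", which is not stated explicitly in the text; the "Name" column and all sample rows are invented.

```python
def inner_join(left_rows, right_rows, left_key, right_key):
    """Return merged rows for every pair whose key values match."""
    index = {}
    for right in right_rows:
        index.setdefault(str(right[right_key]), []).append(right)
    joined = []
    for left in left_rows:
        # Rows without a match on the other side are dropped.
        for right in index.get(str(left[left_key]), []):
            joined.append({**left, **right})
    return joined


# Illustrative rows only, not taken from the data set.
reports = [
    {"location": "3", "Sector": "power", "Sector Damage": "5"},
    {"location": "99", "Sector": "power", "Sector Damage": "2"},
]
features = [{"StHimark_ID": "3", "Name": "Old Town"}]

joined_rows = inner_join(reports, features, "location", "StHimark_ID")
print(len(joined_rows))  # the report with location 99 has no match
```

Because unmatched rows disappear in an inner join, it is worth checking the row count before and after the join to confirm that no reports were lost.
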
== Interactive Visualisation ==
<p> Tableau link: https://public.tableau.com/profile/abhyuday.samadder#!/vizhome/Abhyudays_2017_MC1_VA/Home

<p> The opening page is a dashboard displaying the problem statement and two buttons that lead to additional dashboards answering the questions.

[[File:Image7.png|800px|center]]

<p> I have used filters to aid my visual analysis and to derive insights for specific instances.

<p> A filter on the date shows the effect of the earthquake over time. Filters have also been created for the various sectors and the neighbourhood so that I could derive insights for specific instances.

[[File:Image8.png|200px|center]]

<p> I have also used a heatmap, as it is a very useful visual aid for showing change over time.

[[File:Image9.png|600px|center]]

<p> I have used damage maps, as they are a very useful tool for showing how the earthquake has affected a certain region. They are also a helpful visual aid for comparing the perceived damage with the real damage.

[[File:Image10.png|800px|center]]

<p> I have also used the dual-axis feature on multiple occasions to present more data on a single graph. The thickness of each line depends on the number of responses, which makes it easier to show the certainty in the data set.

I have also used colour as an important visual aid to show how the damage changes across the regions, with different hues showing different intensities of damage.

== Q1 Solution ==
[[File:Image11.png|400px|center]][[File:Image11a.png|400px|center]]

<p>Based on the two shake maps above, the northeast corner of the area, which includes Pepper Mill, Safe Town and Old Town, is severely affected by the earthquake.

[[File:Image12.png|400px|center]]
[[File:Image13.png|400px|center]]
[[File:Image14.png|200px|center]]
[[File:Image15.png|100px|center]]

<p> Based on the crowdsourced earthquake app, the northeast areas were affected on the first day, with the maximum number of damage reports coming from Old Town. The distress was reported from the 14th to the 16th hour. Old Town also reported a high level of damage across all sectors.

[[File:Image16.png|400px|center]]
[[File:Image17.png|400px|center]]
[[File:Image18.png|400px|center]]
[[File:Image19.png|400px|center]]
[[File:Image20.png|400px|center]]
[[File:Image21.png|400px|center]]
[[File:Image22.png|400px|center]]
[[File:Image23.png|400px|center]]
[[File:Image24.png|400px|center]]

<p> The images above show clearly that Old Town has been affected by the earthquake consistently across all the days, and therefore it should receive a prioritized response.

Furthermore, roads and bridges are very important for rescue and recovery, so they should also be prioritized.

Wilson Forest receives the fewest distress reports through the application, while the other regions in the northeast sector receive more, so they should be prioritized over it.

== Q2 Solution ==
[[File:Image25.png|800px|center]]

<p> The larger the number of reports, the more reliable the data is. The dashboard above shows this repeatedly as you toggle across the different days: when there is a large number of respondents, the Shake Intensity and the Sector Damage show similar results. This can also be seen in the “Comparison of Shake and Sector Damage over Time” chart, where the thickness of each line is proportional to the number of responses. When the two measures are similar, the lines are thicker and closer to each other.

<p> In conclusion, the uncertainty of the data is highly dependent on the number of people reporting damage in the application.

On toggling through the different categories, the most unreliable data comes from the southeast of the city, namely “Wilson Forest”, “Pepper Mill”, “Cheddarford”, “Safe Town” and “East Parton”.

<p> On the other hand, the “Northeast”, “Downtown” and “Weston” regions are providing reliable data, being in the centre of the town.

<p> These insights are derived by comparing the two damage maps side by side on the dashboard.

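The link between the number of reports and reliability can be made concrete with the standard error of the mean, which shrinks as the number of reports grows. The sketch below is my own illustration of that idea and was not part of the Tableau analysis; the function name <code>report_reliability</code> and the sample rows are hypothetical.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev


def report_reliability(rows, field="shake_intensity"):
    """Summarise report count, mean and standard error per location."""
    values = defaultdict(list)
    for row in rows:
        raw = row.get(field)
        if raw is None or raw == "":
            continue  # skip missing readings
        values[row["location"]].append(float(raw))
    summary = {}
    for location, readings in values.items():
        count = len(readings)
        # A standard error needs at least two readings.
        std_error = stdev(readings) / sqrt(count) if count > 1 else None
        summary[location] = {"reports": count, "mean": mean(readings), "std_error": std_error}
    return summary


# Invented rows, not taken from the challenge data.
sample_rows = [
    {"location": "A", "shake_intensity": "4"},
    {"location": "A", "shake_intensity": "6"},
    {"location": "A", "shake_intensity": ""},
    {"location": "B", "shake_intensity": "9"},
]
reliability = report_reliability(sample_rows)
print(reliability["A"]["reports"], reliability["B"]["std_error"])
```

A neighbourhood with a single report has no measurable spread at all, which is one way to flag it as unreliable rather than treating its one value as certain.
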
== Q3 Solution ==
[[File:Image26.png|800px|center]]

<p> Based on the given data, the earthquake occurs in three stages: the pre-quake on the 6th, the main earthquake from the later part of the 8th to the early part of the 9th, and finally the aftershock on the 10th.

<p> During the pre-quake, the number of people responding gradually increases. There is an immediate spike in the number of reports after the first major earthquake on the 8th, after which the number of reports gradually decreases.

<p> Data uncertainty arises when there is insufficient data. This could be due to power outages, which cause a fall in the number of reports.

<p> Finally, in the central region the data is somewhat consistent before the pre-quake and then starts deviating from the truth, which indicates that people there might have had problems reporting the data.

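Spikes and drops in reporting, such as those described above, can be checked by counting reports per hour. The following Python sketch assumes the timestamp format given in the field list (YYYY-MM-DD hh:mm:ss); the function name <code>reports_per_hour</code> and the sample timestamps are invented.

```python
from collections import Counter
from datetime import datetime


def reports_per_hour(rows):
    """Count reports in each clock hour, returned in chronological order."""
    counts = Counter()
    for row in rows:
        stamp = datetime.strptime(row["time"], "%Y-%m-%d %H:%M:%S")
        # Truncate to the start of the hour so reports bucket together.
        counts[stamp.replace(minute=0, second=0)] += 1
    return dict(sorted(counts.items()))


# Invented timestamps, not taken from the challenge data.
sample_times = [
    {"time": "2020-04-08 09:05:00"},
    {"time": "2020-04-08 09:40:00"},
    {"time": "2020-04-08 08:55:00"},
]
hourly = reports_per_hour(sample_times)
for hour, count in hourly.items():
    print(hour, count)
```

An hour with very few reports right after a strong shake is a candidate for the outage effect mentioned above, since a lack of reports does not necessarily mean a lack of damage.
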
== References ==
The following references have been useful in the completion of the analysis process:
* Past project reference #1: https://wiki.smu.edu.sg/1617t1IS428g1/IS428_2016-17_Term1_Assign3_Gwendoline_Tan_Wan_Xin
* Past project reference #2: https://wiki.smu.edu.sg/1617t1IS428g1/IS428_2016-17_Term1_Assign3_Tan_Kee_Hock
* Dataset source: https://vast-challenge.github.io/2019/MC1.html
* The VA discussion forum, for reference on how to create the map, along with this link: https://community.tableau.com/thread/116369

== Comments ==

Revision as of 13:49, 14 October 2019
