IS428 AY2019-20T1 Parth Goda Rajesh

From Visual Analytics for Business Intelligence
Revision as of 23:26, 13 October 2019 by Parthrg.2017

Background

Welcome to St. Himark! St. Himark is a fictional city used in this visual analytics case study. It has 19 neighborhoods, each with its own unique characteristics and amenities. St. Himark has a population of 246,839 people and is located in the Oceanus Sea. It is also home to the world-renowned St. Himark Museum, beautiful beaches, and the Wilson Forest Nature Preserve, and it is one of the best cities in which to raise a family and work. The Always Safe Nuclear Power Plant provides most of the city's power as well as jobs in Safe Town. Mayor Jordan and the city council currently govern the city.

The city runs the following utilities:

  1. Water and Sewage
  2. Road and Bridge
  3. Gas
  4. Garbage
  5. Power

Construction work is always going on in these utilities.

St. Himark is divided into 19 neighborhoods:

  1. PALACE HILLS
  2. NORTHWEST
  3. OLD TOWN
  4. SAFE TOWN
  5. SOUTHWEST
  6. DOWNTOWN
  7. WILSON FOREST
  8. SCENIC VISTA
  9. BROADVIEW
  10. CHAPPARAL
  11. TERRAPIN SPRINGS
  12. PEPPER MILL
  13. CHEDDARFORD
  14. EASTON
  15. WESTON
  16. SOUTHTOWN
  17. OAK WILLOW
  18. EAST PARTON
  19. WEST PARTON

Problem - VAST challenge MC1

There was an earthquake northwest of St. Himark, occurring between 6 April 2020 and 8 April 2020. City officials needed to collect data immediately to understand the extent of the damage, so that they could allocate resources efficiently to the parts of town that need them and dispatch emergency services.

At first, the officials had only the seismic readings of the earthquake, which they used for their first round of dispatch. Now, however, they need more information to better gauge what is happening on the ground.

Purpose

To gather the information they need, the city officials launched an app where citizens can report the intensity of the shaking and the level of damage to utility infrastructure. Citizens use the app to record the level of damage they see on a utility or infrastructure building in a neighborhood, as well as the shake intensity in that neighborhood, and the officials use these reports to collect data from the ground. The data is stored every 5 minutes. There may also be some data loss or delay due to power outages.

Visualizations were created from this data so that it can be understood faster, allowing recommendations and decisions to be made more quickly and help to reach people sooner.

The following questions also have to be answered:

  1. Emergency responders will base their initial response on the earthquake shake map. Use visual analytics to determine how their response should change based on damage reports from citizens on the ground. How would you prioritize neighborhoods for the reaction? Which parts of the city are the hardest hit?
  2. Use visual analytics to show uncertainty in the data. Compare the reliability of neighborhood reports. Which neighborhoods are providing reliable reports? Provide a rationale for your response.
  3. How do conditions change over time? How does uncertainty change over time? Describe the key changes you see.


Data Gathering and Clean up

The data was provided in a file named mc1-reports-data.csv.

The first few rows of the data provided in the CSV file

The headers were:

  1. Time: A timestamp of the report made by a citizen, in the format DD/MM/YY HH:MM:SS
  2. sewer_and_water: Damage recorded on the sewer and water systems in the neighborhood and at the timestamp. 0 is the lowest level of damage, while 10 is the highest
  3. power: Damage recorded on the power generation systems in the neighborhood and at the timestamp. 0 is the lowest level of damage, while 10 is the highest
  4. roads_and_bridges: Damage recorded on the roads and bridges in the neighborhood and at the timestamp. 0 is the lowest level of damage, while 10 is the highest
  5. medical: Damage recorded on the medical facilities in the neighborhood and at the timestamp. 0 is the lowest level of damage, while 10 is the highest
  6. buildings: Damage recorded on the buildings in the neighborhood and at the timestamp. 0 is the lowest level of damage, while 10 is the highest
  7. shake_intensity: How violent the shaking was in the neighborhood and at the timestamp.
  8. location: Id of the neighborhood for which the citizen is reporting their readings. (This is matched to the neighborhood data in the map file.)
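For readers who want to check the raw file outside Tableau, here is a minimal Python sketch that loads mc1-reports-data.csv. It assumes the header names and the DD/MM/YY HH:MM:SS timestamp format listed above, and it assumes that a blank cell means a citizen skipped that category; `parse_report` and `load_reports` are hypothetical helper names, not part of the challenge materials.

```python
import csv
import io
from datetime import datetime

# Damage columns as listed above; each is expected to be on a 0-10 scale.
DAMAGE_COLUMNS = ["sewer_and_water", "power", "roads_and_bridges", "medical", "buildings"]


def parse_report(row):
    """Convert one CSV row (a dict of strings) into typed values."""
    report = {
        # Assumption: timestamps follow the DD/MM/YY HH:MM:SS format described above.
        "time": datetime.strptime(row["Time"], "%d/%m/%y %H:%M:%S"),
        "location": int(row["location"]),
    }
    for column in DAMAGE_COLUMNS + ["shake_intensity"]:
        value = (row.get(column) or "").strip()
        # Assumption: an empty cell means the citizen did not rate that category.
        report[column] = float(value) if value else None
    for column in DAMAGE_COLUMNS:
        reading = report[column]
        if reading is not None and not 0 <= reading <= 10:
            raise ValueError(f"{column} reading out of range: {reading}")
    return report


def load_reports(text):
    """Parse the full CSV text into a list of report dicts."""
    return [parse_report(row) for row in csv.DictReader(io.StringIO(text))]
```

Usage: `with open("mc1-reports-data.csv", newline="") as f: reports = load_reports(f.read())`. The range check only covers the damage columns, since the page does not state a scale for shake_intensity.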

Cleaning up the data

Using Tableau Prep Builder, I reshaped and cleaned the data from the CSV file to make it easier to visualize.

Pivoting

To start, I pivoted the medical, power, roads_and_bridges, sewer_and_water, buildings, and shake_intensity columns into rows. The pivoted column names are stored in a field called "Source of reading," and their values in a field called "Readings."

Step 1: Pivoting the dashboard
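The pivot in Step 1 turns each wide report row into one row per source of reading. The Python sketch below reproduces that reshaping on plain dictionaries so the transformation is easy to see; it is an illustration, not Tableau Prep's implementation, and dropping blank cells instead of keeping them as nulls is an assumption that may not match Tableau Prep's default behavior.

```python
# Columns pivoted in Step 1; Time and location are kept as identifiers.
PIVOT_COLUMNS = ["medical", "power", "roads_and_bridges",
                 "sewer_and_water", "buildings", "shake_intensity"]


def pivot_reports(rows, id_columns=("Time", "location")):
    """Unpivot wide report rows into one record per (report, source of reading)."""
    long_rows = []
    for row in rows:
        for column in PIVOT_COLUMNS:
            value = row.get(column)
            if value in (None, ""):
                continue  # assumption: skip categories the citizen left blank
            record = {key: row[key] for key in id_columns}
            record["Source of reading"] = column
            record["Readings"] = float(value)
            long_rows.append(record)
    return long_rows
```

A single report with six filled-in categories becomes six rows that share the same Time and location, which is what lets one Tableau field drive the "Source of reading" filters.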

Cleaning up names

Next, I renamed the following sources of reading and capitalized the rest:

  1. roads_and_bridges into "Road and Bridges"
  2. sewer_and_water into "Sewer and Water"
  3. shake_intensity into "Shake Intensity"
Step 2: Name clean up
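Step 2 amounts to a small lookup table with a capitalization fallback. A minimal Python sketch, assuming the pivoted records carry the "Source of reading" field from Step 1; `clean_source_name` and `rename_sources` are hypothetical names used only for illustration.

```python
# Explicit renames from Step 2; every other source is simply capitalized.
RENAMES = {
    "roads_and_bridges": "Road and Bridges",
    "sewer_and_water": "Sewer and Water",
    "shake_intensity": "Shake Intensity",
}


def clean_source_name(name):
    """Return the display name for a raw source-of-reading column name."""
    return RENAMES.get(name, name.capitalize())


def rename_sources(records):
    """Return copies of the pivoted records with cleaned source names."""
    return [{**r, "Source of reading": clean_source_name(r["Source of reading"])}
            for r in records]
```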

Setting up Tableau

To build the interactive map in a Tableau workbook, I first added a file called StHimark.shp, taken from the VAST Challenge Mini-Challenge 2 (MC2) data files. This file has the following fields:

  1. ID: Id of the neighborhood
  2. location: Name of the neighborhood
  3. Longitude: Longitude coordinate of the neighborhood
  4. Latitude: Latitude coordinate of the neighborhood

I then dragged the output file from Tableau Prep into Tableau and inner joined the two sources, matching ID from StHimark.shp with location from mc1-reports-data.csv:

Step 3: Inner Join the two worksheets

This was the result.

Final Result of Data Transformation
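The join in Step 3 matches each report's location ID to the ID field of StHimark.shp and keeps only rows that match on both sides. The Python sketch below illustrates that inner join on plain dictionaries. Because both sources contain a field called location (an ID in the reports, a name in the shapefile), the sketch stores the shapefile's name under a hypothetical `Neighborhood` key; it does not show how Tableau itself resolves the name clash.

```python
def inner_join(reports, neighborhoods):
    """Inner join report records to shapefile attribute rows on location == ID."""
    # Assumption: ID is unique in StHimark.shp, so a dict lookup is enough.
    by_id = {n["ID"]: n for n in neighborhoods}
    joined = []
    for report in reports:
        match = by_id.get(report["location"])
        if match is None:
            continue  # inner join: reports without a matching neighborhood are dropped
        merged = dict(report)
        merged["Neighborhood"] = match["location"]  # hypothetical field name
        merged["Longitude"] = match["Longitude"]
        merged["Latitude"] = match["Latitude"]
        joined.append(merged)
    return joined
```

An inner join silently drops reports whose location ID has no match in the map file, so comparing row counts before and after the join is a quick sanity check.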

Visualization and Interactive techniques

See the charts here: https://public.tableau.com/profile/parth.goda#!/vizhome/VASTchallengemc1Parthgoda/MainPage?publish=yes

All the visualizations are linked from a simple main page that lets the user choose to view the data:

  1. By neighborhood in St. Himark
  2. By source of reading, e.g., Building or Medical damage
  3. On the map of St. Himark, showing its progression through the six days

With this design, city officials who turn to the dashboard for guidance on allocating resources can base their decisions on a neighborhood, on a utility they want to work on, or on the timeline of the whole incident.

Main Page
Purpose / Description
This is the landing page of the application. On this page, the user gets a quick summary of the application's purpose and the option to dive into three areas of reporting. I created this page as a starting point that the user can keep coming back to and navigating away from.
Main Page of Application


Interactive Technique

I also used several interactive techniques to make the dashboard more user-friendly:

  1. Select: Button Redirection. Buttons are available in Tableau's dashboard catalogue of objects. When a button is clicked on the Tableau Public website, it redirects the user to the page I mapped it to.
    Button used to redirect
  2. Select: Hover. Hovering over a button shows extra information about what it does and where it leads.
    Button Tooltip
Neighborhood Dashboard
Purpose / Description
This dashboard is designed to show what is going on in each neighborhood. When the user first lands on the page, all the charts show data for all neighborhoods and sources of reading. When the user filters the data by selecting an area on the map or one of the buttons in the list, the charts update to show the picture in that neighborhood. This serves the use case where city officials need to know what is going on in a particular neighborhood: they can see which utility is reported most often and which readings are reported most. They can also see all the filtered data on a timeline to understand when, and how much, was reported.
The top half of the neighborhoods dashboard
The bottom half of the neighborhoods dashboard
Interactive Technique
  1. Tooltips. A user can place the cursor over any data point or visual element in the charts to see specific details about that data point in a small pop-up box.
    Tooltips that appear when cursor is placed on an element
  2. Filter. There are two filters on this dashboard:
    1. Neighborhood
    2. Source of reading
    Using these two filters, users can find specific information and make decisions about what they are interested in. The main filter on this dashboard is the map filter on the right, called "Map reference." By choosing an area, users can see the timeline, bar graphs, and heat maps of everything reported in that area. They can then apply a second filter from "Source of reading" to get more detail.
    The two filters available on the neighborhood dashboard
  3. Connect. The heat map and the total-readings bar graph are both time-based. When a user hovers over a point in one of the two charts, the same data point is highlighted in the other, which makes it easier to keep track of data points.
    Two connected graphs highlight together when hovered over
Types of Charts used

There are three types of charts used:

  1. Bar graphs
  2. Heat Maps
  3. Bar graphs over a timeline
All the charts used in the neighbourhood dashboard
Utility Dashboard
Purpose / Description
This dashboard is designed to show how damage to each utility affects each neighborhood. When the user first lands on the page, all the charts show data for all sources of reading and rank the neighborhoods by how badly they are damaged. When the user selects one of the buttons in the source of reading list, the ranking is based on the selected source of reading only. This serves the use case where city officials need to allocate specific resources, such as emergency power or building repairs: they can see which neighborhoods report the most and what their total and average readings are. They can also see all the filtered data on a timeline to understand when, and how much, was reported.
The Utility Dashboard
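The ranking behind this dashboard can be sketched as a group-by over the joined, pivoted records. The Python below is a minimal illustration rather than the Tableau calculation itself; it assumes records with hypothetical `Neighborhood`, `Source of reading`, and `Readings` keys, and passing `source=None` corresponds to showing all sources of reading.

```python
from collections import defaultdict


def rank_neighborhoods(records, source=None, by="average"):
    """Rank neighborhoods by total or average reading, optionally for one source."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for r in records:
        if source is not None and r["Source of reading"] != source:
            continue
        totals[r["Neighborhood"]] += r["Readings"]
        counts[r["Neighborhood"]] += 1
    stats = {n: {"total": totals[n], "average": totals[n] / counts[n]} for n in totals}
    # Highest (most damaged) neighborhood first.
    return sorted(stats.items(), key=lambda item: item[1][by], reverse=True)
```

Comparing `by="total"` with `by="average"` is useful because a neighborhood that simply sends more reports rises in the total ranking without its average damage being any higher.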
Interactive Technique
  1. Tooltips. A user can place the cursor over any data point or visual element in the charts to see specific details about that data point in a small pop-up box.
    Tool tip observed in the utility dashboard
  2. Filter. There are two filters on this dashboard:
    1. Neighborhood
    2. Source of reading
    Using these two filters, users can find specific information and make decisions about what they are interested in. The main filter on this dashboard is the source of reading filter on the right, called "Select Source of Reading." By choosing a reading type, users can see the timeline, bar graphs, and choropleth map for that source of reading. They can also apply a second filter from "Select Neighbourhood(s)" to get more area-specific information.
    Map and utility filter
  3. Connect. These four charts are connected by hovering: the neighborhood under the cursor is highlighted in all four, so users can track it across the charts and see its rank by both total and average readings in the comparison charts and on the maps.
    4 Connected charts
Types of Charts used

There are three types of charts used:

  1. Bar graphs
A sample bar graph in the Dashboard
  2. Bar graphs over a timeline
A timeline-based bar graph
  3. Choropleth map
A choropleth map chart
Time Dashboard
Purpose / Description

This dashboard is designed to show what kind of data was reported over the course of the earthquake. Users can see how each neighborhood was affected over time, both in damage to infrastructure and in shake intensity. They can select a specific moment on the maps or look at the overall picture on the heat maps.

Top Half of the time Dashboard
Bottom Half of the time Dashboard
Interactive Technique
  1. Tooltips. A user can place the cursor over any data point or visual element in the charts to see specific details about that data point in a small pop-up box.
    Tooltip in the time Dashboard
  2. Filter. There are two filters on this dashboard:
    1. Source of reading
    2. Time pages
    The source of reading filter switches all four charts to a single utility or to shake intensity. The time pages filter, however, applies only to the choropleth maps, showing them at the specific moment selected.
    Time Dashboard Filters
  3. Connect. The two heat maps are connected because they share the same timeline and data structure. Hovering over any point in one heat map highlights the same time period in the other.
    Hovering over one Heatmap
Types of Charts used

There are two types of charts used:

  1. Choropleth map
Time based Choropleth Map
  2. Heat Map
Heat Maps

Task Results

Question 1

Question: Emergency responders will base their initial response on the earthquake shake map. Use visual analytics to determine how their response should change based on damage reports from citizens on the ground. How would you prioritize neighborhoods for the reaction? Which parts of the city are the hardest hit?

Point Recommendation
1 To start, I would go to the Utility Dashboard to understand the different ways the city has been hit, switching the source of reading filter between all, buildings, power, and so on to see which neighborhoods are worst hit and need help.
2 Let's start with overall reports:

On both average and total readings, the following neighborhoods appear to be the hardest hit, in descending order:

  1. Old Town
  2. Broadview
  3. Scenic Vista
  4. Easton and Terrapin Springs
Overall Worst Hit

However, for the first response, city officials should use shake intensity, as it indicates which neighborhoods are closest to the earthquake's epicenter:

  1. Old Town
  2. Pepper Mill
  3. Wilson Forest
  4. Safe Town
  5. Easton
Shake Intensity Map

They should also pay special attention to Safe Town, where the nuclear power plant is located. The plant is likely to have some building and power damage; in the worst case, this could lead to a disaster like the Fukushima Daiichi nuclear disaster.

3 However, to better decide which neighborhoods deserve special attention, it helps to filter the data by source of reading.

From the charts, the following neighborhoods need the most help with building repairs:

  1. Old Town
  2. Broadview
  3. Chapparal
  4. Scenic Vista
  5. East Parton
Most Building Damage
4 Next are the areas that need a special medical emergency response:
  1. Old Town
  2. Broadview
  3. Palace Hills
  4. Southtown
  5. Downtown

These areas are a priority because seven hospitals are spread across them.

Hospital Damage Chart
5 For power, roads and bridges, and sewer and water, the data shows that all neighborhoods need about the same amount of help. The exceptions are Old Town and Scenic Vista, which stand out with the most reports recorded by citizens.

For example, here is the power total reading chart:

Power Chart

Given the socio-economic class of Scenic Vista's residents, who have better infrastructure and equipment and possibly an upper-class mindset, their numbers might be over-reported, since the neighborhood is not very close to the epicentre of the earthquake. It is somewhat suspicious that their numbers match those of neighborhoods closer to the epicentre.

Question 2

Question: Use visual analytics to show uncertainty in the data. Compare the reliability of neighborhood reports. Which neighborhoods are providing reliable reports? Provide a rationale for your response.

Point Recommendation
1 One reason the reports are unreliable is the many gaps in the data, especially on 8 April 2020, around the time the earthquake happened.

For example, in this heat map, we can clearly see the missing information in some neighborhoods close to the epicentre:

Heat Map with missing data

These neighborhoods are Old Town, Broadview, Chapparal, and Scenic Vista. The readings after the gaps are all much higher, which suggests that a backlog of data rushed in once the network connection was restored. Collecting data when the earthquake hit was the main goal of the app, so the missing data during the earthquake makes the whole process less reliable.
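The gaps visible in the heat map can also be listed programmatically. The Python sketch below buckets each neighborhood's report timestamps into hours and returns the runs of empty hours between its first and last report. The one-hour bin size and the `(neighborhood, time)` input shape are assumptions chosen for illustration, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

HOUR = timedelta(hours=1)


def hourly_gaps(times: list[datetime]) -> list[tuple[datetime, datetime]]:
    """Return (first_missing_hour, last_missing_hour) for each run of empty hours."""
    if not times:
        return []
    hours = {t.replace(minute=0, second=0, microsecond=0) for t in times}
    current, last = min(hours), max(hours)
    gaps, run_start = [], None
    while current <= last:
        if current not in hours:
            if run_start is None:
                run_start = current  # a new gap begins
        elif run_start is not None:
            gaps.append((run_start, current - HOUR))  # the gap ended one hour earlier
            run_start = None
        current += HOUR
    return gaps


def gaps_by_neighborhood(reports):
    """reports: iterable of (neighborhood, datetime) pairs."""
    times = defaultdict(list)
    for neighborhood, time in reports:
        times[neighborhood].append(time)
    return {n: hourly_gaps(ts) for n, ts in times.items()}
```

Because only hours between a neighborhood's first and last report are checked, a neighborhood that stops reporting entirely shows no trailing gap; the coverage measure is better suited to that case.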

2 Numbers may be inflated because of residents' mindsets and socio-economic class. I would highlight the following neighborhoods:
  1. SCENIC VISTA
  2. NORTHWEST
  3. PALACE HILLS
  4. BROADVIEW

These are expensive areas with trendy and wealthy residents, which also come with better infrastructure, utilities, maintenance, and security. The residents might be more educated and tech-savvy, and more aware of the app, than the rest of the town. An upper-class, elite, or entitled mentality might also affect the readings citizens report and how often they report them. In the case of Broadview, most residents are from the older generation, and some level of fear or panic might affect their numbers more than usual.

Neighbourhoods of uncertainty
3 Similar to the observations in point 1, the following areas had missing data a day or so after the earthquake happened:
  1. Easton
  2. Oak Willow
  3. Old town
  4. Pepper Mill
  5. Safe Town
  6. Scenic Vista

This reduces the reliability and certainty of the data and the charts, especially for understanding the aftermath of the earthquake and seeing which areas still need attention after the first response has been sent out.

More missing data after the earthquake
4 Shake intensity is one area where the neighborhoods provide more certain data. Areas close to the epicentre show higher average readings than areas further away, which mirrors the information in the earthquake shake map provided.

Thus, to some extent, data from the following towns can be said to be more certain:

  1. Old Town
  2. Safe Town
  3. Pepper Mill
  4. Wilson Forest
Shake map provided by the VAST Challenge organisers
Observed average shake intensity
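One way to make this visual comparison more concrete is a rank correlation across neighborhoods: a value near 1 means the neighborhoods reporting the strongest average shaking are also the ones the shake map rates highest. The Python sketch below computes Spearman's rank correlation, using average ranks for ties. This is a suggested extension rather than part of the analysis above, and per-neighborhood shake map values would have to be read off the provided map by hand.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors.

    Undefined (division by zero) if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (spread_x * spread_y)
```

The two lists must list the neighborhoods in the same order, one value per neighborhood.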


5 One issue with the data concerns the Palace Hills and Northwest neighborhoods. As seen in this picture, these areas have the highest concentration of roads in St. Himark.
Road map provided by the VAST Challenge organisers

However, the data shows little damage reported on the roads in these areas. This contradicts common sense, as these areas would be expected to record the most damage to roads and bridges. On the other hand, Old Town and Scenic Vista report the most damage, which seems unrealistic because they have far fewer roads than Palace Hills and Northwest.

Roads and bridges Data
6 One neighborhood that does not provide reliable reports is Wilson Forest. Most of its data is missing, either because the area is remote or because it is not populated enough to provide city officials with data.

In this heat map of all data collected in Wilson Forest, you can see that there is almost no data:

Wilson Forest heat map
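A simple way to quantify how sparse a neighborhood's reporting is, as in Wilson Forest, is the share of hourly time slots that contain at least one report. The Python sketch below computes that share over a fixed observation window. The window bounds, the hourly bin size, and the `(neighborhood, time)` input shape are assumptions for illustration, and a low score flags thin coverage rather than proving the reports are wrong.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def hourly_coverage(reports, neighborhoods, window_start: datetime, window_end: datetime):
    """Fraction of hourly bins in [window_start, window_end) with at least one report."""
    total_bins = int((window_end - window_start) / timedelta(hours=1))
    seen = defaultdict(set)
    for neighborhood, time in reports:
        if window_start <= time < window_end:
            # Index of the hourly bin this report falls into.
            seen[neighborhood].add(int((time - window_start) / timedelta(hours=1)))
    # Neighborhoods that never reported get a coverage of 0.0.
    return {n: len(seen[n]) / total_bins for n in neighborhoods}
```

Passing the full list of 19 neighborhoods ensures that a neighborhood with no reports at all still appears in the result, with a coverage of 0.0.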

Question 3

Question: How do conditions change over time? How does uncertainty change over time? Describe the key changes you see. Limit your response to 500 words and 8 images.

Point Recommendation
1 In most neighborhoods, for both the utilities and shake intensity, the data follows this trend:
Total data collected on a timeline

Except on 6/4/2020 between 4 and 5 pm, when there may have been some pre-earthquake shaking, the readings are all low or manageable until about 8 am on 8/4/2020, which is approximately when the earthquake happened. After that, there are some aftershock readings, or a pile-up of data from neighborhoods that had been cut off, early on 9/4/2020 and in the afternoon of 10/4/2020.

2 Uncertainty does increase over time, mostly after the earthquake. This might be because damage to infrastructure caused data to go missing in some neighborhoods of St. Himark. Before the earthquake, there was little missing data (except for Wilson Forest), even though construction was going on in some places, but afterward, more and more data went missing, especially in neighborhoods that were closer to the epicentre and had more damaged buildings and roads.

This makes sense because, after the earthquake, there were reports of damage to power infrastructure all around town. Without power, many citizens would not be able to power their devices and log damage and shake data on the app. These charts show that every neighborhood in St. Himark experienced some power damage.

Average Power Damage per neighbourhood
3 Time also affected uncertainty during the possible pre-earthquake shaking on 6/4/2020 between 4 and 5 pm. The heat map shows that the average readings dropped across all neighborhoods:
Avg readings Heat Map: notice mid-day 6/4/2020

However, the total number of records coming in increased from that same period onwards:

Total readings coming in. Notice 6/4/2020 mid-day

This is suspicious, as it seems unlikely that more people would log on to the app after pre-earthquake shaking only to record lower readings of shaking and damage.
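The mismatch described above, with more reports arriving while the average reading falls, can be flagged automatically. The Python sketch below summarizes readings per hour and lists the hours where the report count rose but the average reading fell compared with the previous non-empty hour. The hourly bins, the `(time, value)` input shape, and the function names are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime


def hourly_summary(readings) -> list[tuple[datetime, int, float]]:
    """readings: iterable of (datetime, value). Returns sorted (hour, count, average) tuples."""
    buckets = defaultdict(list)
    for time, value in readings:
        buckets[time.replace(minute=0, second=0, microsecond=0)].append(value)
    return [(hour, len(values), sum(values) / len(values))
            for hour, values in sorted(buckets.items())]


def count_up_average_down(summary):
    """Hours where the report count rose but the average fell vs the previous non-empty hour."""
    flagged = []
    for (_, prev_count, prev_avg), (hour, count, avg) in zip(summary, summary[1:]):
        if count > prev_count and avg < prev_avg:
            flagged.append(hour)
    return flagged
```

Empty hours do not appear in the summary, so "previous hour" means the previous hour that had any reports; running this per neighborhood as well as overall shows whether the pattern is citywide.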

Reference

Icon for neighborhood: https://www.google.com/url?sa=i&rct=j&q=&esrc=s&source=images&cd=&ved=2ahUKEwiL1o3QzpPlAhWw6XMBHfqeBi4Qjhx6BAgBEAI&url=https%3A%2F%2Fwww.logosurfer.com%2Flogo%2Fthe-neighbourhood-logo&psig=AOvVaw0BBHU8ychLfogor4jV3f7K&ust=1570862792003906

Icon for gauge: https://www.123rf.com/clipart-vector/guage.html?oriSearch=utility&sti=nh1v9o3lxchgqttrvu%7C

Home icon: https://www.iconfinder.com/icons/185038/home_house_streamline_icon

Fukushima disaster: https://en.wikipedia.org/wiki/Fukushima_Daiichi_nuclear_disaster

Model answers:

  1. Gwendoline Tan https://wiki.smu.edu.sg/1617t1IS428g1/IS428_2016-17_Term1_Assign3_Gwendoline_Tan_Wan_Xin
  2. Lim Kim Yong https://wiki.smu.edu.sg/1617t1IS428g1/IS428_2016-17_Term1_Assign3_Lim_Kim_Yong
  3. Chew Yuxi https://public.tableau.com/profile/yuxi7903#!/vizhome/VA_Assignment_Chew_Yuxi/OAQStationsTimeSeries
  4. Tan Kee Hock https://wiki.smu.edu.sg/1617t1IS428g1/IS428_2016-17_Term1_Assign3_Tan_Kee_Hock

