IS428 AY2019-20T1 Parth Goda Rajesh

From Visual Analytics for Business Intelligence
Revision as of 21:53, 13 October 2019 by Parthrg.2017

Background

Welcome to St. Himark, the fictional city used in this visual case study. It is a city of 19 neighborhoods, each with its own characteristics and amenities. St. Himark has a population of 246,839 and is located in the Oceanus Sea. It is also home to the world-renowned St. Himark Museum, beautiful beaches, and the Wilson Forest Nature Preserve. It is one of the best cities in which to raise a family and work. The Always Safe Nuclear Power Plant provides the majority of the city's power, as well as jobs in Safe Town. Mayor Jordan and the city council currently govern the city.

The city runs the following utilities:

  1. Water and Sewage
  2. Road and Bridge
  3. Gas
  4. Garbage
  5. Power

There is always construction going on in the above utilities.

St. Himark is divided into 19 neighborhoods:

  1. PALACE HILLS
  2. NORTHWEST
  3. OLD TOWN
  4. SAFE TOWN
  5. SOUTHWEST
  6. DOWNTOWN
  7. WILSON FOREST
  8. SCENIC VISTA
  9. BROADVIEW
  10. CHAPPARAL
  11. TERRAPIN SPRINGS
  12. PEPPER MILL
  13. CHEDDARFORD
  14. EASTON
  15. WESTON
  16. SOUTHTOWN
  17. OAK WILLOW
  18. EAST PARTON
  19. WEST PARTON

Problem - VAST challenge MC1

There was an earthquake northwest of St. Himark. It occurred between 6 April 2020 and 8 April 2020. The city's officials needed to collect data immediately to understand the extent of the damage. They can then dispatch their emergency services and allocate resources efficiently to the areas of town where they are needed.

At first, they only had the seismic readings of the earthquake and used those for their first round of dispatch. Now, however, they need more information to get a better gauge of what is happening at ground level.

Purpose

To gather the information the city officials need, they launched an app where citizens can report the shake intensity and the level of damage done to utility infrastructure. Citizens use the app to note the level of damage seen on a utility or infrastructure building in a neighborhood, and they can also record the shake intensity in the neighborhood. The officials use this tool to record the data that citizens provide. The data is stored every 5 minutes. There may also be some data loss or delay due to power shortages.

With all this data, visualizations were created to make it faster to understand, so that recommendations and decisions can be produced more quickly and help reaches people sooner.

The following questions also have to be answered:

  1. Emergency responders will base their initial response on the earthquake shake map. Use visual analytics to determine how their response should change based on damage reports from citizens on the ground. How would you prioritize neighborhoods for the reaction? Which parts of the city are the hardest hit?
  2. Use visual analytics to show uncertainty in the data. Compare the reliability of neighborhood reports. Which neighborhoods are providing reliable reports? Provide a rationale for your response.
  3. How do conditions change over time? How does uncertainty in the data change over time? Describe the key changes you see.


Data Gathering and Clean up

The data is provided in the file mc1-reports-data.csv, which looks like this:

The first few rows of the data provided in the CSV file

The headers were:

  1. Time: A timestamp of the report made by a citizen. The format is DD/MM/YY HH:MM:SS
  2. sewer_and_water: Damage recorded on the sewer and water systems in the neighborhood and at the timestamp. 0 is the lowest level of damage, while 10 is the highest
  3. power: Damage recorded on the power generation systems in the neighborhood and at the timestamp. 0 is the lowest level of damage, while 10 is the highest
  4. roads_and_bridges: Damage recorded on the roads and bridges in the neighborhood and at the timestamp. 0 is the lowest level of damage, while 10 is the highest
  5. medical: Damage recorded on the medical facilities in the neighborhood and at the timestamp. 0 is the lowest level of damage, while 10 is the highest
  6. buildings: Damage recorded on the buildings in the neighborhood and at the timestamp. 0 is the lowest level of damage, while 10 is the highest
  7. shake_intensity: How violent the shaking was in the neighborhood and at the timestamp.
  8. location: ID of the neighborhood for which the citizen is reporting their readings. (This will be matched to the neighborhood data in the map file)
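As a minimal sketch of how these headers could be read outside Tableau, the snippet below parses a few rows using only the Python standard library. The sample rows are invented for illustration and are not taken from mc1-reports-data.csv. The lowercase `time` column name, the helper name `parse_reports`, and the choice to keep blank cells as `None` are all assumptions.

```python
import csv
import io
from datetime import datetime

# Illustrative rows only (not real data); blank cells stand for missing readings.
SAMPLE_CSV = """time,sewer_and_water,power,roads_and_bridges,medical,buildings,shake_intensity,location
06/04/20 00:00:00,2,3,1,,4,1,1
08/04/20 08:35:00,9,10,8,7,9,,3
"""

READING_COLUMNS = ["sewer_and_water", "power", "roads_and_bridges",
                   "medical", "buildings", "shake_intensity"]


def parse_reports(text):
    """Parse report rows into dicts with typed values (hypothetical helper)."""
    reports = []
    for row in csv.DictReader(io.StringIO(text)):
        report = {
            # DD/MM/YY HH:MM:SS, as described in the header list
            "time": datetime.strptime(row["time"], "%d/%m/%y %H:%M:%S"),
            "location": int(row["location"]),
        }
        for column in READING_COLUMNS:
            # Keep missing readings as None instead of guessing a value
            report[column] = float(row[column]) if row[column] else None
        reports.append(report)
    return reports


reports = parse_reports(SAMPLE_CSV)
print(reports[0]["time"], reports[0]["medical"])
```

Keeping missing readings as `None` rather than 0 matters here, because 0 is a valid damage level on the 0 to 10 scale.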

Cleaning up the data

Using Tableau Prep Builder, I reshaped the data from the CSV file and renamed some of its values to make it easier to visualize.

Pivoting

To start, I pivoted the medical, power, roads_and_bridges, sewer_and_water, buildings, and shake_intensity columns into rows. The pivoted column names are stored in a field called "Source of reading," and their values in a field called "Readings."

Step 1: Pivoting the dashboard
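The pivot was done in Tableau Prep Builder, but the same columns-to-rows reshaping can be sketched in plain Python to show what the step produces. This is only an illustration: the function name `pivot_to_rows` is hypothetical and the sample report values are invented.

```python
# Columns that are pivoted into rows, spelled as in the CSV header list
PIVOT_COLUMNS = ["medical", "power", "roads_and_bridges",
                 "sewer_and_water", "buildings", "shake_intensity"]


def pivot_to_rows(report):
    """Turn one wide report into one row per source of reading."""
    rows = []
    for column in PIVOT_COLUMNS:
        rows.append({
            "time": report["time"],
            "location": report["location"],
            "Source of reading": column,   # former column name
            "Readings": report[column],    # former cell value
        })
    return rows


# Illustrative report (values invented for the example)
wide_report = {
    "time": "06/04/20 00:00:00", "location": 1,
    "medical": 3, "power": 5, "roads_and_bridges": 2,
    "sewer_and_water": 4, "buildings": 6, "shake_intensity": 1,
}

long_rows = pivot_to_rows(wide_report)
for row in long_rows:
    print(row["Source of reading"], row["Readings"])
```

Each citizen report becomes six rows, which is what lets a single "Source of reading" filter drive every chart.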

Cleaning up names

Next, I renamed the following sources of reading and capitalized the rest:

  1. roads_and_bridges into "Road and Bridges"
  2. sewer_and_water into "Sewer and Water"
  3. shake_intensity into "Shake Intensity"
Step 2: Name clean up
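The renaming rule can be written down as a small lookup, shown here as a hedged sketch rather than what Tableau Prep does internally. The helper name `clean_source_name` is hypothetical, and the raw column is spelled `roads_and_bridges` as in the CSV header list.

```python
# Explicit renames listed in the text; every other name is only capitalized
RENAMES = {
    "roads_and_bridges": "Road and Bridges",
    "sewer_and_water": "Sewer and Water",
    "shake_intensity": "Shake Intensity",
}


def clean_source_name(name):
    """Return the display name for a source of reading."""
    return RENAMES.get(name, name.capitalize())


raw_names = ["medical", "power", "roads_and_bridges",
             "sewer_and_water", "buildings", "shake_intensity"]
cleaned = [clean_source_name(name) for name in raw_names]
print(cleaned)
```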

Setting up Tableau

To start my visualization journey, I first added a file called StHimark.shp, taken from the VAST Challenge MC2 data files, to create the interactive map in a Tableau workbook. This file has the following fields:

  1. ID: Id of the neighborhood
  2. location: Name of the neighborhood
  3. Longitude: Longitude coordinate of the neighborhood
  4. Latitude: Latitude coordinate of the neighborhood

I then dragged and dropped the output file from Tableau Prep into Tableau. I inner joined the two sources on ID from StHimark.shp and location from mc1-reports-data.csv:

Step 3: Inner Join the two worksheets

This was the result.

Final Result of Data Transformation
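For readers unfamiliar with inner joins, the sketch below shows the matching logic on a few invented rows. It assumes, for illustration only, that the neighborhood IDs follow the order of the neighborhood list in the Background section. The function name `inner_join` and the `Neighborhood` field are hypothetical.

```python
# Hypothetical subset of the map file: ID -> neighborhood name
# (assumes IDs follow the order of the neighborhood list)
neighborhoods = [
    {"ID": 1, "location": "PALACE HILLS"},
    {"ID": 2, "location": "NORTHWEST"},
    {"ID": 3, "location": "OLD TOWN"},
]

# Illustrative pivoted readings; location 99 has no match in the map file
readings = [
    {"location": 1, "Source of reading": "Power", "Readings": 5},
    {"location": 3, "Source of reading": "Medical", "Readings": 8},
    {"location": 99, "Source of reading": "Power", "Readings": 2},
]


def inner_join(readings, neighborhoods):
    """Keep only readings whose location matches a neighborhood ID."""
    name_by_id = {n["ID"]: n["location"] for n in neighborhoods}
    joined = []
    for reading in readings:
        if reading["location"] in name_by_id:
            joined.append({**reading,
                           "Neighborhood": name_by_id[reading["location"]]})
    return joined


joined = inner_join(readings, neighborhoods)
for row in joined:
    print(row["Neighborhood"], row["Source of reading"], row["Readings"])
```

An inner join silently drops unmatched rows, so it is worth checking that every location value in the reports has a matching ID in the map file.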

Visualization and Interactive techniques

The visualizations I created are all linked from a simple main page that lets the user choose which view to see:

  1. Based on each neighborhood in St. Himark
  2. Based on each reading source. E.g., Building or Medical damage
  3. Based on the Map of St. Himark progression through the six days

The idea behind this design is that city officials who turn to this dashboard to decide how to allocate resources can base their decisions on a neighborhood, on a utility they want to work on, or on the timeline of the whole incident.

Main Page
Purpose / Description
This is the landing page of the application. On this page, the user gets a quick summary of the purpose of the application and the option to dive into three areas of reporting. I created this page as a starting point that the user can navigate away from and keep coming back to.
Main Page of Application


Interactive Technique

I also use interactive techniques to make the dashboard more user-friendly.

  1. Select: Button Redirection. The Tableau dashboard catalog of objects includes a Button object. When clicked on the Tableau Public website, it redirects the user to the specific page I mapped it to.
    Button used to redirect
  2. Select: Hover. Hovering over a button shows extra information about what it does and where it leads.
    Button Tooltip
Neighborhood Dashboard
Purpose / Description
This dashboard is designed to show what is going on in each neighborhood. When the user first lands on the page, all the charts show data for all the neighborhoods and sources of readings. When the user filters the data by selecting an area on the map or one of the buttons on the list, the data changes to show the real picture in that neighborhood. This supports the use case where city officials need to know what is going on in each neighborhood. They can see which utility is reported most often and which reading value is reported most often. They can also see all the filtered data on a timeline to understand when and how much was reported.
The top half of the neighborhoods dashboard
The bottom half of the neighborhoods dashboard
Interactive Technique
  1. Tooltips: A user can place the cursor over any data point or visual element in the charts to find out specific details about that data point. A small box will appear with relevant information.
    Tooltips that appear when the cursor is placed on an element
  2. Filter: There are two filters on this dashboard:
    1. Neighborhood
    2. Source of reading
    Using these two filters, users can find specific information and make decisions about the things they are interested in. On this dashboard, the main filter is the map filter on the right called "Map reference." By choosing one area, they can see the timeline, bar graphs, and heat maps for everything reported in that area. They can also apply a second filter from the "Source of reading" list to get more information.
    The two filters available on the neighborhood dashboard
  3. Connect: Both the heat map and the total readings collected bar graph are time-based visualizations, so when a user hovers over one of the two charts, the same data point is highlighted in the other chart too. This is more user-friendly because it helps to keep track of the data points.
    Two connected graphs highlight together when hovered over
Types of Charts used

There are three types of charts used:

  1. Bar graphs
  2. Heat Maps
  3. Bar graphs over a timeline
All the charts used in the neighbourhood dashboard
Utility Dashboard
Purpose / Description
This dashboard is designed to show how damage to each utility affects each neighborhood. When the user first lands on the page, all the charts show data for all sources of readings and rank how badly damaged the neighborhoods are. When the user filters the data by selecting one of the buttons on the Source of the readings list, the ranking is recalculated for the selected source of reading, which gives a clearer picture. This supports the use case where city officials need to allocate specific resources like emergency power or building repairs. They can see which neighborhoods are reported most often and what the total and average readings are. They can also see all the filtered data on a timeline to understand when and how much was reported.
The Utility Dashboard
Interactive Technique
  1. Tooltips: A user can place the cursor over any data point or visual element in the charts to find out specific details about that data point. A small box will appear with relevant information.
    Tooltip observed in the utility dashboard
  2. Filter: There are two filters on this dashboard:
    1. Neighborhood
    2. Source of reading
    Using these two filters, users can find specific information and make decisions about the things they are interested in. On this dashboard, the main filter is the source of readings filter on the right called "Select Source of Reading". By choosing one reading type, they can see the timeline, bar graphs, and choropleth map report information on that source of reading. They can also apply a second filter from "Select Neighbourhood(s)" to get more area-specific information.
    Map and utility filter
  3. Connect: These four graphs are all connected. Hovering the cursor over a neighborhood highlights it in all four charts, so users can track the neighborhood's place across the charts. They can see its rank by both total and average in the comparison charts and the maps.
    4 Connected charts
Types of Charts used

There are three types of charts used:

  1. Bar graphs
    A sample bar graph in the Dashboard
  2. Bar graphs over a timeline
    A timeline based bar graph
  3. Choropleth map
    A choropleth map chart
Time Dashboard
Purpose / Description

This dashboard was designed to understand what kind of data was reported during the earthquake. Users can see how each neighbourhood was affected in terms of the damage done to infrastructure and shake intensity over a period of time. They can specify a period of time in the maps or look at the overall picture on the heat maps.

Top Half of the time Dashboard
Bottom Half of the time Dashboard
Interactive Technique
  1. Tooltips: A user can place the cursor over any data point or visual element in the charts to find out specific details about that data point. A small box will appear with relevant information.
    Tooltip in the time Dashboard
  2. Filter: There are two filters on this dashboard:
    1. Source of reading
    2. Time pages
    A user can use the source of readings filter to get charts on each utility or on the shake intensity. This filters the data in all four charts. The time pages, however, filter the choropleth maps to the specific moment selected.
    Time Dashboard Filters
  3. Connect: The two heat maps are connected, as they share the same timelines and data structure. When the cursor hovers over any point in one heat map, the same time period is highlighted in the other heat map.
    Hovering over one Heatmap
Types of Charts used

There are two types of charts used:

  1. Choropleth map
    Time based Choropleth Map
  2. Heat Map
    Heat Maps
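To make the heat maps more concrete, here is a rough sketch of how readings could be grouped into cells. It assumes each cell holds the average reading per neighborhood per hour; the actual bucket size and aggregation used in the Tableau workbook are not stated here, so both are assumptions. The neighborhood labels "A" and "B" and all values are invented.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Illustrative readings (invented): (timestamp, neighborhood, reading)
readings = [
    (datetime(2020, 4, 8, 8, 5), "A", 8.0),
    (datetime(2020, 4, 8, 8, 40), "A", 10.0),
    (datetime(2020, 4, 8, 9, 10), "A", 6.0),
    (datetime(2020, 4, 8, 8, 15), "B", 4.0),
]


def heat_map_cells(readings):
    """Average reading per (neighborhood, hour) cell."""
    buckets = defaultdict(list)
    for timestamp, neighborhood, value in readings:
        # Truncate the timestamp to the start of its hour
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[(neighborhood, hour)].append(value)
    return {cell: mean(values) for cell, values in buckets.items()}


cells = heat_map_cells(readings)
for (neighborhood, hour), value in sorted(cells.items()):
    print(neighborhood, hour, value)
```

A cell that is absent from the result had no reports in that hour, which is how missing data shows up as blank areas in a heat map.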

Use Cases

Let's say a city official wants to know what is going on in a particular neighborhood. They will start on the home page.

Task Results

Question 1

Question: Emergency responders will base their initial response on the earthquake shake map. Use visual analytics to determine how their response should change based on damage reports from citizens on the ground. How would you prioritize neighborhoods for the reaction? Which parts of the city are the hardest hit?

Point Recommendation
1 To start off, I would go to the Utility Dashboard to understand the different ways the city has been hit. Change the "source of readings" filter to all, building, power, etc. to analyze which neighborhoods need help and are worst hit.
2 Let's start with overall reports:

On both average and total readings, the following towns seem to be hardest hit, in descending order:

  1. Old Town
  2. Broadview
  3. Scenic Vista
  4. Easton and Terrapin Springs
Overall Worst Hit

However, for the first response, the city officials should use the shake intensity, as this represents the towns that are closest to the earthquake epicenter:

  1. Old Town
  2. Pepper Mill
  3. Wilson Forest
  4. Safe Town
  5. Easton
Shake Intensity Map

They should also pay special attention to Safe Town, as the nuclear reactor is there. There will definitely be some building damage and power damage to the reactor. In the worst case, it could resemble the Fukushima Daiichi nuclear disaster.

3 However, to make better decisions on which neighborhoods deserve special attention, it is better to filter the data by source.

From the charts, the following would need the most help to fix the buildings:

  1. Old Town
  2. Broadview
  3. Chapparal
  4. Scenic Vista
  5. East Parton
Most Building Damage
4 Next would be the areas that need special medical emergency response:
  1. Old Town
  2. Broadview
  3. Palace Hills
  4. Southtown
  5. Downtown

This is a priority because these areas share seven hospitals between them.

Hospital Damage Chart
5 For power, roads and bridges, and sewer and water, the data shows that all towns need about the same amount of help. The exceptions are Old Town and Scenic Vista, which stand out with the most reports recorded by citizens.

For example here is the power total reading chart:

Power Chart

Looking at the socio-economic class of the residents of Scenic Vista, it would seem that, with better infrastructure and equipment and an upper-class mindset, their numbers might be over-reported, as they are not very close to the epicenter of the earthquake. It is a little suspicious that their numbers match those of neighborhoods closer to the epicenter.
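The rankings in this question rely on both total and average readings. The sketch below shows why the two can disagree, using invented values and placeholder neighborhood labels "A", "B", and "C" rather than real results. The function name `rank_neighborhoods` is hypothetical.

```python
from collections import defaultdict

# Illustrative readings (invented): (neighborhood, reading)
readings = [
    ("A", 9.0), ("A", 8.0), ("A", 10.0),          # few reports, high damage
    ("B", 9.5),                                   # a single high report
    ("C", 4.0), ("C", 4.0), ("C", 4.0), ("C", 4.0),  # many low reports
]


def rank_neighborhoods(readings):
    """Rank neighborhoods by total and by average reading, highest first."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for neighborhood, value in readings:
        totals[neighborhood] += value
        counts[neighborhood] += 1
    averages = {n: totals[n] / counts[n] for n in totals}
    by_total = sorted(totals, key=totals.get, reverse=True)
    by_average = sorted(averages, key=averages.get, reverse=True)
    return by_total, by_average


by_total, by_average = rank_neighborhoods(readings)
print("By total:  ", by_total)
print("By average:", by_average)
```

Totals grow with the number of reports, so a neighborhood with many low readings can outrank one with a single severe report. Comparing both rankings helps separate how much was reported from how bad the reported damage was.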

Question 2

Question: Use visual analytics to show uncertainty in the data. Compare the reliability of neighborhood reports. Which neighborhoods are providing reliable reports? Provide a rationale for your response.

Point Recommendation
1 One reason the reports are not reliable is the many gaps in the data, especially on 8 April 2020, which is around the time the earthquake happened.

For example, in this heat map, we can clearly see the missing information in some neighborhoods close to the epicenter:

Heat Map with missing data

These neighborhoods are Old Town, Broadview, Chapparal, and Scenic Vista. All the readings after the gaps are much higher. It seems that a backlog of data may have rushed in once the network connection was restored. Collecting data while the earthquake hit was the main goal of the app, so the missing data during the earthquake makes the whole process less reliable.

2 There is a possibility of inflated numbers due to mindsets and socio-economic class. I would like to highlight the following:
  1. SCENIC VISTA
  2. NORTHWEST
  3. PALACE HILLS
  4. BROADVIEW

These areas are expensive places with trendy and rich residents, which also comes with better infrastructure, utilities, maintenance, and security. The people might be more educated and tech-savvy and have more awareness of the app compared to the rest of the town. An upper-class, elite, or entitled mentality may also affect the numbers the citizens report and how often they report. In the case of Broadview, most of its citizens are of the older generation, and some level of fear or panic might have affected the numbers more than usual.

Neighbourhoods of uncertainty
3 Similar to the observations in point 1, the following areas had missing data a day or so after the earthquake happened:
  1. Easton
  2. Oak Willow
  3. Old Town
  4. Pepper Mill
  5. Safe Town
  6. Scenic Vista

This definitely affected the reliability and certainty of the data and the charts, especially when trying to understand the aftermath of the earthquake and see which areas still need attention after the first response has been sent out.

More missing data after the earthquake
4 One area where the neighborhoods provide more certain data is the shake intensity. Areas close to the epicenter do show higher average readings compared to areas further away. This also mirrors the information in the earthquake shakemap provided.

Thus, to some extent, data from the following towns can be said to be more certain:

  1. Old Town
  2. Safe Town
  3. Pepper Mill
  4. Wilson Forest
Provided shakemap by VAST challenge organisers
Observed average shake intensity


5 One issue with the data collected is in the Palace Hills and Northwest neighborhoods. As seen in this picture, these places have the highest concentrations of roads in St. Himark.
Provided Roadmap by VAST challenge organisers

However, the data shows that little road damage is seen or reported in these areas. This contradicts common sense, as these areas should by default record the most damage to roads and bridges. Old Town and Scenic Vista, on the other hand, report the most damage. This is not realistic, as these towns do not have as many roads as Palace Hills and Northwest.

Roads and bridges Data
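The gaps discussed in points 1 and 3 were found by reading the heat maps, but they could also be flagged programmatically. The sketch below is one possible approach, not the method used in this analysis: it lists the stretches where a neighborhood sent no reports for at least a chosen threshold. The one-hour threshold, the function name `find_gaps`, the labels "A" and "B", and all timestamps are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Illustrative report times (invented): (timestamp, neighborhood)
report_times = [
    (datetime(2020, 4, 8, 8, 0), "A"),
    (datetime(2020, 4, 8, 8, 5), "A"),
    (datetime(2020, 4, 8, 14, 5), "A"),   # six hours of silence before this
    (datetime(2020, 4, 8, 8, 0), "B"),
    (datetime(2020, 4, 8, 8, 10), "B"),
]


def find_gaps(report_times, min_gap=timedelta(hours=1)):
    """Return (start, end) pairs per neighborhood with no reports for min_gap or longer."""
    times_by_neighborhood = defaultdict(list)
    for timestamp, neighborhood in report_times:
        times_by_neighborhood[neighborhood].append(timestamp)

    gaps = {}
    for neighborhood, times in times_by_neighborhood.items():
        times.sort()
        # Compare each report time with the next one
        found = [(a, b) for a, b in zip(times, times[1:]) if b - a >= min_gap]
        if found:
            gaps[neighborhood] = found
    return gaps


gaps = find_gaps(report_times)
for neighborhood, spans in gaps.items():
    for start, end in spans:
        print(neighborhood, start, "to", end)
```

Neighborhoods with long gaps around the time of the earthquake are the ones whose later readings deserve the most caution, since a backlog of delayed reports may arrive all at once.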