Difference between revisions of "IS428 AY2019-20T1 Assign Wang Xuze"

From Visual Analytics for Business Intelligence
 
(30 intermediate revisions by the same user not shown)
Line 2: Line 2:
  
  
=Overview=
+
=Problem Statement=
  
 
St. Himark has been hit by an earthquake, leaving officials scrambling to determine the extent of the damage and dispatch limited resources to the areas in most need. They quickly receive seismic readings and use those for an initial deployment but realize they need more information to make sure they have a realistic understanding of the true conditions throughout the city.
 
Line 12: Line 12:
 
By combining seismic readings of the quake, responses from the app, and background knowledge of the city, help the city triage their efforts for rescue and recovery.
 
  
=The Task=
+
=Tasks=
  
 
#Emergency responders will base their initial response on the earthquake shake map. Use visual analytics to determine how their response should change based on damage reports from citizens on the ground. How would you prioritize neighborhoods for response? Which parts of the city are hardest hit? Limit your response to 1000 words and 10 images.
 
Line 18: Line 18:
 
#How do conditions change over time? How does uncertainty change over time? Describe the key changes you see. Limit your response to 500 words and 8 images.
 
  
=Motivation=
+
=Motivations=
The visualization was developed with the following motivations:
+
# Provide a clear overview of the citizen reports to aid decision making.
# Provide a clear overview of the citizen reports to aid decision making
+
# Communicate the uncertainty and reliability of the citizen reports.
# Communicate the uncertainty and reliability of the citizen reports
+
# Show how conditions change over time.
# Show how conditions change over time
+
# Enable an effective emergency response to save lives.
# Enable an effective emergency response to save lives
 
  
=The Data=
+
=Data Description=
  
The data includes one (CSV) file spanning the entire length of the event, containing (categorical) individual reports of shaking/damage by neighborhood over time.
+
The data includes
Citizens can submit reports at any time; however, reports are only recorded in 5-minute batches/increments due to the server configuration. Furthermore, delays in the receipt of reports may occur during power outages.
+
<ol><li> A <i>mc1-reports-data.csv</i> file spanning the entire length of the event, containing (categorical) individual reports of shaking/damage by neighborhood over time. It has these fields:<br>
 +
<ul><li> time: timestamp of incoming report/record, in the format YYYY-MM-DD hh:mm:ss
 +
<li>location: id of neighborhood where person reporting is feeling the shaking and/or seeing the damage
 +
<li> {shake_intensity, sewer_and_water, power, roads_and_bridges, medical, buildings}: reported categorical value of how violent the shaking was/how bad the damage was (0 - lowest, 10 - highest; missing data allowed) </ul>
  
Data file <i>mc1-reports-data.csv</i> has these fields:
+
<li>Two shakemap PNG files which indicate where the corresponding earthquakes' epicenters originate as well as how much shaking can be felt across the city.
    - time: timestamp of incoming report/record, in the format YYYY-MM-DD hh:mm:ss
 
    - location: id of neighborhood where person reporting is feeling the shaking and/or seeing the damage
 
    - {shake_intensity, sewer_and_water, power, roads_and_bridges, medical, buildings}: reported categorical value of how violent the shaking was/how bad the damage was (0 - lowest, 10 - highest; missing data allowed)
 
  
Two shakemap (PNG) files which indicate where the corresponding earthquakes' epicenters originate as well as how much shaking can be felt across the city.
+
<li>The <i>StHimark.shp</i> shapefile provides geospatial vector data for St. Himark.
 
+
</ol>
The StHimark.shp shapefile provides geospatial vector data for St. Himark.
 
 
 
The data will then be visualized using Tableau.
 
  
=Data Cleaning=
+
=Data Preparation=
 
{| class="wikitable"
 
{| class="wikitable"
 
|-
 
|-
! Problem #1 || Building Data
+
! style="font-weight: bold;background: #536a87;color:#fbfcfd;width: 20%;" | Join the reports data and Shapefile
 
|-
 
|-
| Issue || The original building data provided is not "user-friendly". Tableau has no difficulty reading it, but plotting charts from it poses many issues. Tableau's default pivot function does not effectively transpose the columns into rows, so a non-Tableau alternative is needed.
+
| Inside Tableau, import <i>mc1-reports-data.csv</i> and <i>StHimark.shp</i> into Connections. Perform a full outer join using <i>location</i> in the CSV file and <i>Id</i> in the SHP file. <br>
|-
+
[[File:Data1-1.png|400px|center|Full outer join]]
| Solution || [[Image:Slide1.JPG|800px|center]]
+
This produces the following data columns in Tableau.
 +
[[File:Data1-2.png|800px|center|Data columns]]
 
|}
 
|}
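For readers without Tableau, the effect of the full outer join can be sketched in plain Python. This is only an illustration under assumptions: the sample rows, the <i>Nbrhood</i> field name and the helper <code>full_outer_join</code> are made up; only the join keys <i>location</i> and <i>Id</i> come from the step above.

```python
def full_outer_join(reports, shapes, left_key="location", right_key="Id"):
    """Full outer join of two lists of dicts; the unmatched side is filled with None."""
    left_fields = sorted({k for row in reports for k in row})
    right_fields = sorted({k for row in shapes for k in row})

    # Index the shape records by their join key.
    shapes_by_key = {}
    for shape in shapes:
        shapes_by_key.setdefault(shape[right_key], []).append(shape)

    joined, matched = [], set()
    for report in reports:
        key = report[left_key]
        partners = shapes_by_key.get(key)
        if partners:
            matched.add(key)
            for shape in partners:
                joined.append({**report, **shape})
        else:
            # Report with no matching neighbourhood polygon.
            joined.append({**report, **{f: None for f in right_fields}})

    # Neighbourhood polygons that never received a report.
    for key, partners in shapes_by_key.items():
        if key not in matched:
            for shape in partners:
                joined.append({**{f: None for f in left_fields}, **shape})
    return joined


# Hypothetical sample rows (values are illustrative, not taken from the data set).
reports = [
    {"time": "2020-04-08 10:00:00", "location": 1, "medical": 8},
    {"time": "2020-04-08 10:05:00", "location": 99, "medical": 3},  # no matching shape
]
shapes = [
    {"Id": 1, "Nbrhood": "Palace Hills"},
    {"Id": 2, "Nbrhood": "Northwest"},  # no matching report
]
joined_rows = full_outer_join(reports, shapes)
```

A full outer join keeps neighbourhoods that have no reports as well as reports whose location has no polygon, so gaps in either source stay visible after the join.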
  
{| class="wikitable"
+
=Visualisation Techniques=
|-
 
! Problem #2 || Building Data
 
|-
 
| Issue || The original data concatenates information into the column header itself. For example, "F_1_Z_1:Lights Power" is a single column header. It tells us that the reading is taken from floor 1, zone 1 and that it measures Lights Power. However, such information is understood only by humans, not by business intelligence software like Tableau. Therefore, the data needs to be transformed into a more "software-friendly" form.
 
|-
 
| Solution || [[Image:Slide2.JPG|800px|center]]
 
|}
 
  
{| class="wikitable"
+
<b>Online interactive visualization: https://public.tableau.com/profile/wang.xuze#!/vizhome/IS428AY2019-20T1AssignWangXuze/Home?publish=yes </b>
|-
 
! Problem #3 || Employee Proximity Card Data
 
|-
 
| Issue || The given images are rich in color to denote the various zones. However, to use it effectively as background for a choropleth map, the image should ideally be dull in color. For example, it should be in colors like gray. Furthermore, the zone boundaries are to be demarcated more obviously especially when it is transformed to color such as gray.
 
|-
 
| Solution || [[Image:Slide3.JPG|800px|center]]
 
|}
 
  
 
{| class="wikitable"
 
{| class="wikitable"
 
|-
 
|-
! Problem #4 || Employee Proximity Card Data
+
! style="font-weight: bold;background: #536a87;color:#fbfcfd;width: 20%;" | Dashboard navigations
 
|-
 
|-
| Issue || The fixed proximity sensor collects data based on the zone it is in. Unlike the mobile proximity sensor, which bases its detection on coordinates, the fixed sensor detects the cards within its designated zone. Thus, the only information we can get from the fixed sensor is the time at which the proximity card is present in the zone. To visualize the zones, we need to mark them out on the image map. With the polygon data, I can then tell Tableau where the zones are on the map. Thus, I need to plot the zones on the given image and retrieve the coordinates. The coordinates are saved in a mapping CSV file which will be processed by Tableau.
+
|
|-
+
The homepage is the landing page you will see when you use this Visualization tool. This homepage makes use of the Tableau Dashboard and its button functions to enable interactivity. <br>
| Solution || [[Image:Slide4.JPG|800px|center]]
+
[[File:Home xuze.png|400px|home page]]
 +
[[File:Overview xuze.png|400px|frameless|overview]]
 +
<br>
 
|}
 
|}
  
 
{| class="wikitable"
 
{| class="wikitable"
 
|-
 
|-
! Problem #5 || Employee Proximity Card Data
+
! style="font-weight: bold;background: #536a87;color:#fbfcfd;width: 20%;" | Dynamic Sorting
|-
 
| Issue || The given employee data does not provide the prox-id, while the proximity data from both the fixed and mobile sensors records only the prox-id. Those sensors do not capture the name or any other characteristic of the employee. Therefore, the employee data set and the proximity card data need to be merged. To do so, I need to derive the prox-id from the employee data. After initial observation, the formula for the prox-id is as follows: "first name + first letter of last name + 001". However, this formula only works for the majority; some records do not obey it.
 
 
|-
 
|-
| Solution || [[Image:Slide6.JPG|800px|center]]
+
| <b>Description</b><br> To present the top neighborhoods with severe damages, I sort the damage level according to the facility specified.
|}
+
For example, when the user selects <i>shake intensity</i>, the data is sorted in descending order of the average reported shake intensity.
  
{| class="wikitable"
+
[[File:Sorting xuze.png|600px|center|Dynamic sorting]]
 +
<br>
 
|-
 
|-
! Problem #6 || Employee Proximity Card Data
+
| <b>Technique</b><br>
|-
+
<ol>
| Issue || For the visualization, I need to combine the data logically in Tableau, which means combining the files. After the initial combination, I also need to merge the data with the employee dataset. However, merging multiple files requires common attributes. To ensure that the merge succeeds, I created a row id in the merged proximity data file. Id columns are also created in the individual fixed and mobile proximity files. With the ID, the integration is carried out easily, without confusing Tableau. As cleaning the data sometimes involves modifying it, I need a column that is independent of those changes/modifications.
+
<li> Create a Parameter including the list of values we want the sorting to be based on
|-
+
[[File:Parameters.png|400px|center|Parameters]]
| Solution || [[Image:Slide7.JPG|800px|center]]
+
</li>
 +
<li> Create a Calculated Field matching the parameter values with the Measures variables
 +
[[File:Calculation field.png|400px|center|Calculation field]]
 +
</li>
 +
<li> Show Parameter Control in the worksheet and now we are able to sort<br>
 +
[[File:Para sorting.png|100px|center|Para sorting]]
 +
</li>
 +
</ol>
 
|}
 
|}
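The parameter-driven sorting described above can be approximated outside Tableau as follows. This is a minimal sketch, not the workbook's actual calculated field: the function name <code>sort_by_parameter</code> and the sample rows are hypothetical, and the measure names are taken from the field list of <i>mc1-reports-data.csv</i>.

```python
from statistics import mean

# Allowed parameter values, mirroring the measure columns of the reports file.
MEASURES = ["shake_intensity", "sewer_and_water", "power",
            "roads_and_bridges", "medical", "buildings"]


def sort_by_parameter(rows, sort_by):
    """Return (location, average) pairs for the chosen measure, highest average first."""
    if sort_by not in MEASURES:
        raise ValueError(f"unknown measure: {sort_by}")
    values = {}
    for row in rows:
        value = row.get(sort_by)
        if value is None:  # missing data is allowed in the reports
            continue
        values.setdefault(row["location"], []).append(value)
    averages = {loc: mean(vals) for loc, vals in values.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample rows for two neighbourhoods.
rows = [
    {"location": "A", "shake_intensity": 2, "medical": 9},
    {"location": "A", "shake_intensity": 4, "medical": None},
    {"location": "B", "shake_intensity": 7, "medical": 5},
]
by_shake = sort_by_parameter(rows, "shake_intensity")
by_medical = sort_by_parameter(rows, "medical")
```

Here the <code>sort_by</code> argument plays the role of the Tableau parameter, and the lookup of the matching column plays the role of the calculated field.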
  
<b><u>Final Excel Files</u></b>
+
=Question Answering=
<ol><li>bldg-MC2.csv</li>
 
Contains all the building related data (including hazium)
 
<li>bldg-MC2_mapping.csv</li>
 
Contains all the necessary mapping of the attributes in building data, so that Tableau can understand which floor and zone which the data was taken from.
 
<li>employee.csv</li>
 
All the employee related data.
 
<li>proxData_Merged.csv</li>
 
Contains the merged data from both fixed and mobile proximity sensor.
 
<li>proxMobileOut-MC2.csv</li>
 
Contains all the proximity card data that are recorded by the mobile sensor (Roise).
 
<li>proxOUt-MC2.csv</li>
 
Contains all the proximity card data that are recorded by the fixed sensor.
 
<li>proxOut-MC2_zoning_polygon.csv</li>
 
Contains all the polygon mapping of the zones for the fixed proximity sensor.
 
</ol>
 
  
=Data Import/Configuration=
+
<b>Online interactive visualization: https://public.tableau.com/profile/wang.xuze#!/vizhome/IS428AY2019-20T1AssignWangXuze/Home?publish=yes </b>
As we are importing multiple files, we need to tell tableau how the files are related to one another. In this case, the files do have a common attribute for all, such as its date/time. However, to allow us to use a filter from one data source to another data source, Tableau needs to understand how the files within the data source are related. For example, in your zone filter, you only want to display zone values which are available at the particular floor which was previously filtered by the user.
 
[[Image:Slide 8.jpg|800px|center]]<br>
 
<b>Brief Implementation Steps</b><br>
 
Once you open the edit relationship dialog shown in the image above, choose the common attribute that is present in both datasets, based on the filter you want to use. Normally automatic mapping will suffice; however, in our case, because of the complexity of our data, Tableau was unable to establish a meaningful relationship between the datasets. Thus, we have to do the custom mapping ourselves.
 
  
=Visualisation=
+
== Task 1==
The visualization is based on the category of the data. The breakdown of the proposed visualization is as shown below.
 
# Homepage
 
# Building Data Explorer : Air Supply Controls / Water Supply Controls / Fan Controls / Coil Controls / Additional System Controls
 
# Employee Movement Explorer
 
# Variable Explorer
 
The original dataset is overwhelming. There are over 400 different columns. To make the analysis more meaningful, the data columns have to be grouped logically based on the purpose of the sensor/data point. I have grouped the data into 6 different categories, namely:
 
# Air Supply data
 
# Water Supply data
 
# Fan data
 
# Coil data
 
# Additional System data
 
# Employee Proximity Card data
 
  
The design of the visualization is based on the "Overview first, zoom and filter, then details-on-demand" mantra (Shneiderman, 1996). Thus, when using the tool, the user first lands on the homepage [Step 1]. The homepage provides an overview of the data exploration functions available and a summary grouping of all the available data. The user then chooses the data of interest and is redirected to it.<br>
+
<i><b>Emergency responders will base their initial response on the earthquake shake map. Use visual analytics to determine how their response should change based on damage reports from citizens on the ground. How would you prioritize neighborhoods for response? Which parts of the city are hardest hit?</b></i>
  
Once you are redirected to the dashboard after [Step 1], you are at [Step 2]. At this stage, you are looking at the data you are interested in. You can interact with the data by using the filters. Hovering over a data point will give you more details on it.<br>
+
Given the damage reports by citizens, the emergency responders could change their response accordingly. In my view, they should prioritize neighbourhoods where:
 +
<ol>
 +
<li>The average damage level reported is high</li>
 +
<li>The number of reports is large</li>
 +
<li>Important facilities such as medical, roads and bridges, and buildings are damaged</li>
 +
<li>The reports of high-level damage are recent</li> </ol>
  
If you are keen to find out more about a particular dataset/column, you can proceed to [Step 3], the Variable Explorer, where you explore the variable you are looking at in finer detail. At this dashboard, you can drill down into the data, for example by breaking it up by floor, zone, time etc.<br>
+
The rationale is that high-level damage is more severe than lower-level damage and requires immediate response. A large number of reports generally gives a more reliable picture of the situation on site; thus, the neighborhood should be attended to quickly. Damage to certain facilities requires more urgent attention, such as medical facilities, where there could be further harm to patients, and roads and bridges, where transportation for rescue is blocked. Last but not least, responders should always monitor the most recent reports and attend to those neighborhoods in time.
  
Alternatively, using the Employee Proximity Card data can be either in [Step 2] or [Step 3]. When you are at this dashboard, you can explore the locations of the employees and determine its correlation with the other dataset.
+
Therefore, I created visualizations to allow emergency responders to get the information through the following ways:
  
 
{| class="wikitable"
 
{| class="wikitable"
 
|-
 
|-
! style="font-weight: bold;background: #536a87;color:#fbfcfd;width: 20%;" |  [Step 1] Homepage
+
! Serial No. !! Observation
 
|-
 
|-
| <b>Purpose / Description</b><br>The homepage is the landing page you will see when you use this Visualisation tool. The data exploration tools are all displayed on the homepage. This homepage makes use of the Tableau Dashboard and its action functions to enable interactivity. It serves as a "Home" panel for this visualisation and gives the user easy navigation between the dashboards.
+
| 1 || This visualization allows emergency responders to view the most damaged neighbourhoods during any hour of any date. The damage level is the average of the reported levels during that hour. <br> So that emergency responders can quickly assess which neighbourhoods have a generally high damage level across all facilities, an overall damage field is included by summing up the average damage levels of all categories. <br> A sorting feature lets them sort the neighbourhoods by the damage level of a certain facility if they deem it more important to attend to that facility first. (addresses 1, 3 and 4) <br>
[[Image:Tan Kee Hock MA3 Slide9.JPG|800px|center]]<br>
+
They could sort it based on the <b>overall damage</b> during 10th hour on 8th April:
 +
[[File:1-1-1.png|800px|center|Sort by overall]]
 +
They could sort it by the facilities that they want to prioritize such as <b>medical</b>:
 +
[[File:Fig. 1-1.png|800px|frameless|centre|alt text]]
 
|-
 
|-
| <b>Interactive Technique</b><br>
+
| 2 || This visualization allows emergency responders to view the damaged neighbourhoods during any hour of any date, with colour intensity representing the average damage level reported by citizens. A "show damage for" filter lets us choose which facility's damage to view. In this case, in Northwest during hour 17 on 6 April, 48 reports were made and the average medical damage is 8.5. Thus the responders might want to attend to this neighbourhood first. (addresses 1, 2, 3, 4) <br>
<ol><li>Select : Pointer</li>In order for this homepage to be made possible, there are action rules specified for each of the icons.
+
[[File:1-2.png|800px|frameless|centre]]
[[Image:Tan Kee Hock MA3 Slide10.JPG|800px|center]]
 
<li>Select : Hover</li>
 
Tooltips are provided to help the user understand the action that is tagged to the icon.
 
[[Image:Tan Kee Hock MA3 Slide11.JPG|800px|center]]
 
</ol>
 
 
|}
 
|}
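The hourly average and the overall damage field from observation 1 can be sketched as below. This is an illustration under assumptions, not the Tableau implementation: the function name and sample rows are made up, and whether <i>shake_intensity</i> counts toward the overall total is my assumption, since the text only says "all categories".

```python
from collections import defaultdict
from statistics import mean

# Assumption: all six reported measures contribute to the overall damage.
CATEGORIES = ["shake_intensity", "sewer_and_water", "power",
              "roads_and_bridges", "medical", "buildings"]


def hourly_overall_damage(rows):
    """Map (location, hour) to (per-category averages, overall = sum of those averages)."""
    buckets = defaultdict(lambda: defaultdict(list))
    for row in rows:
        hour = row["time"][:13]  # keeps "YYYY-MM-DD hh" from the timestamp
        for category in CATEGORIES:
            value = row.get(category)
            if value is not None:  # skip missing values
                buckets[(row["location"], hour)][category].append(value)

    result = {}
    for key, per_category in buckets.items():
        averages = {cat: mean(vals) for cat, vals in per_category.items()}
        result[key] = (averages, sum(averages.values()))
    return result


# Hypothetical reports from one neighbourhood within the same hour.
rows = [
    {"time": "2020-04-08 10:05:00", "location": 3, "medical": 8, "buildings": 4},
    {"time": "2020-04-08 10:40:00", "location": 3, "medical": 6, "buildings": None},
]
damage = hourly_overall_damage(rows)
averages, overall = damage[(3, "2020-04-08 10")]
```

Sorting neighbourhoods by <code>overall</code> in descending order gives a ranking comparable to the "overall damage" view, while sorting by a single entry of <code>averages</code> corresponds to prioritizing one facility.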
  
{| class="wikitable"
+
== Task 2==
|-
+
<i><b>Use visual analytics to show uncertainty in the data. Compare the reliability of neighborhood reports. Which neighborhoods are providing reliable reports? Provide a rationale for your response.</b></i>
! style="font-weight: bold;background: #536a87;color:#fbfcfd;width: 20%;" | [Step 2] Building Data Explorer : Air Supply Controls / Water Supply Controls / Fan Controls / Coil Controls / Additional System Controls
+
 
|-
+
Since the visualizations prepared for question 1 mostly use average values, they may be acceptable for giving emergency responders immediate first-hand insights. However, when we display aggregated data like a sum or an average, we no longer have any visibility into the variance of the underlying data. This matters especially because our visualizations are based on crowdsourced data that may lack reliability and vary in quality, since the damage level reports are based entirely on citizens’ subjective opinions. The emergency responders need to be fully informed of such uncertainties to assess the reliability of neighbourhood reports. Therefore, there are uncertainties in the data that I would like to address.
| <b>Purpose / Description</b><br> The purpose of this dashboard is to give the user an overview of the data of the related controls. Within the HVAC system, there are many interworking sub-systems which help keep the entire HVAC system working. This dashboard groups all the related controls together and presents an overview of the data. This will allow the user to more easily understand the sub-systems of the data. In general, I have grouped the data into 5 sub-systems:
 
# Air Supply Sub-System
 
# Water Supply Sub-System

# Fan Sub-System
 
# Coil Sub-System
 
# Additional Sub-Systems
 
The dashboard is logically designed for ease of use. The layout is shown below.
 
[[Image:Tan Kee Hock MA3 Slide12.JPG|800px|center]]<br>
 
The dashboard starts with the navigation bar right at the top, followed by the title and description. After these come the filters which are specific to the dashboard. The individual charts then follow. Each chart is descriptive by nature: it has a title and a description of what the data is trying to measure.
 
|-
 
| <b>Interactive Technique</b><br>
 
<ol>
 
<li>Select : Hover</li>
 
When the user is interested in a specific data point, he/she can simply place the cursor over the data point. A tooltip will instantly appear with the relevant details. This is to provide the user a more granular level of detail.
 
[[Image:Tan Kee Hock MA3 Slide15.JPG|800px|center]]<br>
 
<li>Filter</li>
 
The filter at this dashboard is to allow the user to specify the data range at which the user is interested to find more about. For example, he/she wants to look at data that is specific to the month of June and Mondays only. He/She is allowed to do so. At this level, the critical filter is the date. This would give the user an overview of the data within the specific period. Once the filter has been set, the patterns of the individual charts can be seen more clearly.
 
[[Image:Tan Kee Hock MA3 Slide13.JPG|800px|center]]<br>
 
<li>Connect</li>
 
The order of the charts is important. As much as possible, relevant/related charts would be placed adjacent to each other. The scale would be adjusted as well to provide a clearer comparison.
 
[[Image:Tan Kee Hock MA3 Slide14.JPG|800px|center]]
 
</ol>
 
|-
 
| <b>Types of Charts used</b><br>
 
The data provided are readings taken from various HVAC/Proximity Sensors. Thus, all of the readings are taken against time. To do meaningful comparison and analysis with time as one of the dimension, I used mainly,
 
# Heatmap
 
# Line Chart
 
The image below is representative of the types of charts used. It does not show all the charts that are present in the dashboard.
 
[[Image:Tan Kee Hock MA3 Slide16.JPG|800px|center]]
 
|}
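To illustrate why an average alone can hide unreliable reports, the sketch below computes the number of reports and the standard deviation per neighbourhood and hour. It is only a sketch under assumptions: the function name and sample rows are hypothetical, and the sample standard deviation (<code>statistics.stdev</code>) is used here, which may differ from the aggregation chosen in the Tableau workbook.

```python
from collections import defaultdict
from statistics import mean, stdev


def reliability_summary(rows, measure="buildings"):
    """Map (location, hour) to report count, mean and sample standard deviation."""
    grouped = defaultdict(list)
    for row in rows:
        value = row.get(measure)
        if value is not None:  # skip missing values
            grouped[(row["location"], row["time"][:13])].append(value)

    summary = {}
    for key, values in grouped.items():
        # The sample standard deviation needs at least two reports.
        spread = stdev(values) if len(values) >= 2 else None
        summary[key] = {"reports": len(values), "mean": mean(values), "stdev": spread}
    return summary


# Hypothetical reports: both neighbourhoods average 5, but B's reports disagree strongly.
rows = [
    {"time": "2020-04-06 14:00:00", "location": "A", "buildings": 5},
    {"time": "2020-04-06 14:10:00", "location": "A", "buildings": 5},
    {"time": "2020-04-06 14:20:00", "location": "A", "buildings": 5},
    {"time": "2020-04-06 14:05:00", "location": "B", "buildings": 0},
    {"time": "2020-04-06 14:30:00", "location": "B", "buildings": 10},
]
summary = reliability_summary(rows)
```

In this made-up sample, both neighbourhoods report the same average damage, yet A has more reports and no spread, so its reports would be treated as more reliable than B's.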
 
  
 +
These visualizations are provided to understand the uncertainty and reliability of neighbourhood reports:
 
{| class="wikitable"
 
{| class="wikitable"
 
|-
 
|-
! style="font-weight: bold;background: #536a87;color:#fbfcfd;width: 20%;" | [Step 2/3] Employee Movement Explorer
+
! Serial No. !! Observation
 
|-
 
|-
| <b>Purpose / Description</b><br> The purpose of this dashboard is to visualise the employee proximity card data. The data are given with X,Y coordinates. Thus, we can plot the data on a background image map which is provided in the original dataset. The proximity card data are visualized on the floor map itself. The floor map has been modified so that the data can be better visualized. Now that the employees' movements are visualized on an image map, their movement/activities around the building are much clearer.
+
| 1 || This is an overall heatmap showing the number of reports made by citizens hourly every day for each of the neighbourhoods. From this heatmap we can identify the frequency and number of reports made.
[[Image:Tan Kee Hock MA3 Slide17.JPG|800px|center]]
+
What’s more, background knowledge tells us that power outages are happening in neighbourhoods like Old Town and Southwest due to the Power Department’s work. This will cause delays in the receipt of reports.
|-
+
Certain abnormalities on the heatmap can be explained with additional information. For example, during the 8th and 9th hours of 8th April, there were 2200 and 1713 reports made, but no reports were made for the following 15 hours, and a sudden surge in report numbers happened during the 1st hour of 9th April. This is most likely because of the power outage. This neighbourhood should be attended to because of the significant number of reports made before the power outage happened.
| <b>Interactive Technique</b><br>
+
However, observations like the prolonged period in Scenic Vista without any reports require more investigation.<br>
<ol>
+
[[File:2-1.png|700px|frameless|center|Number of reports heatmap]]
<li>Select : Hover</li>
 
When the user is interested in a specific data point, he/she can simply place the cursor over the data point. A tooltip will instantly appear with the relevant details. This is to provide the user a more granular level of detail.
 
[[Image:Tan Kee Hock MA3 Slide91.JPG|800px|center]]<br>
 
<li>Filter</li>
 
The filter at this dashboard is to allow the user to specify the data range at which the user is interested to find more about. Furthermore, the user can filter results based on the employee's department. This allows the user to easily understand the behavior of employees from the different departments. It also helped to show the interaction between each department. Along with the date filters, this would give the user an overview of the employee activity within the specified period. Once the filter has been set, the patterns of the individual charts can be seen more clearly.
 
[[Image:Tan Kee Hock MA3 Slide18.JPG|800px|center]]<br>
 
</ol>
 
|-
 
| <b>Types of Charts used</b><br>For this dashboard, much of the data are given based on the location itself. Thus, the data needs to be plotted on an image to effectively show the pattern between the employee's location and the time of the day. This will help to tell us what the employee's movement/activities are like.
 
# Bar chart
 
# Image Maps
 
# Choropleth Map
 
The image below is representative of the types of charts used. It does not show all the charts that are present in the dashboard.
 
[[Image:Tan Kee Hock MA3 Slide19.JPG|800px|center]]
 
[[Image:Tan Kee Hock MA3 Slide20.JPG|800px|center]]
 
|}
 
  
{| class="wikitable"
 
 
|-
 
|-
! style="font-weight: bold;background: #536a87;color:#fbfcfd;width: 20%;" | [Step 3] Variable Explorer
+
| 2 || This visualization displays the distribution of damage levels reported by citizens for different facilities during a certain hour in each neighbourhood. Emergency responders could use this to assess how much variation there is among the reports. <br>For example, in Broadview during the 14th hour on 6th April, the medical damage reports vary a lot whereas the roads and bridges damage reports vary little.
|-
+
[[File:2-2.png|700px|frameless|center|Hourly report distribution boxplot]]
| <b>Purpose / Description</b><br> The Variable Explorer allows the user to explore the data in more detail. In the previous dashboards, especially for the controls, the level of detail is limited so that the analyst can see the bigger picture. This dashboard is designed to empower the analyst to see more of the data and how it changes across floors, zones and time. This helps the analyst understand how the readings vary across the mentioned building attributes and time. The aim of this dashboard is to focus on just one measurement and understand its pattern/behaviour.
 
[[Image:Tan Kee Hock MA3 Slide21.JPG|800px|center]]<br>
 
|-
 
| <b>Interactive Technique</b><br>
 
<ol>
 
<li>Select : Hover</li>
 
When the user is interested in a specific data point, he/she can simply place the cursor over the data point. A tooltip will instantly appear with the relevant details. This is to provide the user a more granular level of detail.
 
[[Image:Tan Kee Hock MA3 Slide92.JPG|800px|center]]<br>
 
<li>Filter</li>
 
There are additional filters to the fundamental date filter. The additional filters are for specifying the measurement variable, floor and zones. The analyst can choose the variable which he/she is interested to find out more about.
 
[[Image:Tan Kee Hock MA3 Slide22.JPG|800px|center]]<br>
 
</ol>
 
|-
 
| <b>Types of Charts used</b><br> The data all have one common attribute, which is date/time. Thus, to give the dashboard the flexibility to handle all of the variable types, it is fundamentally required to visualize time-related data. Therefore, the following types of charts are used.
 
# Heatmap
 
# Bar chart
 
# Line Chart
 
The image below is representative of the types of charts used. It does not show all the charts that are present in the dashboard.
 
[[Image:Tan Kee Hock MA3 Slide23.JPG|800px|center]]<br>
 
|}
 
  
=Use Case=
 
{| class="wikitable"
 
|-
 
! style="font-weight: bold;background: #536a87;color:#fbfcfd;width: 20%;" |  Visualisation Tool Demonstration
 
|-
 
| <b>Scenario</b><br>There is a hardworking analyst who wants to explore for patterns with regards to the bathroom use in the building!
 
 
|-
 
|-
| <b>Steps</b><br>
+
| 3 || To assess which neighbourhoods are providing reliable reports, I consider neighbourhoods with a higher number of reports and less variation in the data to be more reliable. <br>
[[Image:Tan Kee Hock MA3 Slide24.JPG|800px|center]]<br>
+
Based on these two criteria, this visualization provides the standard deviation of the reported damages about a certain facility (building in this graph). Together with the number of reports during the hour, the emergency responders could decide whether the data is reliable. <br>
[[Image:Tan Kee Hock MA3 Slide25.JPG|800px|center]]<br>
+
For example, in Broadview, the reports during hour 1, with a standard deviation of 3.869, are less reliable than those during the 9th hour, with a standard deviation of 2.429.
[[Image:Tan Kee Hock MA3 Slide26.JPG|800px|center]]<br>
 
[[Image:Tan Kee Hock MA3 Slide27.JPG|800px|center]]<br>
 
[[Image:Tan Kee Hock MA3 Slide28.JPG|800px|center]]<br>
 
[[Image:Tan Kee Hock MA3 Slide29.JPG|800px|center]]<br>
 
[[Image:Tan Kee Hock MA3 Slide30.JPG|800px|center]]<br>
 
[[Image:Tan Kee Hock MA3 Slide31.JPG|800px|center]]<br>
 
[[Image:Tan Kee Hock MA3 Slide32.JPG|800px|center]]<br>
 
[[Image:Tan Kee Hock MA3 Slide33.JPG|800px|center]]<br>
 
[[Image:Tan Kee Hock MA3 Slide34.JPG|800px|center]]<br>
 
[[Image:Tan Kee Hock MA3 Slide35.JPG|800px|center]]<br>
 
[[Image:Tan Kee Hock MA3 Slide36.JPG|800px|center]]<br>
 
[[Image:Tan Kee Hock MA3 Slide37.JPG|800px|center]]<br>
 
[[Image:Tan Kee Hock MA3 Slide38.JPG|800px|center]]<br>
 
[[Image:Tan Kee Hock MA3 Slide39.JPG|800px|center]]<br>
 
|}
 
  
=Findings  - Task #1=
+
[[File:2-3.png|700px|frameless|center|Standard deviation of reported damages]]
 +
<br>
  
<i>Emergency responders will base their initial response on the earthquake shake map. Use visual analytics to determine how their response should change based on damage reports from citizens on the ground. How would you prioritize neighborhoods for response? Which parts of the city are hardest hit?</i>
 
  
Given the damage reports by citizens, the emergency responders could change their response accordingly. In my view, they should prioritize neighbourhoods where:
 
1. The average damage level reported is high
 
2. The number of reports is large
 
3. The damage to medical, roads and bridges, and buildings etc.
 
4. The reports of high-level damage are recent
 
The rationale is that high-level damage is more severe than low-level damage and requires immediate response. A large number of reports generally gives a more reliable picture of the situation on site; thus, the neighborhood should be attended to quickly. Damage to certain facilities requires more urgent attention: at medical facilities, patients could suffer further harm, and damaged roads and bridges block transportation for rescue. Last but not least, responders should always monitor the most recent reports and attend to those neighborhoods in time.
 
 
Therefore, I created visualizations that give emergency responders this information in the following ways:
 
 
{| class="wikitable"
 
|-
 
! Serial No. !! Observation
 
 
|-
 
|-
| 1 || This visualization allows emergency responders to view the most damaged neighbourhoods during any hour of any date. The damage level is the average of the reported levels during that hour. So that responders can quickly assess which neighbourhoods have a generally high damage level across all facilities, an overall damage field is included, computed by summing the average damage levels of all categories. A sorting feature lets them sort the neighbourhoods by the damage level of a particular facility if they consider it more important to attend to that facility first. (addresses criteria 1 and 3)
+
| 4 ||If we look at the entire view with all dates, we can see that some neighbourhoods, such as Broadview and Weston, mostly have reports with a small standard deviation (lighter colour). <br> Others, such as Pepper Mill and Safe Town, have more dark-coloured areas, indicating less reliable reports with a large standard deviation.
 +
[[File:2-4.png|700px|frameless|center|Overall standard deviation of reported damages]]
  
[[File:Fig. 1-1.png|800px|frameless|centre|alt text]]
 
|-
 
| 2 || This visualization allows emergency responders to view the damaged neighbourhoods during any hour of any date, with colour intensity representing the average damage level reported by citizens. A "Show damage for" filter lets us choose which facility's damage to view. In this case, in Northwest during hour 17 on 6 April, 48 reports were made and the average medical damage is 8.5, so the responders might want to attend to this neighbourhood first. (addresses criteria 1, 2, 3, 4)
 
[[File:1-2.png|800px|frameless|centre]]
 
  
|-
 
| 3 || [[Image:Tan Kee Hock MA3 Slide43.JPG|800px|center]]
 
|-
 
| 4 || [[Image:Tan Kee Hock MA3 Slide44.JPG|800px|center]]
 
|-
 
| 5 || [[Image:Tan Kee Hock MA3 Slide45.JPG|800px|center]]
 
|-
 
| 6 || [[Image:Tan Kee Hock MA3 Slide46.JPG|800px|center]]
 
|-
 
| 7 || [[Image:Tan Kee Hock MA3 Slide47.JPG|800px|center]]
 
|-
 
| 8 || [[Image:Slide48.JPG|800px|center]]
 
|-
 
| 9 || The offices of the employees are arranged by position. The higher an employee's position, the more likely it is that his/her office is on a higher floor. The executive departments are mainly located on the 3rd floor, while people from the facility and security departments come from the 1st and 2nd floors.
 
|-
 
| 10 || Floor 2 is where the bulk of the employees are. Most of the employee's offices are on the 2nd floor. Although their offices are located on the 2nd floor, they still move about the building as frequently. Also, as seen in the floor map and the employee proximity card data, floor 1 is where meetings and front desk offices are located. Thus, the reduced employee presence in floor 1 also suggests that the meeting rooms in floor 1 are likely to be used to host guests/events
 
 
|}
 
|}
  
=Findings - Task #2=
+
== Task 3==
Describe up to ten of the most interesting patterns that appear in the building data. Describe what is notable about the pattern and explain its possible significance.
+
<b><i>How do conditions change over time? How does uncertainty in change over time? Describe the key changes you see. Limit your response to 500 words and 8 images.</i></b> <br>
{| class="wikitable"
+
With a time-series data set, we can visualize the changes and look for insights into our data. The changes in our case could be analysed based on:
|-
+
<ol><li>Change in number of reports </li>
! Serial !! Measurement Category !! Description and Significance
+
<li>Change in reported damage levels </li></ol>
|-
 
| 1 || Thermostat Setting || The general setting for the thermostat heating and cooling setpoints tend to be opposite of each other. When the heating set point is being set to a higher point, the cooling setpoint will be set to a lower point. This is normally because the user is trying to adjust the temperature of the air within the zone. Naturally, when you want the place to be cooler, you will set the heating point at a lower point, and the cooling point to be at a higher point. This is to produce an equilibrium temperature within the zones. You see that the temperature of the air is between the two setpoints.<br>
 
[[Image:Tan Kee Hock MA3 Slide50.JPG|800px|center]]<br>
 
However in the month of June, the period of 7th to 10th. The behavior of the thermostat setting seems to be off the norms. It betrays the general behaviour which is shown in the rest of the month. As the heating setpoint increases, the cooling setpoint increases as well. The general temperature of the air within the zones seems to increase significantly during mid-day. It peaks up as much as to 28.88°C. The average temperature of the air in the zones hovers around 24°C. This is approx. 4°C above the norm. The average temperature in Singapore, especially during the hottest month,February, is around 27°C. The observation here is definitely something worth investigating. The behavior is consistent throughout all the floors and its zone.<br>
 
[[Image:Tan Kee Hock MA3 Slide51.JPG|800px|center]]<br><br>
 
There are potential reasoning to this cause.
 
# Inappropriate handling of the thermostat controls
 
# Severe weather conditions - eg Extremely Cool/Hot Weather (Unlikely)
 
<b>Significance</b><br>
 
The thermostat settings are vital to ensure that the building is properly heated. If the temperature in the building gets too high without proper ventilation, it will pose potential safety risks to the employees. If the temperature cannot be regulated to provide favourable working conditions, it is likely to cause not just unhappiness but also health issues for the employees.
 
|-
 
| 2|| Mechanical Ventilation Mass Flow Rate || This measurement tells us how much air is flowing through the zone exhaust fan. In the month of June, in particular, there is some inconsistency for the readings on two particular weekends, namely 4th-5th June and 11th-12th June. In general, the readings of this specific measurement has its own cycle within the day. Naturally, it would be lower on the weekends. However, the 2 weekends in June, displays very different reading. The first weekend shows a reading that is below the average while the second weekend shows a reading that is significantly higher than the average.<br>
 
[[Image:Tan Kee Hock MA3 Slide53.JPG|800px|center]]<br>
 
You can also observe that the readings are consistent throughout the weekdays and weekends. During the weekday, the flow rate generally increases during mid-day (Possibly due to the hot weather). On the weekend the pattern is very different.<br>
 
<b>Significance</b><br>
 
The readings of the amount of air flowing through the zone exhaust fan can tell us if the building is well ventilated. It indicates the movement of air. Since this observation happens on the weekend, potentially the lack of human activity may be correlated to the lower flow rate. But the difference of flow rate in two separate weekends remains questionable. The flow rate indicates blockage and ventilation of the building. If there is build-up of dust/blockages or animal movement, the flow rate inevitably will be affected. A higher flow rate in the weekend without human activity can potentially indicate faulty sensors or errors in the equipment which results in abnormal control of the ventilation.
 
|-
 
| 3 || Bath_Exhaust:Fan Power || This is the measurement of the power used by the bathroom fans. The power indicates usage of the bathroom. There is consistent use of the bathroom throughout the weekday. On the weekend, especially Saturdays (4th and 11th), the usage drops drastically after 1600H.<br>
 
[[Image:Tan Kee Hock MA3 Slide55.JPG|800px|center]]<br><br>
 
<b>Significance</b><br>
 
The power usage indicates the use of the bathroom. You notice that during the weekday, the bathrooms are consistently used at a similar rate. As explained by the consistent color throughout the working day. This reading tells us the employee's movement and activity of a typical day in the company. The consistent use of bathrooms, indicate human activity in the building as well. Furthermore, it can be used to indicate the employee's productivity, if there are potential cases of "slacking off"/"malingering".
 
|-
 
| 4 || Dry Bulb Temperature || The dry-bulb temperature (DBT) is the temperature of air measured by a thermometer freely exposed to the air but shielded from radiation and moisture. DBT is the temperature that is usually thought of as air temperature, and it is the true thermodynamic temperature. Thus, this reading tells us the relative weather condition of outside of the building.<br>
 
[[Image:Tan Kee Hock MA3 Slide57.JPG|800px|center]]<br>
 
As shown in the picture, the readings are very consistent throughout the month of June, you can see that the temperature generally goes up during noon. This reading strongly correlates to the time of the day. Generally, you would expect the temperature to go up during mid-day.<br>
 
<b>Significance</b><br>
 
The dry bulb temperature is essential for the HVAC system, as the reading can be used to evaluate the effectiveness of the HVAC system within the building. We can  measure how effective the HVAC system is, in regulating the internal building temperature.
 
|-
 
| 5 || Lights/Pump/Equipment Power|| The readings from all three power consumers, namely lights, pump and equipment, display very healthy power consumption. Their power consumption is very consistent throughout the month. Light and equipment power generally peak during the weekdays. During the weekend, you can see a significant drop in power consumption. However, the power consumed by the pump is constant. Either the pump is being used efficiently, or there is potentially a faulty sensor causing this reading. A constant reading of 91W can be suspicious.
 
[[Image:Tan Kee Hock MA3 Slide94.JPG|800px|center]]<br>
 
|-
 
| 6 || Water Heater Setpoint & Loop Temp Schedule || The loop temperature schedule refers to the temperature set for the hot water loop. This is the temperature at which hot water is delivered to hot water appliances and fixtures. The temperature for both readings were at a constant value throughout all the month. Both are set at the temperature of 60.0 degree celsius.
 
|-
 
| 7 || Supply Side Inlet Temperature || This reading measures the temperature of the water entering the hot water tank. The readings intensify as the temperature increases, especially on the weekend. The temperature of the water going into the hot water tank is generally higher during the weekend than during the weekdays. This is worth investigating as there is less human activity over the weekend. The system could be boiling the water unnecessarily, thus wasting energy.
 
[[Image:Tan Kee Hock MA3 Slide95.JPG|800px|center]]<br>
 
|-
 
| 8 || Lights Power || Despite the consistent total Lights power consumption, there is some interesting pattern to it. Lights power in the first floor is generally not turned off. Much of the power consumption comes from the 1st floor. Even past working hours, the 1st floor still consumes significantly high power, while the rest of the floors' consumption dropped to their minimal level. What is more surprising is that the zones, 8A, 8B, and 11B reflect the lights consumed in corridors. It appears that the building is not really energy efficient after all!
 
[[Image:Tan Kee Hock MA3 Slide96.JPG|800px|center]]<br>
 
|-
 
| 9 || Total Electric Power Demand || The new building claims to meet the highest energy efficiency standards; however, there are questionable data points which do not reflect this energy efficiency. The total electric power demand peaks and intensifies on a particular weekend in June (10th - 13th). It begins on Friday morning and intensifies all the way until the following morning, after which the demand for electric power drops. This is an interesting finding, as there should be less employee activity during the weekends.
 
[[Image:Tan Kee Hock MA3 Slide97.JPG|800px|center]]<br>
 
|}
 
 
 
=Findings - Task #3=
 
Describe up to ten notable anomalies or unusual events you see in the data. Prioritize those issues that are most likely to represent a danger or a serious issue for building operations.
 
  
 
{| class="wikitable"
 
{| class="wikitable"
 
|-
 
|-
! Priority !! Measurement Category !! Description and Significance
+
! Serial No. !! Observation
|-
 
| 1 || Hazium Concentration || Hazium is a recently discovered and possibly dangerous chemical. It poses health hazards to employees who inhale it. There are spikes in Hazium concentration, especially on 3rd (Friday) and 11th (Saturday) June. What is more surprising is that one of the areas with high concentration is office 3000 (the CEO's office).  <br><b>Significance</b><br> As mentioned in the background text, Hazium is a dangerous chemical. A high concentration of Hazium is likely to pose health issues to employees. No one can explain the effects of Hazium, but it is considered likely to be dangerous to employees. Therefore, it is crucial for the company to look for the root cause and address it.
 
|-
 
| 2 || Return Outlet CO2 Concentration  || This reading tells us the CO2 concentration within the building. A healthy CO2 concentration ranges from 250 ppm to 1000 ppm. However, on 2 days (6th and 8th of June), the CO2 concentration spiked above 1800 ppm. <br><b>Significance</b><br> A high concentration of CO2 within the building would pose a health hazard to the employees. At readings above 1000 ppm, employees would experience drowsiness. Above 2000 ppm, employees will experience headaches, sleepiness and stagnant, stale, stuffy air, as well as poor concentration, loss of attention, increased heart rate and slight nausea. It is vital for the company to investigate the high CO2 concentration.
 
[[Image:Tan Kee Hock MA3 Slide93.JPG|800px|center]]
 
|-
 
| 3 || Thermostat Setting || This finding is the one mentioned above in Task #2; a malfunction of the thermostat would be devastating. <br><b>Significance</b><br> The thermostat is responsible for regulating and maintaining the internal temperature of the building. In effect, the readings from the thermostat control the temperature of the building. There have been instances of the temperature peaking. The high temperature may pose a health hazard to the employees.
 
|-
 
| 4 || VAV_SYS Supply Fan Outlet Mass Flow Rate || This reading tells us the total rate of air delivered by the HVAC system fan to the zone it serves. The data collected in the month of June is not showing consistent results.<br>
 
The readings do tally with the VAV_Sys Supply Fan Outlet:Power.<br>
 
[[Image:Tan Kee Hock MA3 Slide59.JPG|800px|center]]
 
The readings intensify in 2 particular periods, 7th-8th June and 10th-13th June. During 7th-8th June (Tuesday to Wednesday), the reading intensifies in the early hours and late night. This is an abnormal phenomenon. This is telling us that more air is being delivered by the HVAC system fan when there is no supposed employee during this period. The second period, 10th-13th June, shows intensified readings consistently from 10th June evening to 13th June Morning (Friday to Monday). <br>
 
<b>Significance</b><br>
 
The readings do not seem to tally with the supposed work shifts of employees. There seems to be an increased flow of air during the period where no one supposed to be there. There are multiple possibilities which may have caused such data readings.
 
# Faulty Sensors causing false readings (Unlikely)
 
# Faulty Equipment
 
This reading is important because it will indicate the overall system health of the HVAC fans. It tells us if the HVAC fans are working harder. It also indicates if the HVAC system's ability to maintain the building's internal temperature/ventilation.
 
 
|-
 
|-
| 5 || Deli-Fan Power || This reading tells us the power used by the deli exhaust fan. There are some suspicious data points with regards to the use of Deli-Fan.<br>
+
| 1 || This visualization presents a line graph showing how the hourly number of reports vary for each neighbourhood. We can filter by neighbourhoods and dates and display only those we want to take a deeper look into. <br>
[[Image:Slide61.JPG|800px|center]]<br>
+
This graph shows how the number of reports changes over the period for all neighbourhoods. We can see a sharp increase in citizen reports on 8 April, when the major quake started, from hour 7 to hour 10, after which the number of reports starts to decrease. There are occasional unusual spikes, which are explained by power outages.
The fan usage seems to be consistently high during a Sunday(5th and 12th June). The readings do not seem to tally with the increased human activities during the weekday. The inconsistent readings do not seem to establish any form of correlation with the human activity. But rather, the pattern of seem to be established by other unknown factors.<br>
+
[[File:3-1.png|800px|center|reports count change]]
<b>Significance</b><br>
+
Using the filter to zoom into one particular neighbourhood provides more detailed information. This graph shows the change in Downtown. We can see that citizens started to report on the foreshocks in hour 14 on 6th April, the major quake in hour 7 on 8th April, and the aftershocks in hour 13 on 9th April.
Exhaust fans are health indicators of the overall HVAC systems. Should the exhaust fans power usage display sporadic patterns, they indicate abnormalities within the HVAC system. Furthermore, they help to regulate the airflow for the HVAC system. The poor performance of Exhaust fans will significantly hamper the HVAC's ability to regulate internal building temperature.
+
[[File:3-1-1.png|800px|center|Downtown]]
 
|-
 
|-
| 6 || VAV_SYS Heating Coil Power || The heating coil reports exactly 0 power used. This is not plausible, as the HVAC system seems to be working properly. Thus, there is very little proof that the heating coil itself is broken/faulty.<br>
+
| 2 || This visualization presents a line graph showing how the hourly average damage reported varies in a neighbourhood. <br>
[[Image:Tan Kee Hock MA3 Slide62.JPG|800px|center]]<br>
+
For example, when we look at Easton, we can see that the damage reported for buildings and power is high from hour 0, whereas the damage for roads and bridges and for sewer and water started to increase an hour later. This is probably because the former can be felt directly through the shaking of the building and the power outage, whereas the latter only becomes noticeable after a while.
<b>Significance</b><br>
+
[[File:3-2.png|800px|center|Reported damage change]]
This is very likely to be a faulty power usage sensor. Although this reading does not seem to affect the rest of the system, an investigation into the faulty sensor is recommended. If external forces caused the sensor fault, then it is very likely that the same cause will impact other parts of the HVAC system, for example a water leak in a specific part of the building which spoilt the sensor.
 
|-
 
| 7 || VAV_SYS Supply Fan:Fan Power || The system supply fan consumes more power on the weekend (both Saturday and Sunday). This is highly unusual as there is lower employee activity within the building. Most of the power comes from the fans in level 3. On Saturday it is a half day, but on Sunday only those who are on shift would be in the building. Therefore, on Sunday, there would be close to zero human activity. <br><b>Significance</b><br> The supply fan is responsible for circulating the air within the HVAC system. In this case, the unnecessary power consumed by the fan would incur additional cost to the company. Not only that, it is a waste of energy.
 
 
|}
 
|}
  
=Findings - Task #4=
+
=Future Improvement=
Describe up to five observed relationships between the proximity card data and building data elements. If you find a causal relationship (for example, a building event or condition leading to personnel behavior changes or personnel activity leading to building operations changes), describe your discovered cause and effect, the evidence you found to support it, and your level of confidence in your assessment of the relationship.
+
Given more time, I will improve the visualizations by including more statistical methods and reasoning to demonstrate the data uncertainty and reliability. I will also work on improving the interface for the emergency responders to provide them with a much easier and clearer view. Nonetheless, through this assignment, I have learned a lot about interpreting data and visualization techniques, and about my own analytical ability.
  
{| class="wikitable"
 
|-
 
! Serial || Discovery
 
|-
 
| 1 ||
 
[[Image:Tan Kee Hock MA3 Slide63.JPG|800px|center]]
 
[[Image:Tan Kee Hock MA3 Slide64.JPG|800px|center]]
 
[[Image:Tan Kee Hock MA3 Slide65.JPG|800px|center]]
 
[[Image:Tan Kee Hock MA3 Slide66.JPG|800px|center]]
 
<br>
 
Thus, the Hazium concentration does not occur by chance. It is very likely that someone orchestrated the event. All the clues point towards someone who is likely to be from level 3. More investigation is needed. The attack is very likely to be directed at the CEO himself.
 
|}
 
 
=Conclusion=
 
There are many interesting findings which do not reflect the energy efficiency that the builders had claimed. The new building does not seem to be as energy-efficient as previously advertised. As for the occurrence of Hazium, it is postulated to be caused by the employees themselves. The evidence points towards a deliberate attack on the CEO himself. As Hazium is a newly discovered chemical, its potential impact on the employees is unknown. Many cautious steps should be taken when investigating the Hazium outbreak. Evidence suggests that the culprit is an employee from level 3!<br>
 
<b>Main Link</b>
 
One tough assignment down, one more project to remaining - https://public.tableau.com/views/MA_3_Final/Home?:embed=y&:display_count=yes
 
<br>
 
<b>Backup Link</b>
 
This is one tough assignment,I need more backup link - https://public.tableau.com/views/MA_3_0/Home?:embed=y&:display_count=yes
 
 
=Improvement=
 
Given more time, I would focus on improving the drill-down capability of this visualisation tool. I would also work on improving the interface for the Employee Movement Explorer. Nonetheless, it was a tough fight against time and my analytical ability. I am still glad I managed to produce something like this.
 
 
=Visualisation Software=
 
=Visualisation Software=
  
To perform the visual analysis, this is a list of the software which I used.
+
To perform the visual analysis, these are the software tools I used.
*Tableau
+
*Tableau Desktop
 +
*Tableau Public Server (My work: https://public.tableau.com/profile/wang.xuze#!/vizhome/IS428AY2019-20T1AssignWangXuze/Home?publish=yes)
 
*Excel
 
*Excel
*Chrome
+
*R Studio
*Netbeans
 
 
 
=Submission details=
 
 
 
This is an individual assignment. You are required to work on the assignment and prepare submission individually. Your completed assignment is due on '''24th October 2016, by 12.00 noon'''.
 
 
 
You need to edit your assignment in the appropriate wiki page of the Assignment Dropbox. The title of the wiki page should be in the form of: IS428_2016-17_T1_Assign3_FullName.
 
 
 
The assignment 3 wiki page should include the URL link to the web-based interactive data visualization system prepared.
 
 
 
 
 
=Assignment 3 Q&A=
 
 
 
Need more clarification, please feel free to pen down your questions.
 
 
 
#What is Hazium? Hazium is a (fictitious) chemical that has become a recent concern on the island of Kronos. Not much is known about its effects, but it is suspected that Hazium is not good for people.
 
#There are a few extra building file data fields in the .json dataset that do not appear in the .csv data. These extra data fields are actually valid for the building for the dates and times they were recorded, but they will not add significantly to your analysis. So for this assignment, please just use the data fields included in the .csv file.
 
#Can you provide more info on the data provided in the mobile proximity card data? Are the x,y coordinates bound to a normal (x,y) plane, where in this case the plane is the floor maps? The (x,y) coordinates are bound to a normal plane. The (x,y) plus the floor number would identify a specific location. The lower left of the provided map is (0,0) and the upper right is (189,111).
 
#In some cases, data is reported for some sensors and not others, or it is documented but not reported. Where can we find this data? Please use the data fields you have available to perform your investigation. In general, the documented set of attributes may not be reported for all zones.
 
#What does the (x,y) coordinates represent for the mobile robot sensor? The (x,y) coordinates for these reading represent the location of the mobile sensor.
 
#Sometimes, mobile prox data for a prox card repeats multiple times in a minute. Does this indicate the number of seconds that the prox card was within range of the sensor? No. Multiple readings do not indicate what fraction of the minute that the mobile sensor was in proximity of the prox card.
 
#In some cases, the value of the VAV Availability Manager Night Cycle On/Off is 2. Is this a valid value? Yes.
 
#Does F_3_Z_9 VAV Damper Position mean F_3_Z_9 VAV REHEAT Damper Position? Yes.
 
  
 
=References=
 
=References=
* http://www.picturetopeople.org/image_utilities/image-grayscale-converter/grayscale-image-generator.html
+
* Dynamic Sorting with Tableau
* https://community.tableau.com/message/320738
+
https://www.clearlyandsimply.com/clearly_and_simply/2011/11/dynamic-sorting-with-tableau.html
* http://www.thedataschool.co.uk/niccolo-cirone/tableau-tip-week-wednesday-creating-dashboard-navigator-buttons/
+
* Using Tableau to Show Variance and Uncertainty
* http://kb.tableau.com/articles/howto/renaming-dimension-column-row-headers
+
https://www.rittmanmead.com/blog/2017/06/using-tableau-to-show-variance-and-uncertainty/
* https://tableauandbehold.com/2015/04/13/creating-custom-polygons-on-a-background-image/
+
* How to Create Heat Map in Tableau
* https://www.kane.co.uk/knowledge-centre/what-are-safe-levels-of-co-and-co2-in-rooms
+
https://www.youtube.com/watch?v=Tc8VenUN4n8
* https://en.wikipedia.org/wiki/HVAC
+
* Analyzing Time Series
 +
https://www.youtube.com/watch?v=aaaILjNPHSs
  
 
=Comments=
 
=Comments=
Do provide me your feedback!:)
+
I appreciate all suggestions and discussions!
 +
Please provide feedback thank you! :)

Latest revision as of 23:58, 13 October 2019

Mini-Challenge 1: Crowdsourcing for Situational Awareness


Problem Statement

St. Himark has been hit by an earthquake, leaving officials scrambling to determine the extent of the damage and dispatch limited resources to the areas in most need. They quickly receive seismic readings and use those for an initial deployment but realize they need more information to make sure they have a realistic understanding of the true conditions throughout the city.

In a prescient move of community engagement, the city had released a new damage reporting mobile application shortly before the earthquake. This app allows citizens to provide more timely information to the city to help them understand damage and prioritize their response. In this mini-challenge, use app responses in conjunction with shake maps of the earthquake strength to identify areas of concern and advise emergency planners. Note: the shake maps are from April 6 and April 8 respectively.

With emergency services stretched thin, officials are relying on citizens to provide them with much needed information about the effects of the quake to help focus recovery efforts.

By combining seismic readings of the quake, responses from the app, and background knowledge of the city, help the city triage their efforts for rescue and recovery.

Tasks

  1. Emergency responders will base their initial response on the earthquake shake map. Use visual analytics to determine how their response should change based on damage reports from citizens on the ground. How would you prioritize neighborhoods for response? Which parts of the city are hardest hit? Limit your response to 1000 words and 10 images.
  2. Use visual analytics to show uncertainty in the data. Compare the reliability of neighborhood reports. Which neighborhoods are providing reliable reports? Provide a rationale for your response. Limit your response to 1000 words and 10 images.
  3. How do conditions change over time? How does uncertainty in change over time? Describe the key changes you see. Limit your response to 500 words and 8 images.

Motivations

  1. Provide clear overview of the citizen reports to aid decision making.
  2. Communicate the uncertainty and reliability of the citizen reports.
  3. Show how conditions change over time.
  4. Allow effective emergency response to save lives.

Data Description

The data includes

  1. A mc1-reports-data.csv file spanning the entire length of the event, containing (categorical) individual reports of shaking/damage by neighborhood over time. It has these fields:
    • time: timestamp of incoming report/record, in the format YYYY-MM-DD hh:mm:ss
    • location: id of neighborhood where person reporting is feeling the shaking and/or seeing the damage
    • {shake_intensity, sewer_and_water, power, roads_and_bridges, medical, buildings}: reported categorical value of how violent the shaking was/how bad the damage was (0 - lowest, 10 - highest; missing data allowed)
  2. Two shakemap PNG files which indicate where the corresponding earthquakes' epicenters originate as well as how much shaking can be felt across the city.
  3. The StHimark.shp file provides the geospatial vector data for St. Himark.

Data Preparation

Join the reports data and Shapefile
Inside Tableau, import mc1-reports-data.csv and StHimark.shp into Connections. Perform a full outer join on location in the csv file and Id in the shp file.
Full outer join

This produces the following data columns in Tableau.

Data columns
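
The join itself is done in Tableau, but its effect can be illustrated outside the tool. The following is a minimal sketch in plain Python, assuming both sources have already been read into lists of dictionaries. The function name, the sample rows and the Nbrhood attribute are hypothetical and used only for illustration; only the location and Id keys come from the description above.

```python
from collections import defaultdict


def full_outer_join(reports, shapes, left_key="location", right_key="Id"):
    """Join report rows to shape rows, keeping unmatched rows from both sides."""
    shapes_by_id = defaultdict(list)
    for shape in shapes:
        shapes_by_id[shape[right_key]].append(shape)

    joined = []
    matched_ids = set()
    for report in reports:
        key = report[left_key]
        matches = shapes_by_id.get(key)
        if matches:
            matched_ids.add(key)
            for shape in matches:
                joined.append({**report, **shape})
        else:
            # Report with no matching neighbourhood polygon: keep it anyway.
            joined.append(dict(report))

    # Neighbourhood polygons that received no reports are kept as well.
    for key, rows in shapes_by_id.items():
        if key not in matched_ids:
            joined.extend(dict(shape) for shape in rows)
    return joined


# Hypothetical sample rows, not taken from the real data set.
sample_reports = [
    {"location": 1, "buildings": 7},
    {"location": 99, "buildings": 2},
]
sample_shapes = [
    {"Id": 1, "Nbrhood": "Palace Hills"},
    {"Id": 2, "Nbrhood": "Northwest"},
]
joined_rows = full_outer_join(sample_reports, sample_shapes)
```

A full outer join is chosen here so that neighbourhoods without any reports still appear on the map, and reports with an unknown location are not silently dropped. Note that keys read from a CSV file are strings, so they may need converting before they match numeric shapefile ids.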

Visualisation Techniques

Online interactive visualization: https://public.tableau.com/profile/wang.xuze#!/vizhome/IS428AY2019-20T1AssignWangXuze/Home?publish=yes

Dashboard navigations

The homepage is the landing page you will see when you use this Visualization tool. This homepage makes use of the Tableau Dashboard and its button functions to enable interactivity.
home page overview

Dynamic Sorting
Description
To present the neighborhoods with the most severe damage, I sort the damage level according to the facility specified.

For example, when user selects shake intensity, the data will be in descending order according to the average damage reported about shake intensity.

Dynamic sorting


Technique
  1. Create a Parameter containing the list of values the sorting can be based on
    Parameters
  2. Create a Calculated Field that maps each parameter value to the corresponding measure
    Calculation field
  3. Show the Parameter Control in the worksheet; the view can now be sorted by the selected value
    Para sorting
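
The same idea can be sketched in Python for readers without Tableau. This is only an illustration of the technique, not the workbook's actual calculated field: the `sort_by` argument plays the role of the parameter, the function name and sample rows are hypothetical, and the measure names are taken from the fields of mc1-reports-data.csv.

```python
from collections import defaultdict
from statistics import mean

# Values the "parameter" may take: the measure fields of the reports file.
MEASURES = (
    "shake_intensity", "sewer_and_water", "power",
    "roads_and_bridges", "medical", "buildings",
)


def sort_by_parameter(reports, sort_by):
    """Return (location, average) pairs in descending order of the chosen measure."""
    if sort_by not in MEASURES:
        raise ValueError(f"unknown measure: {sort_by}")
    values = defaultdict(list)
    for report in reports:
        value = report.get(sort_by)
        if value is not None:  # missing data is allowed in the reports
            values[report["location"]].append(value)
    averages = {location: mean(vals) for location, vals in values.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Made-up sample rows, for illustration only.
sample_reports = [
    {"location": 1, "medical": 8, "power": 2},
    {"location": 1, "medical": 6, "power": 4},
    {"location": 2, "medical": 9, "power": 1},
]

print(sort_by_parameter(sample_reports, "medical"))
print(sort_by_parameter(sample_reports, "power"))
```

Changing the `sort_by` value reorders the neighbourhoods without touching the underlying data, which is what the parameter control does in the worksheet.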

Question Answering

Online interactive visualization: https://public.tableau.com/profile/wang.xuze#!/vizhome/IS428AY2019-20T1AssignWangXuze/Home?publish=yes

Task 1

Emergency responders will base their initial response on the earthquake shake map. Use visual analytics to determine how their response should change based on damage reports from citizens on the ground. How would you prioritize neighborhoods for response? Which parts of the city are hardest hit?

Given the damage reports by citizens, the emergency responders could change their response accordingly. In my view, they should prioritize neighbourhoods where:

  1. The average damage level reported is high
  2. The number of reports is large
  3. Important facilities such as medical, roads and bridges, and buildings are damaged
  4. The reports of high-level damage are recent

The rationale is that high-level damage is more severe than lower-level damage and requires an immediate response. A large number of reports generally gives a more reliable picture of the situation on site; thus, the neighborhood should be attended to quickly. Damage to certain facilities requires more urgent attention, such as medical facilities, where patients could suffer further harm, and roads and bridges, where transportation for rescue is blocked. Last but not least, responders should always monitor the most recent reports and attend to those neighborhoods in time.

Therefore, I created visualizations to allow emergency responders to get the information through the following ways:

Serial No. Observation
1 This visualization allows emergency responders to view the most damaged neighbourhoods during any hour of any date. The damage level is the average of the reported levels during that hour.
So that emergency responders can quickly assess which neighbourhoods have a generally high damage level across all facilities, an overall damage field is included, computed by summing the average damage levels of all categories.
A sorting feature lets them sort the neighbourhoods by the damage level of a particular facility if they consider it more important to attend to that facility first. (addresses 1, 3 and 4)

For example, they could sort by overall damage during the 10th hour of 8th April:

Sort by overall

They could also sort by a facility that they want to prioritize, such as medical:

alt text
2 This visualization allows emergency responders to view the damaged neighbourhoods during any hour of any date, with colour intensity representing the average damage level reported by citizens. A "show damage for" filter lets us choose which facility's damage to view. In this case, in Northwest during hour 17 on 6 April, 48 reports were made and the average medical damage is 8.5. Thus the responders might want to attend to this neighbourhood first. (addresses 1, 2, 3 and 4)
1-2.png
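
The hourly averages and the overall damage field behind these two views could be reproduced roughly as follows. This is a sketch under stated assumptions rather than the workbook's exact calculation: here the overall field sums the averages of the five facility categories only (whether shake intensity is also included is an assumption), and the function name and sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Assumption: "overall" sums the five facility categories, excluding shake_intensity.
FACILITIES = ("sewer_and_water", "power", "roads_and_bridges", "medical", "buildings")


def hourly_damage(reports):
    """Map (location, hour) to the average damage per facility plus an overall sum."""
    buckets = defaultdict(lambda: defaultdict(list))
    for report in reports:
        timestamp = datetime.strptime(report["time"], "%Y-%m-%d %H:%M:%S")
        hour = timestamp.replace(minute=0, second=0)
        for facility in FACILITIES:
            value = report.get(facility)
            if value is not None:  # skip missing values instead of counting them as 0
                buckets[(report["location"], hour)][facility].append(value)

    result = {}
    for key, by_facility in buckets.items():
        averages = {facility: mean(vals) for facility, vals in by_facility.items()}
        averages["overall"] = sum(averages.values())
        result[key] = averages
    return result


# Made-up sample rows, for illustration only.
sample_reports = [
    {"time": "2020-04-08 10:05:00", "location": 1, "medical": 8, "power": 2},
    {"time": "2020-04-08 10:40:00", "location": 1, "medical": 6, "power": None},
    {"time": "2020-04-08 11:15:00", "location": 1, "medical": 4, "power": 4},
]

for (location, hour), averages in sorted(hourly_damage(sample_reports).items()):
    print(location, hour, averages)
```

Skipping missing values matters: treating an unanswered category as 0 would pull the hourly average down and make a badly hit neighbourhood look safer than it is.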

Task 2

Use visual analytics to show uncertainty in the data. Compare the reliability of neighborhood reports. Which neighborhoods are providing reliable reports? Provide a rationale for your response.

The visualizations prepared for question 1 mostly use average values, which is acceptable for giving emergency responders an immediate first impression. However, when we display aggregated data such as a sum or an average, we no longer have any visibility into the variance of the underlying data. This matters especially because our visualizations are based on crowdsourced data whose reliability and quality may vary, since the reported damage levels depend entirely on citizens' subjective opinions. The emergency responders need to be fully informed of such uncertainties to assess the reliability of neighbourhood reports. Therefore, I would like to address the uncertainties in the data.

These visualizations are provided to understand the uncertainty and reliability of neighbourhood reports:

Serial No. Observation
1 This is an overall heatmap showing the number of reports made by citizens hourly every day for each of the neighbourhoods. From this heatmap we can identify the frequency and number of reports made.

What’s more, background knowledge tells us that power outages occur in neighbourhoods like Old Town and Southwest due to the Power Department’s work, which delays the receipt of reports. This additional information explains certain abnormalities on the heatmap. For example, 2200 and 1713 reports were made during the 8th and 9th hours of 8th April, no reports were made for the following 15 hours, and the number of reports surged suddenly during the 1st hour of 9th April. A power outage is the most likely cause. This neighbourhood should be attended to because of the significant number of reports made before the power outage happened. However, other patterns, such as the prolonged period without reports in Scenic Vista, require more investigation.

Number of reports heatmap
2 This visualization displays the distribution of the damage levels reported by citizens for different facilities during a certain hour in each neighbourhood. Emergency responders could use this to assess how much variation there is among the reports.
For example, in Broadview during the 14th hour on 6th April, the medical damage reports vary a lot whereas the roads and bridges damage reports vary little.
Hourly report distribution boxplot
3 To assess which neighbourhoods are providing reliable reports, I consider neighbourhoods with a higher number of reports and less variation in the data to be more reliable.

Based on these two criteria, this visualization provides the standard deviation of the reported damage for a certain facility (buildings in this graph). Together with the number of reports during the hour, the emergency responders could decide whether the data is reliable.
For example, in Broadview, the reports during hour 1, with a standard deviation of 3.869, are less reliable than those during the 9th hour, with a standard deviation of 2.429.

Standard deviation of reported damages



4 If we look at the entire view with all dates, we can see that some neighbourhoods, like Broadview and Weston, mostly have reports with a small standard deviation (lighter colour).
Others, like Pepper Mill and Safe Town, have more dark areas, indicating less reliable reports with a large standard deviation.
Overall standard deviation of reported damages
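
The two reliability criteria used above, the number of reports and the spread of the reported values, can be computed per neighbourhood and hour as in the sketch below. It assumes the sample standard deviation, which may or may not match the aggregation chosen in the workbook; the function name and sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import stdev


def reliability_table(reports, measure):
    """Map (location, hour) to the report count and standard deviation of one measure."""
    buckets = defaultdict(list)
    for report in reports:
        value = report.get(measure)
        if value is None:  # missing values carry no information about spread
            continue
        timestamp = datetime.strptime(report["time"], "%Y-%m-%d %H:%M:%S")
        hour = timestamp.replace(minute=0, second=0)
        buckets[(report["location"], hour)].append(value)

    table = {}
    for key, values in buckets.items():
        table[key] = {
            "count": len(values),
            # A sample standard deviation needs at least two reports.
            "stdev": stdev(values) if len(values) > 1 else None,
        }
    return table


# Made-up sample rows, for illustration only.
sample_reports = [
    {"time": "2020-04-06 01:05:00", "location": 9, "buildings": 2},
    {"time": "2020-04-06 01:20:00", "location": 9, "buildings": 4},
    {"time": "2020-04-06 01:45:00", "location": 9, "buildings": 6},
    {"time": "2020-04-06 09:10:00", "location": 9, "buildings": 5},
]

for key, stats in sorted(reliability_table(sample_reports, "buildings").items()):
    print(key, stats)
```

Reporting the count next to the standard deviation guards against reading too much into an hour with only a handful of reports, where the spread is unstable or undefined.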


Task 3

How do conditions change over time? How does uncertainty change over time? Describe the key changes you see. Limit your response to 500 words and 8 images.
With a time-series data set, we can visualize the changes and look for insights into our data. The changes in our case could be analysed based on:

  1. Change in number of reports
  2. Change in reported damage levels
Serial No. Observation
1 This visualization presents a line graph showing how the hourly number of reports varies for each neighbourhood. We can filter by neighbourhood and date to display only those we want to look at more closely.

This graph shows how the number of reports changes for all neighbourhoods over the period. We can see a sharp increase in citizen reports on 8 April when the major quake started, from hour 7 to hour 10, after which the number of reports starts to decrease. There are occasional unusual spikes, which are explained by power outages.

reports count change

Using the filter to zoom into one particular neighbourhood provides more detailed information. This graph shows the change in Downtown. We can see that citizens started to report the foreshocks in hour 14 on 6th April, the major quake in hour 7 on 8th April, and the aftershocks in hour 13 on 9th April.

Downtown
2 This visualization presents a line graph showing how the hourly average damage reported varies in a neighbourhood.

For example, when we look at Easton, we can see that the damage reported for buildings and power has been high since hour 0, whereas the damage reported for roads and bridges and for sewer and water started to increase an hour later. This is probably because the former is felt directly, through the shaking of buildings and the loss of power, whereas the latter only becomes apparent after a while.

Reported damage change
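
The hourly report counts behind the line graphs can be sketched as below. Hours without any report are filled with 0 so that silent periods, such as those attributed to power outages, remain visible in the series. The function name and sample rows are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def hourly_report_counts(reports, location):
    """Return (hour, count) pairs for one neighbourhood, with empty hours filled as 0."""
    counts = Counter()
    for report in reports:
        if report["location"] != location:
            continue
        timestamp = datetime.strptime(report["time"], "%Y-%m-%d %H:%M:%S")
        counts[timestamp.replace(minute=0, second=0)] += 1
    if not counts:
        return []

    series = []
    hour, last_hour = min(counts), max(counts)
    while hour <= last_hour:
        series.append((hour, counts.get(hour, 0)))
        hour += timedelta(hours=1)
    return series


# Made-up sample rows, for illustration only.
sample_reports = [
    {"time": "2020-04-08 07:10:00", "location": 3},
    {"time": "2020-04-08 07:50:00", "location": 3},
    {"time": "2020-04-08 09:05:00", "location": 3},
    {"time": "2020-04-08 07:30:00", "location": 4},
]

for hour, count in hourly_report_counts(sample_reports, 3):
    print(hour, count)
```

Without the zero fill, a line graph would simply connect the last hour before a gap to the first hour after it, hiding exactly the silence that deserves investigation.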

Future Improvement

Given more time, I would improve the visualizations by including more statistical methods and reasoning to demonstrate the data uncertainty and reliability. I would also work on improving the interface for the emergency responders to give them a much easier and clearer view. Nonetheless, through this assignment, I have learned a lot about interpreting data and about visualization techniques, and I have improved my analytical ability.

Visualisation Software

To perform the visual analysis, this is the software I used.

References

  • Dynamic Sorting with Tableau

https://www.clearlyandsimply.com/clearly_and_simply/2011/11/dynamic-sorting-with-tableau.html

  • Using Tableau to Show Variance and Uncertainty

https://www.rittmanmead.com/blog/2017/06/using-tableau-to-show-variance-and-uncertainty/

  • How to Creat Heat Map in Tableau

https://www.youtube.com/watch?v=Tc8VenUN4n8

  • Analyzing Time Series

https://www.youtube.com/watch?v=aaaILjNPHSs

Comments

I appreciate all suggestions and discussions! Please provide feedback thank you! :)