Difference between revisions of "IS428 AY2019-20T1 Assign Nurul Khairina Binte Abdul Kadir Data"

From Visual Analytics for Business Intelligence
Jump to navigation Jump to search
 
(96 intermediate revisions by the same user not shown)
Line 34: Line 34:
 
<!--MAIN HEADER -->
 
<!--MAIN HEADER -->
 
__FORCETOC__
 
__FORCETOC__
== Data Description ==
 
 
The first step in the transformation process begins with understanding and interpreting the data to determine which data type we currently have and what we need to transform it into.
 
 
The data zip file consists of 1 CSV file (mc1-reports-data.csv) spanning the entire length of the event from 6 April 2020, 12 AM to 11 April 2020, 12 AM and 2 shakemap images. The CSV file contains the individual reports (categorical) of shaking/damage by neighborhood over time.  Reports are made by citizens at any time through the Rumble mobile app. However, they are only recorded in 5-minute batches/increments due to the server configuration.  Furthermore, delays in the receipt of reports may occur during power outages.
 
 
'''Data Attributes of CSV File'''
 
 
{| class="wikitable" style="text-align: center  style="width: 100%";
 
|-
 
! Data Attributes !! Description
 
|-
 
| Time || Timestamp of incoming report/record, in the format YYYY-MM-DD hh:mm:ss
 
|-
 
| Location || ID of neighborhood where person reporting is feeling the shaking and/or seeing the damage
 
|-
 
| Shake Intensity, Sewer and Water, Power, Roads and Bridges, Medical, Buildings || Reported categorical value of how violent the shaking was/how bad the damage was (0 - lowest, 10 - highest; missing data allowed)
 
|}
 
 
'''Shakemap Images, Shape File and Map Images'''
 
 
Also included are two shakemap (PNG) files which indicate where the corresponding earthquakes' epicenters originate as well as how much shaking can be felt across the city. The StHimark shape file and map images which are obtained from the dataset of Mini Challenge 2 is used to create the map of the city.
 
  
 
== Dataset Analysis & Transformation Process ==
 
== Dataset Analysis & Transformation Process ==
Line 61: Line 39:
 
The following section illustrates the issues faced in the data analysis phase leading to a need to transform the data into a specified format. Tableau Prep will be used to clean and prepare the data for analysis. The CSV file is converted to Excel in order to import it into Tableau Prep.
 
The following section illustrates the issues faced in the data analysis phase leading to a need to transform the data into a specified format. Tableau Prep will be used to clean and prepare the data for analysis. The CSV file is converted to Excel in order to import it into Tableau Prep.
  
=== Data Transformation for MC1-Report-Data ===
+
=== Data Transformation of MC1-Report-Data ===
  
 
There are 3 main issues with the given dataset and data transformation is needed to reshape the data for easier analysis.  
 
There are 3 main issues with the given dataset and data transformation is needed to reshape the data for easier analysis.  
 +
 +
1. Inability to filter by damage type <br>
 +
2. Neighbourhood coordinates <br>
 +
3. Relationship between shake intensity and other damage types
  
 
==== Pivot the Damage Types ====
 
==== Pivot the Damage Types ====
Line 88: Line 70:
  
 
==== Neighbourhood Coordinates ====
 
==== Neighbourhood Coordinates ====
 +
 +
This is a continuation of the previous step.
  
 
{| class="wikitable"
 
{| class="wikitable"
 
|-
 
|-
| Issue || A map will provide the emergency responders with a better idea of the extent of the damage in the various neighbourhoods in real-time. There are different ways to add a map in Tableau. The easiest way is to import the StHimark.shp file into Tableau. However, in some cases, it is clearer to map the points onto a background image instead of a Tableau map.  <br>
+
| Issue || The data zip file from Mini-Challenge 2 includes a blank PNG image of St. Himark. However, there are no coordinate points (latitude and longitude) to identify each neighbourhood in the dataset and the background image will not load since the X and Y coordinates are missing.  A map will provide the emergency responders with a better idea of the extent of the damage in the various neighbourhoods in real-time.
 
 
The data zip file from Mini-Challenge 2 includes a blank PNG image of St. Himark. However, there are no coordinate points (latitude and longitude) to identify each neighbourhood in the dataset and the background image will not load since the X and Y coordinates are missing.  
 
  
 
|-
 
|-
| Solution || One of the ways to get the coordinates of each neighbourhood is by using the shapefile. There are tools available online to get the coordinates and this is the manual way. Once the shapefile is imported into Tableau and a map is created in a worksheet, the latitude and longitude will be generated automatically. Display the coordinates of the map and use this as a reference. In Tableau Prep, create 2 calculated fields for Neighbourhood Latitude and Neighbood Longitude for each neighbourhood.
+
| Solution || One of the ways to get the coordinates of each neighbourhood is by using the shapefile. Once the shapefile is imported into Tableau and a map is created in a worksheet, the latitude and longitude will be generated automatically. Display the coordinates of the map and use this as a reference. In Tableau Prep, create 2 calculated fields for Neighbourhood Latitude and Neighbood Longitude for each neighbourhood.
  
 
|-
 
|-
Line 112: Line 94:
  
 
[[File:Map Coordinates Using Shape File.jpg|center|400px]]
 
[[File:Map Coordinates Using Shape File.jpg|center|400px]]
 +
<center>Map Coordinates</center>
 
<br>
 
<br>
 
'''PART B: Add Columns for Neighbourhood Latitude and Longitude Into The Dataset '''
 
'''PART B: Add Columns for Neighbourhood Latitude and Longitude Into The Dataset '''
  
1. Add a Clean step in the flow <br>
+
1. Use the same flow from the previous step in 1.1.1. Add a Clean step in the flow <br>
 
2. Create a calculated field called Neighbourhood Latitude. Refer to the coordinates displayed on the map above. <br>
 
2. Create a calculated field called Neighbourhood Latitude. Refer to the coordinates displayed on the map above. <br>
 
3. Create a calculated field called Neighbood Longitude. Refer to the coordinates displayed on the map above. <br>
 
3. Create a calculated field called Neighbood Longitude. Refer to the coordinates displayed on the map above. <br>
Line 122: Line 105:
  
 
[[File:Calculated Field for Coordinates.jpg|500px|center]]
 
[[File:Calculated Field for Coordinates.jpg|500px|center]]
<br>
+
<center>Calculated Fields in Tableau Prep</center> <br>
 +
 
 +
|-
 +
| Screenshot ||
 +
[[File:Addition of 2 new columns.jpg|500px|center]]
 +
<center>Addition of Coordinates into Dataset - Continuation from 1.1.1 </center> <br>
 +
|}
 +
 
 +
==== Addition of Columns In Transformed Dataset ====
 +
 
 +
The following columns are added into the transformed dataset in 1.1.2 using Tableau Prep. This can be done at the Clean Step in the flow. With reference from the background information document, calculated fields are created for the following fields:
 +
 
 +
{| class="wikitable"
 +
|-
 +
! Field !! Description
 +
|-
 +
| Amenities || There are 4 possibilities. A neighborhood can have a school, hospital, hospital, and school or no amenities.
 +
|-
 +
| 1st Hospital Name || The name of the hospital in the neighbourhood / nearest hospital
 +
|-
 +
| Specialization || The specialization of each hospital
 +
|}
 +
 
 +
This is a sample of the final dataset known as '''MC1-Main''' which will be loaded into Tableau. <br>
 +
 
 +
[[File:MC1-Main.jpg|820px|center]]
 +
<CENTER>MC1-Main Dataset To Be Imported Into Tableau</CENTER>
 +
 
 +
==== Relationship Between Shake Intensity and Other Damage Types ====
 +
 
 +
{| class="wikitable"
 +
|-
 +
| Issue || It is difficult to see the relationship between shake intensity and other types in the given dataset and the transformed dataset above. It is important to analyse this relationship since anomalies in the data can be detected. For example, if a city is greatly affected by the earthquake based on the shakemap and have a high shake intensity impact score, it will be unusual if the buildings/medical/power etc. is unsually low. Further investigation will be needed. <br>
 +
|-
 +
| Solution || Using the given dataset, pivot the following columns: Buildings, Medical, Power, Roads and Bridges and Sewer and Water. Shake intensity should not be included here. Export the data. A new data source have to be created in Tableau.
 +
 
 +
|-
 +
| Steps Taken ||
 +
 
 +
1. Use the same flow file from the previous step. Add a Pivot step. Drag the columns mentioned above into Pivot values.<br>
 +
2. Rename the Pivot names and values. The new pivoted column name will be called 'Damage Type (Pivoted)' and the pivoted values will be called 'Impact Score (Pivoted)'. <br>
 +
3. Add Cleaning and Output steps. <br>
 +
 
 +
This dataset will be known as '''MC1-Relationship''' and will be loaded into Tableau.
 +
 
 +
The final flow in Tableau Prep will be as follows: <br>
 +
 
 +
[[File:Flow with Description.jpg|center|700px]]
 +
<center>Final Prep Flow</center> <br>
  
 
|-  
 
|-  
 
| Screenshot ||  
 
| Screenshot ||  
[[File:Pivot_Transformation.jpg|center|820px]]
+
[[File:Relationship between Shake and Other Damage Types.jpg|820px|center]]
<center>Before and After</center>
+
<center>Before and After: MC1-Relationship Dataset To Be Imported Into Tableau </center> <br>
 
|}
 
|}
 +
 +
=== New Spreadsheet on Maintainance Information ===
 +
 +
This will be the 3rd and final dataset that will be loaded into Tableau. It is known as the '''MC1-Maintainance Description''' dataset.
 +
 +
For some neighbourhoods, there are some maintainance projects going on that will result in traffic delays/power outages etc. Therefore, this information have to be considered in our analysis and it can be used in evaluating the reliability of the reports.
 +
 +
The Maintainance Description dataset is created in Excel and the values were added manually based on the information provided in the About Our City document.
 +
 +
The dataset is as follows:
 +
 +
[[File:Maintainance Description.jpg|750px|center]]
 +
<CENTER> MC1-Maintainance Description To Be Imported In Tableau </CENTER>
  
 
== Dataset Import Structure & Process ==
 
== Dataset Import Structure & Process ==
 +
 +
With the dataset analysis and transformation phase completed, the following files will have to be imported into Tableau for analysis. There will be 2 data sources.
 +
 +
=== Overview of Data Sources ===
 +
'''Data Source 1
 +
'''
 +
<br>
 +
Add a connection in Tableau for the following files:
 +
 +
1. MC1-Main.csv - Damage types are pivoted. Neighbourhood coordinates and additional information (amenities, hospital info) are added. Formatted from Data Transformation, as mentioned in 1.1.1, 1.1.2 and 1.1.3 <br>
 +
2. MC1-Maintainance-Description.csv - Spreadsheet is created manually in Excel <br>
 +
3. StHimark.Shp - Shapefile is obtained from MC2 Data file <br>
 +
 +
'''Data Source 2
 +
'''
 +
 +
4. MC1-Relationship.csv - Relationship between shake intensity and  other damage types
 +
Formatted from Data Transformation, as described in 1.1.4
 +
 +
'''Diagram
 +
'''
 +
 +
[[File:Data Sources for Tableau.jpg|center|1000px]]
 +
<CENTER> Data Sources To Be Used In Tableau </CENTER>
 +
 +
=== Data Connection ===
 +
==== Data Source 1 ====
 +
After the files have been added as a connection in Tableau, do the following steps: <br>
 +
 +
1. Import MC1-Main.csv as a data source.<br>
 +
2. Drag MC1-Maintainance-Description.csv from the left panel to the top right panel. <br>
 +
3. Perform a left join between MC1-Main.csv and MC1-Maintainance-Description.csv to link the maintainance descriptions to the respective neighbourhoods. The join conditions will be the Damage Type and Location. <br>
 +
 +
[[File:Data Source 1- Main and Maintainance Description.jpg|center|650px]]
 +
<CENTER> Left Join Between Main and Maintainance Dataset </CENTER>
 +
 +
Next, the shapefile will be added into the data source.
 +
 +
5. Drag StHimark.Shp from the left panel to the top right panel.  <br>
 +
6. Perform a inner join between MC1-Main.csv and StHimark.Shp in order to create the map. The join conditions will be the MC1-Main.csv Location field and the StHimark.Shp Id field. <br>
 +
 +
The final data source will be as follows:
 +
 +
[[File:Data Source 1.jpg|center|700px]]
 +
<CENTER> Inner Join Between Main and Shapefile </CENTER>
 +
 +
==== Data Source 2 ====
 +
 +
As for Data Source 2, it is very straightforward. Drag the MC1-Relationship.csv file from the left panel to the top right panel.
 +
 +
[[File:Data Source 2.jpg|center]]
 +
<CENTER> Data Source 2 </CENTER>
 +
 +
The data transformation process is now complete! Before we go into the data analysis, we will do information gathering to have a better idea of what to expect.

Latest revision as of 17:10, 12 October 2019


VAST 2019 MC1: Crowdsourcing for Situational Awareness

Introduction

Data Analysis and Transformation

Interactive Visualization

Task Findings

References

 



Dataset Analysis & Transformation Process

The following section illustrates the issues faced in the data analysis phase leading to a need to transform the data into a specified format. Tableau Prep will be used to clean and prepare the data for analysis. The CSV file is converted to Excel in order to import it into Tableau Prep.

Data Transformation of MC1-Report-Data

There are 3 main issues with the given dataset and data transformation is needed to reshape the data for easier analysis.

1. Inability to filter by damage type
2. Neighbourhood coordinates
3. Relationship between shake intensity and other damage types

Pivot the Damage Types

Issue Inability to filter by damage type
Solution Pivot the following columns - Shake Intensity, Sewer and Water, Power, Roads and Bridges, Medical and Buildings. Pivoting the data will create more rows for each time and location. This will provide users with the ability to filter the data based on the damage type which is needed for further analysis.
Steps Taken

1. Connect to the data source
2. Add a Clean step to review data
3. Add the Pivot step and drag the columns as mentioned above into Pivot values.
4. Rename the Pivot names and values. The new pivoted column name/category will be called 'Damage Type' and the pivoted values will be called 'Impact Score'. The flow will be as follows:

Pivot Damage Type.jpg
Tableau Prep Flow
Screenshot
Pivot Transformation.jpg
Before and After

Neighbourhood Coordinates

This is a continuation of the previous step.

Issue The data zip file from Mini-Challenge 2 includes a blank PNG image of St. Himark. However, there are no coordinate points (latitude and longitude) to identify each neighbourhood in the dataset and the background image will not load since the X and Y coordinates are missing. A map will provide the emergency responders with a better idea of the extent of the damage in the various neighbourhoods in real-time.
Solution One of the ways to get the coordinates of each neighbourhood is by using the shapefile. Once the shapefile is imported into Tableau and a map is created in a worksheet, the latitude and longitude will be generated automatically. Display the coordinates of the map and use this as a reference. In Tableau Prep, create 2 calculated fields for Neighbourhood Latitude and Neighbood Longitude for each neighbourhood.
Steps Taken

PART A: Working with Shapefiles

1. Go to Connections. Add a Spatial file. Select the StHimark.shp file.
2. Join the shapefile with the given dataset by doing an inner join. The join condition will be the dataset's Location field and the shapefile's Id field.
3. Drag the Geometry field into the Detail and Neighbourhood into Text of the Marks card.
4. Drag the Latitude (generated) and Longitude (generated) fields into Text.
5. Drag the Neighbourhood into Colours and filter by neighbourhood to have a clearer view since the coordinates may overlap.

The coordinates of each neighborhood will be displayed on the map as follows:

Map Coordinates Using Shape File.jpg
Map Coordinates


PART B: Add Columns for Neighbourhood Latitude and Longitude Into The Dataset

1. Use the same flow from the previous step in 1.1.1. Add a Clean step in the flow
2. Create a calculated field called Neighbourhood Latitude. Refer to the coordinates displayed on the map above.
3. Create a calculated field called Neighbood Longitude. Refer to the coordinates displayed on the map above.

The calculated field formula will be as follows:

Calculated Field for Coordinates.jpg
Calculated Fields in Tableau Prep

Screenshot
Addition of 2 new columns.jpg
Addition of Coordinates into Dataset - Continuation from 1.1.1

Addition of Columns In Transformed Dataset

The following columns are added into the transformed dataset in 1.1.2 using Tableau Prep. This can be done at the Clean Step in the flow. With reference from the background information document, calculated fields are created for the following fields:

Field Description
Amenities There are 4 possibilities. A neighborhood can have a school, hospital, hospital, and school or no amenities.
1st Hospital Name The name of the hospital in the neighbourhood / nearest hospital
Specialization The specialization of each hospital

This is a sample of the final dataset known as MC1-Main which will be loaded into Tableau.

MC1-Main.jpg
MC1-Main Dataset To Be Imported Into Tableau

Relationship Between Shake Intensity and Other Damage Types

Issue It is difficult to see the relationship between shake intensity and other types in the given dataset and the transformed dataset above. It is important to analyse this relationship since anomalies in the data can be detected. For example, if a city is greatly affected by the earthquake based on the shakemap and have a high shake intensity impact score, it will be unusual if the buildings/medical/power etc. is unsually low. Further investigation will be needed.
Solution Using the given dataset, pivot the following columns: Buildings, Medical, Power, Roads and Bridges and Sewer and Water. Shake intensity should not be included here. Export the data. A new data source have to be created in Tableau.
Steps Taken

1. Use the same flow file from the previous step. Add a Pivot step. Drag the columns mentioned above into Pivot values.
2. Rename the Pivot names and values. The new pivoted column name will be called 'Damage Type (Pivoted)' and the pivoted values will be called 'Impact Score (Pivoted)'.
3. Add Cleaning and Output steps.

This dataset will be known as MC1-Relationship and will be loaded into Tableau.

The final flow in Tableau Prep will be as follows:

Flow with Description.jpg
Final Prep Flow

Screenshot
Relationship between Shake and Other Damage Types.jpg
Before and After: MC1-Relationship Dataset To Be Imported Into Tableau

New Spreadsheet on Maintainance Information

This will be the 3rd and final dataset that will be loaded into Tableau. It is known as the MC1-Maintainance Description dataset.

For some neighbourhoods, there are some maintainance projects going on that will result in traffic delays/power outages etc. Therefore, this information have to be considered in our analysis and it can be used in evaluating the reliability of the reports.

The Maintainance Description dataset is created in Excel and the values were added manually based on the information provided in the About Our City document.

The dataset is as follows:

Maintainance Description.jpg
MC1-Maintainance Description To Be Imported In Tableau

Dataset Import Structure & Process

With the dataset analysis and transformation phase completed, the following files will have to be imported into Tableau for analysis. There will be 2 data sources.

Overview of Data Sources

Data Source 1
Add a connection in Tableau for the following files:

1. MC1-Main.csv - Damage types are pivoted. Neighbourhood coordinates and additional information (amenities, hospital info) are added. Formatted from Data Transformation, as mentioned in 1.1.1, 1.1.2 and 1.1.3
2. MC1-Maintainance-Description.csv - Spreadsheet is created manually in Excel
3. StHimark.Shp - Shapefile is obtained from MC2 Data file

Data Source 2

4. MC1-Relationship.csv - Relationship between shake intensity and other damage types Formatted from Data Transformation, as described in 1.1.4

Diagram

Data Sources for Tableau.jpg
Data Sources To Be Used In Tableau

Data Connection

Data Source 1

After the files have been added as a connection in Tableau, do the following steps:

1. Import MC1-Main.csv as a data source.
2. Drag MC1-Maintainance-Description.csv from the left panel to the top right panel.
3. Perform a left join between MC1-Main.csv and MC1-Maintainance-Description.csv to link the maintainance descriptions to the respective neighbourhoods. The join conditions will be the Damage Type and Location.

Data Source 1- Main and Maintainance Description.jpg
Left Join Between Main and Maintainance Dataset

Next, the shapefile will be added into the data source.

5. Drag StHimark.Shp from the left panel to the top right panel.
6. Perform a inner join between MC1-Main.csv and StHimark.Shp in order to create the map. The join conditions will be the MC1-Main.csv Location field and the StHimark.Shp Id field.

The final data source will be as follows:

Data Source 1.jpg
Inner Join Between Main and Shapefile

Data Source 2

As for Data Source 2, it is very straightforward. Drag the MC1-Relationship.csv file from the left panel to the top right panel.

Data Source 2.jpg
Data Source 2

The data transformation process is now complete! Before we go into the data analysis, we will do information gathering to have a better idea of what to expect.