IS428 2019-1920 T1 Assign LeeSunho Data Preparation

From Visual Analytics for Business Intelligence
Revision as of 23:16, 13 October 2019 by Sunho.lee.2017 (talk | contribs)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search
-
Data Exploration & Transformation Process

Problem & Motivation


Data Exploration & Transformation Process

Interactive Visualization



The dataset zip file

the dataset that was given were


  • AnswerSheet Folder
  • MC1-DataDescription ( Word document )
  • mc1-majorquake-shakemap ( PNG File )
  • mc1-prequake-shakemap ( PNG File)
  • mc1-reports-data ( Excel document )
  • VAST 2019 - St. Himark - About Our City ( Word document )



Data Cleaning

Data 1.png

mc1-reports-data includes

Details of the data that includes

1. 83700 datas

2. Time (d/m/yyyy h:mm)

  • Time: 6/4/2020 12:00:00 am to 11/4/2020 12:00:00 am

3. sewer_and_water (number)

  • sewer_and_water: 0 to 10 with blanks

4. power (number)

  • power: 0 to 10

5. roads_and_bridges (number)

  • roads_and_bridges: 0 to 10

6. medical (number)

  • medical: 0 to 10 with blanks

7. buildings (number)

  • buildings: 0 to 10 with blanks

8. shake_intensity (number)

  • shake_intensity: 0 to 9 with blanks

9. location (number)

  • 1 to 19


Issue

1.null data

Out of 83700, 59925 data are null data.

2.Intensity level is not specified into group.

Group.jpg

As the reported earthquake intensities are on a scale of 1-10, it is important for users to first understand what these numbers actually mean. Above is a table which succinctly summarizes the scale, explaining that a 1 is basically a non-existent earthquake while a 10 is a catastrophic one. Furthermore, an intensity greater than 6 signifies that the earthquake is strong enough to cause damage.


3.The data does not have longitude and latitudes hence there is issue with plotting map.

Solution

1. The reason why there are null data might be because the data is lively collected, it is possible the report is not properly recorded, or they might purposely not show it. In order to further analysis, i did not remove the null data.

2. Group the intensity level

Data 2 group.png

I group the intensity level based on the chart shown earlier. Hence, user can identify seriousness of the reported value.


3. In order to plot Map chart, I have downloaded mini2 case data’s “StHimark.shp” file using Tableau which will allow me to visualize map. In order to solve

Data1 4.png

I used tableau to combine two data sets. Both data set shared common data column, "location" and "Id"

Data .png

I was able to get name of the each region, longitude and latitude. This will allow me to visualize the chart in more accurate way based on the region.


Photo 2019-10-13 23-09-31.jpg

This is how the map will look.


4. In order to visualize different classes ( ex) power, sewer and water..etc), I combined all the classes into two column. Hence, this will allow me to filter and visualize easily.

Data 5.png

combining the data allow me to compare different classes at the same rate.

Pivoting for categories

Issue

The categories are represented in columns, which is difficult when performing the visualization in tableau.

Solution

Pivoting the categories - Medical, Power, Road And Bridges, Sewer And Water and shake intensity are pivoted to enable easier filtering in tableau.

Pivoting.png
Binning of different intensity level

For easier visualization, binning of intensity level is the suitable approach for users. We can easily be confused by the intensity level value as it is not clearly stated what each value represents

First, we add step to clean the data and click "create calculated field"

Create Calculated Field.png

Here is the code:

Intensity level category.PNG

This is the finalized dataset.

Photo 2019-10-13 23-09-05.jpg