ISSS608 2017-18 T1 Assign WANG RUI Investigation step1

From Visual Analytics and Applications
Revision as of 23:58, 15 October 2017 by Ruiwang.2016 (talk | contribs)
Header.png



5 Steps to the Truth:
Step #1: Understand the data.


First, let's look at the microblog message dataset:
“ID” identifies the person who posted the message. There are 73,928 distinct users, and these IDs can be used to label persons of interest in the following steps.
“Created_at” provides the date and time when the message was created. 21 days of records are given, covering April 30th to May 20th. Some dirty records with wrong timestamps were identified, as shown below:

Wrclean1.png

21 records (out of 1,023,077) are dirty data with wrong timestamps. Since they make up only 0.002% of the data and their content is not related to the disease, they are removed from the microblog data. After this cleaning process, 1,023,056 message records are left.
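The cleaning step above can be sketched in a few lines of Python. This is only an illustration, not the tool actually used for this page: the timestamp format, the year and the sample records are assumptions, and the window is derived as the 21 days ending on May 20th.

```python
from datetime import date, datetime, timedelta

# Assumptions: the year and the timestamp format are not stated in the text.
WINDOW_END = date(2011, 5, 20)
WINDOW_START = WINDOW_END - timedelta(days=20)  # 21 days, inclusive
TIMESTAMP_FORMAT = "%m/%d/%Y %H:%M"


def has_valid_timestamp(raw):
    """Return True if raw parses and falls inside the 21-day window."""
    try:
        created = datetime.strptime(raw.strip(), TIMESTAMP_FORMAT)
    except ValueError:
        return False
    return WINDOW_START <= created.date() <= WINDOW_END


def split_clean_dirty(records):
    """Partition (user_id, created_at, text) tuples into clean and dirty lists."""
    clean, dirty = [], []
    for record in records:
        (clean if has_valid_timestamp(record[1]) else dirty).append(record)
    return clean, dirty


# Hypothetical sample records, for illustration only.
sample = [
    (1, "5/18/2011 13:26", "feeling fine"),
    (2, "5/18/2011 25:26", "bad hour"),   # hour out of range
    (3, "6/02/2011 09:00", "too late"),   # outside the window
]
clean, dirty = split_clean_dirty(sample)
print(len(clean), len(dirty))

# Share of dirty records reported above: 21 out of 1,023,077.
dirty_share = 21 / 1_023_077 * 100
print(f"{dirty_share:.3f}%")
```

Keeping the dirty records in a separate list, instead of dropping them silently, makes it easy to read them and confirm that they do not talk about the disease before discarding them.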


“Location” gives the latitude and longitude of the place where each message was posted. To map this location information onto the dashboard, the combined latitude and longitude figures are split into two columns. (Longitude is converted to negative values because it is West longitude.) The process is shown below:

Wrclean2.png
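The same split can also be written as a small Python function. This is a minimal sketch, assuming the combined field is a string with latitude first and longitude second, separated by whitespace; the sample coordinates are made up for illustration.

```python
def split_location(location):
    """Split a combined 'latitude longitude' string into two floats."""
    parts = location.split()
    if len(parts) != 2:
        raise ValueError(f"expected two numbers, got {location!r}")
    latitude = float(parts[0])
    # West longitude is stored as a positive number, so negate it for mapping.
    longitude = -abs(float(parts[1]))
    return latitude, longitude


# Hypothetical sample value, for illustration only.
lat, lon = split_location("42.22717 93.33772")
print(lat, lon)
```

Using `-abs(...)` keeps the function safe to run twice on the same data: a longitude that is already negative stays negative.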


Second, let's look at the weather dataset:
“Wind Speed” and “Wind Direction” are very useful as supporting evidence of whether the disease is airborne. To make the analysis more intuitive, shaped icons are used to show the wind on each day.

Wrwind2.png


“Weather”: 8 days out of 21 are sunny with a clear sky.

Wrweather2.png


Now, let's look at the population information:
“Population Density” and “Daytime Population”

Wrpop.png

Westside, Downtown and Uptown are the 3 zones with a daytime population much higher than their population density. These areas are marked as working zones. Suburbia, Eastside, Lakeside and Southville, in the eastern part of Smartpolis, hold 50% of the total population. These zones are marked as residential zones. There are 30% more residents living on the east side of the river.


Last, there is some additional information:
“Water Supply”: residents and businesses get their drinking water by pumping it from nearby reservoirs or rivers. The Vast river flows south at a steady rate of three miles per hour. This is very important for testing whether the disease is waterborne.
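The steady flow rate gives a simple way to reason about waterborne spread: anything entering the river moves south by three miles every hour. A minimal sketch of this arithmetic is shown below; the function names are hypothetical and the example distances are not taken from the dataset.

```python
RIVER_SPEED_MPH = 3.0  # flow rate of the Vast river stated above


def miles_downstream(hours):
    """Distance in miles that river water travels south in the given time."""
    if hours < 0:
        raise ValueError("hours must be non-negative")
    return RIVER_SPEED_MPH * hours


def hours_to_reach(distance_miles):
    """Time in hours for river water to travel the given distance south."""
    if distance_miles < 0:
        raise ValueError("distance must be non-negative")
    return distance_miles / RIVER_SPEED_MPH


print(miles_downstream(24))  # distance covered in one day
print(hours_to_reach(6))     # time to reach a point 6 miles downstream
```

If the disease is waterborne, cases along the river should appear further south as time passes, at roughly this pace.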