ISSS608 2017-18 T1 Assign WANG RUI Investigation step1
First, let's look at the microblog message dataset:
“ID” is the identifier of the person who posted the message. 73,928 distinct users are identified, and these IDs can be used to label persons of interest in the following steps.
“Created_at” provides the date and time when each message was created. 21 days of records are given, covering April 30th to May 20th.
Some dirty records with wrong timestamps were identified, as shown below:
21 records (out of 1,023,077) have wrong timestamps. Since they make up only 0.002% of the data and their content is not related to the disease, they are removed from the microblog data. After this cleaning step, 1,023,056 message records are left.
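For readers who want to reproduce this cleaning step in code, the following is a minimal sketch and not the procedure used in this assignment. It assumes the records are available as dictionaries with a "Created_at" string. The timestamp format, the year and the sample rows are all hypothetical and are only there to make the example runnable.

```python
from datetime import datetime

# Assumed timestamp layout (hypothetical); adjust it to the real Created_at column.
TS_FORMAT = "%m/%d/%Y %H:%M"


def split_by_timestamp(rows, start, end, ts_format=TS_FORMAT):
    """Separate rows whose Created_at parses and falls inside [start, end]
    from rows that fail to parse or fall outside that window."""
    clean, dirty = [], []
    for row in rows:
        try:
            ts = datetime.strptime(row["Created_at"], ts_format)
        except (ValueError, TypeError, KeyError):
            # Unparseable or missing timestamps count as dirty data.
            dirty.append(row)
            continue
        if start <= ts <= end:
            clean.append(row)
        else:
            dirty.append(row)
    return clean, dirty


# Made-up sample rows; the year 2011 is an assumption for illustration only.
sample_rows = [
    {"ID": "1", "Created_at": "5/18/2011 13:26"},
    {"ID": "2", "Created_at": "not a date"},
    {"ID": "3", "Created_at": "6/3/2011 09:00"},
]
window_start = datetime(2011, 4, 30)
window_end = datetime(2011, 5, 20, 23, 59, 59)

clean, dirty = split_by_timestamp(sample_rows, window_start, window_end)
print(len(clean), "clean,", len(dirty), "dirty")
```

Keeping the rejected rows in a separate list, rather than dropping them silently, makes it possible to read what they say before deciding that they are unrelated to the disease.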
“Location” gives the latitude and longitude of the place where each message was posted. To use this information and map it onto the dashboard, the combined latitude and longitude values are separated into two columns (longitude is converted to negative values because the area lies at West longitude). The process is shown below:
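The same split can also be sketched in code. The function below is an illustration only: it assumes that the combined field is a string holding latitude and longitude separated by whitespace, with the longitude stored as an unsigned West value. The function name and the sample value are hypothetical.

```python
def split_location(location):
    """Split a combined "latitude longitude" string into two numbers.

    Assumption: the two values are separated by whitespace and the
    longitude is an unsigned West longitude, so it is made negative.
    """
    lat_text, lon_text = location.split()
    latitude = float(lat_text)
    # West longitudes are negative in the usual signed-degree convention.
    longitude = -abs(float(lon_text))
    return latitude, longitude


# Made-up coordinate pair, used only to show the call.
lat, lon = split_location("42.25 93.5")
print(lat, lon)
```

Using `-abs(...)` instead of a plain sign flip keeps the result negative even if some values already carry a minus sign.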
Second, let's look at the weather dataset:
“Wind Speed” and “Wind Direction” are very useful as supporting evidence for whether the disease is transmitted through the air. To make the analysis more intuitive, shaped icons are used to visualize them.
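One possible way to derive such icons is sketched below. It is not the method used for the dashboard. It assumes that “Wind Direction” is stored as a compass label such as "W" or "NW" and that, following the usual meteorological convention, the label names the direction the wind comes from. The arrow therefore points the opposite way, toward where the air is moving. The mapping and the function name are hypothetical.

```python
# Assumed mapping: compass label of the wind's origin -> arrow showing
# the direction in which the air is moving.
WIND_ARROWS = {
    "N": "↓",
    "NE": "↙",
    "E": "←",
    "SE": "↖",
    "S": "↑",
    "SW": "↗",
    "W": "→",
    "NW": "↘",
}


def wind_icon(direction):
    """Return an arrow glyph for a compass label, or "?" if it is unknown."""
    return WIND_ARROWS.get(direction.strip().upper(), "?")


print(wind_icon("W"), wind_icon("nw"))
```

Pointing the arrow downwind makes it easier to see on a map which areas lie in the path of anything carried by the air.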
“Weather”
Now, let's look at the population information:
“Population”