Difference between revisions of "ISSS608 2017-18 T1 Assign ZHANG Lidan Data Preparation"

From Visual Analytics and Applications
 
(5 intermediate revisions by the same user not shown)
Line 1: Line 1:
 
<div style=background:#2B3820 border:#A3BFB1>
 
<div style=background:#2B3820 border:#A3BFB1>
 
[[Image:title_momo.png|150px]]  
 
[[Image:title_momo.png|150px]]  
<font size = 5; color="#FFFFFF">To be a Visual Detective</font>
+
<font size = 5; color="#FFFFFF">Epidemic Spread in Smartpolis</font>
 
</div>
 
</div>
 
<!--MAIN HEADER -->
 
<!--MAIN HEADER -->
Line 24: Line 24:
 
<br/>
 
<br/>
 
==Data Preparation==
 
==Data Preparation==
To handle the data more easily, I first import the microblog dataset into JMP. The dataset contains a lot of useful information. For example, I can use the location coordinates and the timestamp to identify where and when each message was posted. Then, by tokenizing and stemming the words in each message, I can filter for high-frequency words and flulike-related keywords for further data exploration.
+
* Import the microblog dataset into JMP 13<br/>
The microblogs dataset contains 1,023,077 rows.
+
* Exclude the 48 rows with missing text<br/>
First, I need to separate the location into longitude and latitude. Then, because these locations are in the Western Hemisphere, I should convert the longitude coordinates to negative values.
+
[[File:Missingdata.PNG|400px]]
Next, to exclude irrelevant information, I create a subset of the data that contains only messages mentioning the main flulike symptoms, such as chill, flu, fever, sweat, pain, fatigue, ache, cough, breath, nausea, vomit, and diarrhea. Here, I use the Text Explorer in JMP to generate the new symptom columns.
+
* Separate the location into longitude and latitude using the Word function<br/>
[[File:1.png|600px|center]]
+
[[File:Location.PNG|300px]]
Next, I create a bar chart to display the frequency of microblogs that include the symptom words. The chart shows a sharp increase in frequency from May 18 to May 20, 2011.
+
* Because these locations are in the Western Hemisphere, convert the longitude coordinates to negative values using the Num function<br/>
[[File:2.png|1000px|center]]
+
[[File:locationr.PNG|100px]]
To explore what happened from May 18 to May 20, I reload the microblog dataset into JMP. By examining the words in the text, I find that they relate not only to flulike symptoms but also to stomach problems. I then generate two datasets: one containing flulike symptoms such as breath, cough, fatigue, fever, flu, and pneumonia, and another containing stomach-related symptoms such as diarrhea, nausea, stomach, and vomit.
+
* In addition, to exclude irrelevant information, I create a subset of the data that contains only messages mentioning the main flulike symptoms, such as '''''chill, flu, fever, sweat, pain, fatigue, ache, cough, breath, nausea, vomit, diarrhea'''''. Here, I use the Text Explorer in JMP to generate the new symptom columns.
 +
[[File:wordget.PNG|500px]]<br/><br/>
 +
[[File:Symptoms.PNG|600px]]

Latest revision as of 15:25, 16 October 2017

Title momo.png Epidemic Spread in Smartpolis

Data Preparation

  • Import the microblog dataset into JMP 13
  • Exclude the 48 rows with missing text

Missingdata.PNG
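
For readers working outside JMP, the exclusion step above could be sketched in Python roughly as follows. This is only an illustration, not the JMP procedure used here: the row layout (a list of dictionaries) and the `text` field name are assumptions, and a row is treated as "missing text" when the field is absent, empty, or whitespace only.

```python
def drop_missing_text(rows, text_key="text"):
    """Return only the rows whose text field is present and non-blank.

    `rows` is assumed to be a list of dicts; `text_key` is a hypothetical
    column name, not one taken from the original dataset.
    """
    kept = []
    for row in rows:
        text = row.get(text_key)
        # Treat None, "" and whitespace-only strings as missing text.
        if text is not None and text.strip():
            kept.append(row)
    return kept


sample_rows = [
    {"id": 1, "text": "feeling a bit of a fever today"},
    {"id": 2, "text": ""},
    {"id": 3, "text": None},
    {"id": 4, "text": "   "},
]
cleaned = drop_missing_text(sample_rows)
```

Comparing `len(rows)` before and after the call gives the number of excluded rows, which can be checked against the count reported by JMP.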

  • Separate the location into longitude and latitude using the Word function

Location.PNG

  • Because these locations are in the Western Hemisphere, convert the longitude coordinates to negative values using the Num function

Locationr.PNG
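
The two location steps above (splitting the field, then negating the longitude) might look like this in Python. It is a minimal sketch under stated assumptions: the location is assumed to be a single string of the form "latitude longitude" separated by whitespace, with the longitude stored as a positive magnitude. The function name `split_location` is hypothetical and is not a JMP function.

```python
def split_location(location):
    """Split a 'latitude longitude' string and return (latitude, longitude).

    Assumes two whitespace-separated numeric fields, latitude first.
    The longitude is forced negative because the points are assumed to
    lie in the Western Hemisphere.
    """
    parts = location.split()
    if len(parts) != 2:
        raise ValueError(f"expected two fields, got {len(parts)}: {location!r}")
    latitude = float(parts[0])    # text to number, like Num in JMP
    longitude = -abs(float(parts[1]))
    return latitude, longitude


lat, lon = split_location("42.25 93.5")
```

Using `-abs(...)` rather than a plain sign flip keeps a longitude that is already negative from being turned positive by mistake.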

  • In addition, to exclude irrelevant information, I create a subset of the data that contains only messages mentioning the main flulike symptoms, such as chill, flu, fever, sweat, pain, fatigue, ache, cough, breath, nausea, vomit, diarrhea. Here, I use the Text Explorer in JMP to generate the new symptom columns.

Wordget.PNG

Symptoms.PNG
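
The symptom columns produced above with the Text Explorer can be approximated in Python as one boolean flag per keyword. This sketch is not the Text Explorer's algorithm: it uses a simple prefix match on lowercase tokens as a rough stand-in for stemming, so "coughing" counts as "cough", but "headache" does not count as "ache" and an unrelated word such as "fluent" would count as "flu". The function names are hypothetical; the keyword list is the one given in this section.

```python
import re

# Keyword list taken from the Data Preparation step above.
SYMPTOM_KEYWORDS = (
    "chill", "flu", "fever", "sweat", "pain", "fatigue",
    "ache", "cough", "breath", "nausea", "vomit", "diarrhea",
)


def symptom_flags(text):
    """Return {keyword: bool} showing which symptom keywords a message mentions.

    A keyword matches when any token starts with it (a crude substitute
    for stemming, with the limitations noted in the lead-in).
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return {
        keyword: any(token.startswith(keyword) for token in tokens)
        for keyword in SYMPTOM_KEYWORDS
    }


def mentions_symptom(text):
    """True if the message mentions at least one symptom keyword."""
    return any(symptom_flags(text).values())


flags = symptom_flags("Coughing all night, fever and chills")
```

Filtering the messages with `mentions_symptom` gives a symptom-only subset comparable to the one described above, and the per-keyword flags play the role of the new columns.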