Difference between revisions of "ISSS608 2017-18 T1 Assign ZHENG MIANYI"

From Visual Analytics and Applications
Line 1: Line 1:
{{Template:RaymHeader}}
+
[[File:Background.jpg|1150px|center]]
__NOTOC__
 
  
<div style="position:relative; top:-50px; margin:auto; width:900px; background-color:#e6ecf0; border: thin solid #A6A6A6;" >
 
<div style="width:900px; background-color:#c0c0c0; height: 35px; border-bottom: 1px solid #9e9e9e; border-top: 1px solid #9e9e9e;">
 
  <table style="width:900px;">
 
    <tr>
 
      <td width="40%" style="color:#ffffff;"></td>
 
      <td width="20%" style="color:#ffffff;text-align:center;font-size: 18px;">By Zheng Mianyi</td>
 
    </tr>
 
  </table>
 
</div>
 
<br \>
 
  
<div style="color:#0F1940; padding-left: 10px; font-size: 24px; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif;">Background</div>
+
==Background==
<hr style="margin-left:10px;margin-right:10px">
+
An epidemic broke out in Smartpolis, a major metropolitan area. Using the information provided, namely the city population, the disease symptoms, the map and weather of the city and, most importantly, the microblogs of the residents, I made every effort to trace the transmission of this disease.
<div style="padding-left: 10px; color:#0F1940; font-size: 16px; font-weight: bold; font-family: 'Courier New', Courier, monospace">
 
<p>An epidemic broke out in Smartpolis, a major metropolitan area. Using the information provided, namely the city population, the disease symptoms, the map and weather of the city and, most importantly, the microblogs of the residents, I made every effort to trace the transmission of this disease.</p>
 
</div>
 
  
 +
==Preparation==
 +
The initial dataset stores latitude and longitude together in a single field, and the main information is contained in more than 1 million microblog records. Hence, I split the geographical field into two columns, namely latitude and longitude.
  
<div style="color:#0F1940; padding-left: 10px; font-size: 24px; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif;">Data Preparation</div>
 
<hr style="margin-left:10px;margin-right:10px">
 
<div style="padding-left: 10px; color:#0F1940; font-size: 16px; font-weight: bold; font-family: 'Courier New', Courier, monospace">
 
<p>The initial dataset stores latitude and longitude together in a single field, and the main information is contained in more than 1 million microblog records. Hence, I split the geographical field into two columns, namely latitude and longitude.</p>
 
 
<br \>
 
<br \>
<p><b>Subsequently, I chose keywords to select the relevant records. Personally, I prefer a relatively small dataset with higher accuracy to a large dataset with lower accuracy. After many trials, I set the target words as: "fever", "chill", "fatigue", "cough", "difficult", "nausea", "vomit", "diarrhea", "lymph" and "throat".</b></p>
+
 
 +
Subsequently, I chose keywords to select the relevant records. Personally, I prefer a relatively small dataset with higher accuracy to a large dataset with lower accuracy. After many trials, I set the target words as: "fever", "chill", "fatigue", "cough", "difficult", "nausea", "vomit", "diarrhea", "lymph" and "throat".
 +
 
 
<br \>
 
<br \>
 +
 
Last but not least, I attempted to explore further. For instance, are there any initial symptoms before patients become ill? In addition, after reviewing the symptoms, we can initially group them into two main problems: flu (fever, chills, fatigue, coughing, breathing difficulty, sore throat and enlarged lymph nodes) and stomach problems (nausea, vomiting, diarrhea). I stored these two types in the "Type" column. As for symptoms, for patients who suffered two or more, I created additional rows so that each symptom is stored separately in the "Symptom" column (e.g. a record like "I got fever and my throat is on fire." is recorded twice, once with the "fever" tag and once with the "sore throat" tag).

Last but not least, I attempted to explore further. For instance, are there any initial symptoms before patients become ill? In addition, after reviewing the symptoms, we can initially group them into two main problems: flu (fever, chills, fatigue, coughing, breathing difficulty, sore throat and enlarged lymph nodes) and stomach problems (nausea, vomiting, diarrhea). I stored these two types in the "Type" column. As for symptoms, for patients who suffered two or more, I created additional rows so that each symptom is stored separately in the "Symptom" column (e.g. a record like "I got fever and my throat is on fire." is recorded twice, once with the "fever" tag and once with the "sore throat" tag).
</div>
+
[[File:Prepared Dataset.png|700px|center]]

Revision as of 14:22, 15 October 2017

Background.jpg


Background

An epidemic broke out in Smartpolis, a major metropolitan area. Using the information provided, namely the city population, the disease symptoms, the map and weather of the city and, most importantly, the microblogs of the residents, I made every effort to trace the transmission of this disease.

Preparation

The initial dataset stores latitude and longitude together in a single field, and the main information is contained in more than 1 million microblog records. Hence, I split the geographical field into two columns, namely latitude and longitude.
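
The split can be illustrated with a minimal Python sketch. It assumes the combined field holds the two numbers separated by whitespace; the actual delimiter in the dataset, the sample coordinates and the function name `split_location` are hypothetical and only serve the illustration.

```python
def split_location(location: str) -> tuple[float, float]:
    """Split a combined 'latitude longitude' string into two floats.

    Assumption: the two values are separated by whitespace. If the real
    field uses another delimiter (e.g. a comma), pass it to str.split.
    """
    lat_text, lon_text = location.split()
    return float(lat_text), float(lon_text)


# Made-up sample value, not taken from the actual dataset.
latitude, longitude = split_location("42.25 93.5")
print(latitude, longitude)
```

Keeping the two values as separate numeric columns is what allows each record to be plotted on the city map.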


Subsequently, I chose keywords to select the relevant records. Personally, I prefer a relatively small dataset with higher accuracy to a large dataset with lower accuracy. After many trials, I set the target words as: "fever", "chill", "fatigue", "cough", "difficult", "nausea", "vomit", "diarrhea", "lymph" and "throat".
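
A keyword filter of this kind could look like the sketch below. The target words are the ones listed above; the case-insensitive substring matching and the function name `is_relevant` are assumptions, since the tool actually used for the selection is not stated.

```python
# Target words as listed in the text.
KEYWORDS = (
    "fever", "chill", "fatigue", "cough", "difficult",
    "nausea", "vomit", "diarrhea", "lymph", "throat",
)


def is_relevant(message: str) -> bool:
    """Return True if the microblog mentions at least one target word.

    Assumption: matching is a case-insensitive substring test, so a stem
    such as "vomit" also matches "vomiting".
    """
    text = message.lower()
    return any(keyword in text for keyword in KEYWORDS)


# Made-up messages for illustration only.
messages = ["Can't stop coughing today", "Nice day at the park"]
relevant = [m for m in messages if is_relevant(m)]
print(relevant)
```

Substring matching keeps the word list short, but it can also pick up unrelated uses (for example, "chill" matches "chilling out"), which is one reason the word list needs several trials.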


Last but not least, I attempted to explore further. For instance, are there any initial symptoms before patients become ill? In addition, after reviewing the symptoms, we can initially group them into two main problems: flu (fever, chills, fatigue, coughing, breathing difficulty, sore throat and enlarged lymph nodes) and stomach problems (nausea, vomiting, diarrhea). I stored these two types in the "Type" column. As for symptoms, for patients who suffered two or more, I created additional rows so that each symptom is stored separately in the "Symptom" column (e.g. a record like "I got fever and my throat is on fire." is recorded twice, once with the "fever" tag and once with the "sore throat" tag).
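
The tagging step can be sketched as follows. The "Type" and "Symptom" columns and the flu/stomach grouping come from the description above; the keyword-to-symptom mapping, the "Message" column name and the function `expand_record` are hypothetical choices made for this illustration.

```python
# keyword -> (Symptom tag, Type), following the grouping described in the text.
SYMPTOM_RULES = {
    "fever": ("fever", "flu"),
    "chill": ("chills", "flu"),
    "fatigue": ("fatigue", "flu"),
    "cough": ("coughing", "flu"),
    "difficult": ("breathing difficulty", "flu"),
    "lymph": ("enlarged lymph nodes", "flu"),
    "nausea": ("nausea", "stomach"),
    "vomit": ("vomiting", "stomach"),
    "diarrhea": ("diarrhea", "stomach"),
    "throat": ("sore throat", "flu"),
}


def expand_record(message: str) -> list[dict[str, str]]:
    """Return one row per symptom mentioned in the message.

    A message mentioning two symptoms yields two rows, each carrying
    its own "Symptom" tag and the corresponding "Type".
    """
    text = message.lower()
    return [
        {"Message": message, "Symptom": symptom, "Type": problem_type}
        for keyword, (symptom, problem_type) in SYMPTOM_RULES.items()
        if keyword in text
    ]


rows = expand_record("I got fever and my throat is on fire.")
for row in rows:
    print(row["Symptom"], row["Type"])
```

Storing one row per symptom makes it straightforward to count or map each symptom separately, at the cost of the same microblog appearing more than once in the prepared dataset.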

Prepared Dataset.png