ISSS608 2017-18 T1 Assign ZHENG MIANYI

From Visual Analytics and Applications
 
<hr style="margin-left:10px;margin-right:10px">
 
<hr style="margin-left:10px;margin-right:10px">
 
<div style="padding-left: 10px; color:#0F1940; font-size: 16px; font-weight: bold; font-family: 'Courier New', Courier, monospace">
 
<div style="padding-left: 10px; color:#0F1940; font-size: 16px; font-weight: bold; font-family: 'Courier New', Courier, monospace">
<p>A major metropolitan area, Smartpolis, has seen a dramatic increase in reported illnesses. I have just been tasked with analysing the origin and epidemic spread of the disease. More importantly, I have also been given the responsibility of finding the disease's mode of transmission and proposing containment measures.</p>
<br \>
 
<p><b>I accepted the task without any hint of hesitation. The truth is that I do not have a clue at the moment. But I have to believe the answer lies in the data; perhaps a little digging will get me somewhere. People are dying, the bosses are staring, off to dig: I will keep on <strike>c</strike>trying.</b></p>
 
 
</div>
 
</div>
  
<table>
 
  <tr style="height:50px">
    <td colspan="3">
 
      <div style="color:#0F1940; padding-left: 10px; font-size: 24px; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif;">Relevant Basic Facts</div>
 
 
<hr style="margin-left:10px;margin-right:10px">
 
<hr style="margin-left:10px;margin-right:10px">
    </td>
<div style="padding-left: 10px; color:#0F1940; font-size: 16px; font-weight: bold; font-family: 'Courier New', Courier, monospace">
  </tr>
 
 
  <tr>
 
    <td width="30%" style="vertical-align:top"><div style="font-weight: bold; padding-left: 10px; color:#0F1940; font-size: 16px; font-family: 'Courier New', Courier, monospace">Population</div></td>
 
    <td style="vertical-align:top;font-weight: bold;">:</td>
 
    <td style="vertical-align:top">
 
      [[File:RaymPopulationChart.png|600px|center]]
 
      <div style="font-weight: bold;padding-left: 10px; color:#0F1940; font-size: 16px; font-family: 'Courier New', Courier, monospace">Total population - 2,202,381</div>
 
    </td>
 
  </tr>
 
 
 
  <tr>
 
    <td style="font-weight: bold;vertical-align:top"><div style="padding-left: 10px; color:#0F1940; font-size: 16px; font-family: 'Courier New', Courier, monospace">Symptoms</div></td>
 
    <td style="vertical-align:top;font-weight: bold;">:</td>
 
    <td style="vertical-align:top;font-weight: bold;padding-left: 10px; color:#0F1940; font-size: 16px; font-family: 'Courier New', Courier, monospace">
 
      <ul>
 
        <li>&nbsp;fever</li>
 
        <li>&nbsp;chills</li>
 
        <li>&nbsp;sweats</li>
 
        <li>&nbsp;aches and pains</li>
 
        <li>&nbsp;fatigue</li>
 
        <li>&nbsp;coughing</li>
 
        <li>&nbsp;breathing difficulty</li>
 
        <li>&nbsp;nausea and vomiting</li>
 
        <li>&nbsp;diarrhea</li>
 
        <li>&nbsp;enlarged lymph nodes</li>
 
      </ul>
 
    </td>
 
  </tr>
 
 
 
  <tr>
 
    <td style="font-weight: bold;vertical-align:top"><div style="padding-left: 10px; color:#0F1940; font-size: 16px; font-family: 'Courier New', Courier, monospace">Places of Interest</div></td>
 
    <td style="font-weight: bold;vertical-align:top">:</td>
 
    <td style="font-weight: bold;vertical-align:top;padding-left: 10px; color:#0F1940; font-size: 16px; font-family: 'Courier New', Courier, monospace">
 
[[File:RaymMapGreyscale.png|600px|center]]
 
      <ul>
 
        <li>&nbsp;vastopolis dome</li>
 
        <li>&nbsp;westside stadium</li>
 
        <li>&nbsp;vastopolis airport</li>
 
        <li>&nbsp;convention center</li>
 
        <li>&nbsp;hospitals in the various zones</li>
 
        <li>&nbsp;government buildings</li>
 
        <li>&nbsp;vastopolis armed forces</li>
 
        <li>&nbsp;river and lakes (water supply)</li>
 
      </ul>
 
    </td>
 
  </tr>
 
 
 
<tr>
 
    <td style="font-weight: bold;vertical-align:top"><div style="padding-left: 10px; color:#0F1940; font-size: 16px; font-family: 'Courier New', Courier, monospace">Available data</div></td>
 
    <td style="font-weight: bold;vertical-align:top">:</td>
 
    <td style="font-weight: bold;vertical-align:top;padding-left: 10px; color:#0F1940; font-size: 16px; font-family: 'Courier New', Courier, monospace">
 
      <ol>
 
        <li>&nbsp;microblogs of the residents in smartpolis</li>
 
        <li>&nbsp;population and density breakdown of each city zone</li>
 
        <li>&nbsp;historical daily weather conditions and wind directions</li>
 
        <li>&nbsp;map of smartpolis showing the city zones and points of interest</li>
 
      </ol>
 
    </td>
 
  </tr>
 
 
 
  <tr>
 
    <td style="font-weight: bold;vertical-align:top;padding-left: 10px; color:#0F1940; font-size: 16px; font-family: 'Courier New', Courier, monospace">Tools</td>
 
    <td style="font-weight: bold;vertical-align:top">:</td>
 
    <td style="font-weight: bold;vertical-align:top;padding-left: 10px; color:#0F1940; font-size: 16px; font-family: 'Courier New', Courier, monospace">
 
      <ul>
 
        <li>&nbsp;jmp pro</li>
 
        <li>&nbsp;tableau</li>
 
      </ul>
 
    </td>
 
  </tr>
 
</table>
 
 
<br \>
 
<br \>
 
<br \>
 
<br \>
<div style="width:900px; background-color:#c0c0c0; height: 35px; border-bottom: 1px solid #9e9e9e; border-top: 1px solid #9e9e9e;">
  <table style="width:900px;">
 
    <tr>
 
      <td width="40%" style="color:#ffffff;"></td>
 
      <td width="20%" style="color:#ffffff;text-align:center;font-size: 18px;">[[ISSS608_2017-18_T1_Assign_FOO_CELONG_RAYMOND|Home]]</td>
 
      <td width="40%" style="color:#ffffff;text-align:right;font-size: 18px;">[[ISSS608_2017-18_T1_Assign_FOO_CELONG_RAYMOND/MakingSenseOfTheChatter|Making Sense of the Chatter >]]</td>
 
    </tr>
 
  </table>
 
</div>
 
 
</div>
 
</div>

Revision as of 12:21, 15 October 2017



By Zheng Mianyi


Background

An epidemic disease has broken out in Smartpolis, a major metropolitan area. Using the information provided, including the city's population, the disease symptoms, a geographical map of the city, its weather records and, most importantly, the residents' microblogs, I set out to detect how this disease is transmitted.


Data Preparation

The main information is contained in more than 1 million microblog records. In the initial dataset, latitude and longitude were stored together in a single field, so I split this field into two columns: latitude and longitude.
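For readers who want to reproduce this step outside a GUI tool, here is a minimal Python sketch of the split. It is not the method actually used here, and it rests on hypothetical assumptions: each record is a dictionary, the combined field is called `Location`, and it holds latitude and longitude separated by whitespace. The field name, separator and sample record are all illustrative.

```python
from typing import Dict, List, Tuple


def split_location(location: str) -> Tuple[float, float]:
    """Split a combined "latitude longitude" string into two floats.

    Assumption: the two values are separated by whitespace. Adjust the
    split if the real field uses a comma or another delimiter.
    """
    parts = location.split()
    if len(parts) != 2:
        raise ValueError(f"expected 'latitude longitude', got {location!r}")
    return float(parts[0]), float(parts[1])


def add_lat_lon_columns(records: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Return copies of the records with separate latitude/longitude fields.

    The combined field is assumed (hypothetically) to be called "Location".
    """
    result: List[Dict[str, object]] = []
    for record in records:
        latitude, longitude = split_location(record["Location"])
        # Copy every field except the combined one, then add the two new columns.
        new_record: Dict[str, object] = {k: v for k, v in record.items() if k != "Location"}
        new_record["latitude"] = latitude
        new_record["longitude"] = longitude
        result.append(new_record)
    return result


# Made-up record for illustration only.
sample = [{"ID": "1", "Location": "42.22717 93.58295", "text": "fever since last night"}]
print(add_lat_lon_columns(sample))
```

Raising `ValueError` on malformed locations makes bad rows visible instead of silently producing wrong coordinates.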


Next, I chose keywords to select the relevant records. I prefer a relatively small dataset with higher accuracy to a large dataset with lower accuracy. After many trials, I settled on the following target words: "fever", "chill", "fatigue", "cough", "difficult", "nausea", "vomit", "diarrhea", "lymph" and "throat".
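The keyword selection can be sketched as a case-insensitive substring match, which is why stems such as "chill" and "vomit" also catch "chills" and "vomiting". This is an illustrative Python sketch, not the exact filter applied to the data; `matches_any` and `filter_messages` are hypothetical helper names and the sample messages are made up.

```python
from typing import Iterable, List

# Keyword stems from the text above. They must be lower-case because
# matching is done against the lower-cased message.
TARGET_WORDS = ("fever", "chill", "fatigue", "cough", "difficult",
                "nausea", "vomit", "diarrhea", "lymph", "throat")


def matches_any(text: str, keywords: Iterable[str] = TARGET_WORDS) -> bool:
    """True if any keyword occurs as a substring of the lower-cased text."""
    lowered = text.lower()
    return any(word in lowered for word in keywords)


def filter_messages(messages: Iterable[str],
                    keywords: Iterable[str] = TARGET_WORDS) -> List[str]:
    """Keep only the messages that mention at least one keyword."""
    keywords = tuple(keywords)  # materialise so it can be reused for every message
    return [m for m in messages if matches_any(m, keywords)]


# Made-up messages for illustration only.
messages = [
    "Woke up with a bad cough",
    "Great game at the stadium",
    "So many CHILLS tonight",
    "This exam is difficult",
]
print(filter_messages(messages))
```

The last sample shows the accuracy trade-off mentioned above: a bare stem such as "difficult" also matches messages that have nothing to do with breathing, so each added keyword is worth checking against the records it pulls in.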


Last but not least, I explored the data further. For instance, were there any early symptoms before patients became ill? In addition, after reviewing the symptoms, we can initially group them into two main problems: flu (fever, chills, fatigue, coughing, breathing difficulty, sore throat and enlarged lymph nodes) and stomach problems (nausea, vomiting and diarrhea).
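The two-group split can be expressed as a lookup from each group to its keyword stems, tagging a message with every group it matches. In this hypothetical Python sketch, `SYMPTOM_GROUPS`, `classify` and `count_groups` are illustrative names. The stem "difficult" stands in for breathing difficulty, and a message that mentions both kinds of symptom is counted separately as "flu+stomach" rather than being forced into one group.

```python
from collections import Counter
from typing import FrozenSet, Iterable

# Hypothetical grouping of the keyword stems into the two problems above.
SYMPTOM_GROUPS = {
    "flu": ("fever", "chill", "fatigue", "cough", "difficult", "throat", "lymph"),
    "stomach": ("nausea", "vomit", "diarrhea"),
}


def classify(text: str) -> FrozenSet[str]:
    """Return every symptom group whose stems appear in the message."""
    lowered = text.lower()
    return frozenset(
        group for group, stems in SYMPTOM_GROUPS.items()
        if any(stem in lowered for stem in stems)
    )


def count_groups(messages: Iterable[str]) -> Counter:
    """Count messages per group label; unmatched messages are skipped."""
    counts: Counter = Counter()
    for message in messages:
        groups = classify(message)
        if not groups:
            continue
        # Label is "flu", "stomach" or "flu+stomach".
        counts["+".join(sorted(groups))] += 1
    return counts


# Made-up messages for illustration only.
print(count_groups(["high fever", "vomiting again", "cough and diarrhea", "sunny"]))
```

Keeping the mixed label separate makes it easy to see how often the two symptom groups occur together.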