Difference between revisions of "ISSS608 2017-18 T1 Assign FOO CELONG RAYMOND/MakingSenseOfTheChatter"

From Visual Analytics and Applications
Line 44: Line 44:
 
   <table style="width:700px; border-collapse: collapse;">
 
   <table style="width:700px; border-collapse: collapse;">
 
     <tr>
 
     <tr>
       <th width="40%" style="border: 1px solid black;background: #dce0f7;">Symptoms</th>
+
       <th width="40%" style="border: 1px solid black;background: #dce0f7;">Classified as</th>
       <th width="70%" style="border: 1px solid black;background: #dce0f7;">Searched Keyword</th>
+
       <th width="70%" style="border: 1px solid black;background: #dce0f7;">Searched Terms</th>
 
     </tr>
 
     </tr>
 
     <tr>
 
     <tr>
Line 98: Line 98:
 
       <td style="border: 1px solid black;">enlarged lymph node</td>
 
       <td style="border: 1px solid black;">enlarged lymph node</td>
 
       <td style="border: 1px solid black;">lymph</td>
 
       <td style="border: 1px solid black;">lymph</td>
 +
    </tr>
 +
  </table>
 +
</div>
 +
<br \>
 +
<p>After exploring the visualisations, the following keywords are tracked as well.</p>
 +
<br \>
 +
<div style="position:relative; margin:auto; width:700px;">
 +
  <table style="width:700px; border-collapse: collapse;">
 +
    <tr>
 +
      <th width="40%" style="border: 1px solid black;background: #dce0f7;">Classified as</th>
 +
      <th width="70%" style="border: 1px solid black;background: #dce0f7;">Searched Terms</th>
 +
    </tr>
 +
    <tr>
 +
      <td style="border: 1px solid black;">Accident</td>
 +
      <td style="border: 1px solid black;">acci</td>
 +
    </tr>
 +
    <tr>
 +
      <td style="border: 1px solid black;">Convention</td>
 +
      <td style="border: 1px solid black;">conven</td>
 +
    </tr>
 +
    <tr>
 +
      <td style="border: 1px solid black;">Explosion</td>
 +
      <td style="border: 1px solid black;">explo</td>
 +
    </tr>
 +
    <tr>
 +
      <td style="border: 1px solid black;">Truck</td>
 +
      <td style="border: 1px solid black;">truck</td>
 
     </tr>
 
     </tr>
 
   </table>
 
   </table>

Revision as of 00:10, 14 October 2017

RaymHeader.png



Exploring and Organising the Data

First things first, I checked for dirty data. Sure enough, 21 microblogs have problems with the time values in their dates. I cleaned these up by setting the time to midnight (00:00).


RaymDirtyDates.png
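
The midnight fallback can be sketched in Python as below. This is only an illustration, not the tool or formula actually used for the clean-up. The `clean_timestamp` helper and the `month/day/year hour:minute` layout are assumptions; the real timestamp format may differ.

```python
from datetime import datetime

def clean_timestamp(raw, date_fmt="%m/%d/%Y", time_fmt="%H:%M"):
    """Parse 'date time'; fall back to midnight when the time part is unusable.

    The two format strings are assumed for illustration only.
    """
    parts = raw.strip().split(maxsplit=1)
    day = datetime.strptime(parts[0], date_fmt)  # the date part is assumed valid
    if len(parts) == 2:
        try:
            t = datetime.strptime(parts[1], time_fmt)
            return day.replace(hour=t.hour, minute=t.minute)
        except ValueError:
            pass  # dirty time value: keep the date, drop the time
    return day  # midnight (00:00) of that date
```

A well-formed value such as `"5/18/2011 13:26"` keeps its time, while a value whose time part cannot be parsed comes back as 00:00 on the same date.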


The number of microblogs is massive, so I looked at the distribution of posts over time.


RaymMicroblogPerDay.png


RaymMicroblogPerHour.png
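
The per-day and per-hour counts behind charts like these can be sketched in Python as follows. The function name is hypothetical and the input is assumed to be a collection of already-parsed `datetime` values.

```python
from collections import Counter
from datetime import datetime

def posts_per_day_and_hour(timestamps):
    """Count posts per calendar day and per hour of day (0-23)."""
    stamps = list(timestamps)  # allow generators to be iterated twice
    per_day = Counter(ts.date() for ts in stamps)
    per_hour = Counter(ts.hour for ts in stamps)
    return per_day, per_hour
```

Any microblog whose time was reset to midnight is counted under hour 0, which is worth keeping in mind when reading the hourly chart.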


The data needs to be organised so that it can be easily analysed later. The obvious choice is by city zone. I grouped the microblogs by zone, creating a column that stores the zone from which each microblog was transmitted.


RaymMicroblogByZone.png
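
A zone column of this kind could be derived roughly as in the Python sketch below. It is a deliberate simplification: the zone names and coordinates are made up, and each zone is approximated by a latitude/longitude bounding box, whereas the real city zones are likely irregular shapes that need a proper point-in-polygon test.

```python
# Hypothetical zones as (lat_min, lat_max, lon_min, lon_max) bounding boxes.
HYPOTHETICAL_ZONES = {
    "Downtown": (42.20, 42.25, 93.30, 93.40),
    "Uptown": (42.25, 42.30, 93.30, 93.40),
}

def assign_zone(lat, lon, zones=HYPOTHETICAL_ZONES):
    """Return the first zone whose box contains the point, else None."""
    for name, (lat_min, lat_max, lon_min, lon_max) in zones.items():
        if lat_min <= lat <= lat_max and lon_min <= lon <= lon_max:
            return name
    return None  # outside every known zone
```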


Next, I also created an indicator column for microblogs that were transmitted near the various points of interest.


RaymMicroblogByPlaceOfInterest.png
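
One way to build such an indicator is a distance threshold around each point of interest, sketched in Python below. The 0.5 km radius, the function names and the example coordinates are assumptions for illustration; how "near" was actually defined is not stated here.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def near_any_poi(lat, lon, pois, radius_km=0.5):
    """True if the point lies within radius_km of at least one point of interest."""
    return any(
        haversine_km(lat, lon, p_lat, p_lon) <= radius_km
        for p_lat, p_lon in pois.values()
    )

# Hypothetical point of interest; the coordinates are made up.
EXAMPLE_POIS = {"Hospital": (42.25, 93.40)}
```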


That should organise the geolocation information. With that done, the next step is to organise the texts by their content. I classified the microblogs by their similarity to each other using latent class analysis, parameterised to generate 10 clusters (classes).

  •  Classes 1 to 5 are common topics.
  •  Classes 6 and 7 contain messages about people suffering from the illness, but many posts are written from a third-person point of view, or it is unclear whether the authors are talking about themselves. They may be useful for tracking the spread, but less so for tracing the source by following the authors' whereabouts.
  •  Class 8 contains microblogs in one or more foreign languages.
  •  Class 9 contains messages about conventions and election debates. These gatherings may be hotspots for the transmission of the disease.
  •  Class 10 is interesting. It contains conversations that use the phrase "lose my mind" in addition to describing symptoms. Most of the authors describe the symptoms in the first person, which makes this group suitable for tracing the source.


RaymFormulaKeyword.png


I added indicator columns that flag whether symptom keywords appear in each microblog's text. Because the authors are ordinary citizens and not doctors, they might use non-medical terms to describe their condition. A survey of the microblogs yields the following keywords and their classifications.


Classified as          Searched Terms
flu                    flu, runny nose
fever                  fever, temp, temperature
chills                 chill
sweats                 sweat
aches                  ache, aching, sore, cramp
pains                  pain, hurt
fatigue                fatigue
coughing               cough, pnenomia
breathing difficulty   breath
nausea                 naus
vomiting               vomit, throwing up, puk
diarrhea               diarrhea
enlarged lymph node    lymph
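
The keyword table above can be turned into indicator columns with a simple substring search, sketched here in Python. I am assuming case-insensitive substring matching because several terms are word stems ("naus", "puk"); the actual formula may behave differently. The `flag_symptoms` name is hypothetical, and the search terms are copied exactly as listed in the table.

```python
# Classification -> searched terms, copied as written in the table.
SYMPTOM_TERMS = {
    "flu": ["flu", "runny nose"],
    "fever": ["fever", "temp", "temperature"],
    "chills": ["chill"],
    "sweats": ["sweat"],
    "aches": ["ache", "aching", "sore", "cramp"],
    "pains": ["pain", "hurt"],
    "fatigue": ["fatigue"],
    "coughing": ["cough", "pnenomia"],
    "breathing difficulty": ["breath"],
    "nausea": ["naus"],
    "vomiting": ["vomit", "throwing up", "puk"],
    "diarrhea": ["diarrhea"],
    "enlarged lymph node": ["lymph"],
}

def flag_symptoms(text, terms=SYMPTOM_TERMS):
    """Return one True/False indicator per classification for a microblog text."""
    lowered = text.lower()
    return {
        label: any(needle in lowered for needle in needles)
        for label, needles in terms.items()
    }
```

Short stems keep the search tolerant of spelling variants, but they also produce false positives under this matching rule: "flu" is found in "influence" and "temp" in "attempt", so flagged posts still need a visual check.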


After exploring the visualisations, the following keywords are tracked as well.


Classified as   Searched Terms
Accident        acci
Convention      conven
Explosion       explo
Truck           truck