Difference between revisions of "Twitter Analytics: Documentation"

From Analytics Practicum

Revision as of 21:36, 8 September 2014



Data

Data is collected manually from Twitter with Python and stored in a SQLite database. Several keywords were tried, such as “#ippt”, “#gaza” and “#MH17”. However, the data collected for “#ippt” and “#MH17” is deemed unrepresentative because it is seasonal: tweet volume spikes during a short period of time. The “#gaza” keyword, on the other hand, retrieves a lot of tweets within a short period of time, which makes for a better data set. Even so, we may need to gather more data, in terms of both time frame and granularity, to find a suitable forecasting approach.
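As a rough illustration of the storage step, and of how the short-lived spikes can be checked, the sketch below stores tweets in SQLite and counts them per keyword per day. The table name `tweets`, its columns, the helper names and the sample rows are all assumptions made up for this example; they are not the project's actual schema or data.

```python
import sqlite3


def create_store(conn):
    # Hypothetical schema: one row per tweet, tagged with the search keyword.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        "keyword TEXT, user_name TEXT, post_date TEXT, "
        "location TEXT, content TEXT)"
    )


def save_tweets(conn, rows):
    # rows: iterable of (keyword, user_name, post_date, location, content)
    conn.executemany("INSERT INTO tweets VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


def daily_counts(conn):
    # Tweets per keyword per calendar day; a seasonal keyword shows
    # a few busy days and little else.
    cur = conn.execute(
        "SELECT keyword, date(post_date), COUNT(*) "
        "FROM tweets "
        "GROUP BY keyword, date(post_date) "
        "ORDER BY keyword, date(post_date)"
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_store(conn)
    # Invented sample rows, for illustration only.
    save_tweets(conn, [
        ("#gaza", "user_a", "2014-07-18 09:15:00", "Singapore", "sample text"),
        ("#gaza", "user_b", "2014-07-18 10:40:00", "London", "sample text"),
        ("#ippt", "user_c", "2014-07-23 08:00:00", "Singapore", "sample text"),
    ])
    for keyword, day, count in daily_counts(conn):
        print(keyword, day, count)
```

Grouping by day is only one choice of granularity; grouping by hour instead would mean replacing `date(post_date)` with `strftime('%Y-%m-%d %H', post_date)`.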

Because of the processing speed limitations of R, this project will only look at 10,000 rows of data for efficiency. More data can be analyzed if time is not a constraint. Various attributes are collected with each tweet, but this project focuses on the following:

  • User name
  • Post date
  • Location
  • Tweet content
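
One possible way to apply the 10,000-row cap and keep only the four attributes above is to select them in Python and hand R a CSV file. This is a sketch only: the `tweets` table, its column names and the `export_sample` helper are hypothetical and not taken from the project.

```python
import csv
import sqlite3

ROW_LIMIT = 10_000  # row cap stated in the text

# Assumed column names for the four attributes of interest.
COLUMNS = ["user_name", "post_date", "location", "content"]


def export_sample(conn, csv_path, limit=ROW_LIMIT):
    """Write at most `limit` tweets to a CSV file and return the row count."""
    cur = conn.execute(
        "SELECT user_name, post_date, location, content "
        "FROM tweets ORDER BY post_date LIMIT ?",
        (limit,),
    )
    count = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(COLUMNS)  # header row
        for row in cur:
            writer.writerow(row)
            count += 1
    return count
```

The resulting file could then be loaded in R with `read.csv`. Ordering by `post_date` keeps the earliest rows; a random sample would be an equally reasonable choice if the earliest tweets are not representative.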