Twitter Analytics: Documentation

From Analytics Practicum
Revision as of 22:08, 8 September 2014 by Ffortunata.2011 (talk | contribs)



Data

Data is collected manually from Twitter with Python and stored in an SQLite database. Several keywords have been tried, including “#ippt”, “#gaza” and “#MH17”. The data collected for “#ippt” and “#MH17” is deemed unrepresentative because it is seasonal: tweet volume spikes during a short period of time. The “#gaza” keyword, on the other hand, retrieves a large number of tweets within a short period of time, which makes it a better data source. However, more data may need to be gathered, over a longer time frame and at a suitable granularity, to find a suitable forecasting approach.
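A minimal sketch of the storage step, assuming a single table of tweets. The table name, column names and the helper functions open_store, save_tweets and daily_counts are hypothetical and chosen for illustration only; the project's actual schema is not documented here. The daily count is one way to see whether a keyword's volume spikes over a short period.

```python
import sqlite3

# Hypothetical schema: the table and column names are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tweets (
    id        INTEGER PRIMARY KEY,
    keyword   TEXT NOT NULL,
    user_name TEXT NOT NULL,
    post_date TEXT NOT NULL,   -- ISO 8601 timestamp, e.g. 2014-08-01T10:00:00
    location  TEXT,            -- NULL when the user has not shared a location
    content   TEXT NOT NULL
)
"""


def open_store(path=":memory:"):
    """Open (or create) the SQLite database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_tweets(conn, keyword, tweets):
    """Insert tweets given as dicts; return the number of rows written."""
    rows = [
        (keyword, t["user_name"], t["post_date"], t.get("location"), t["content"])
        for t in tweets
    ]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO tweets (keyword, user_name, post_date, location, content) "
            "VALUES (?, ?, ?, ?, ?)",
            rows,
        )
    return len(rows)


def daily_counts(conn, keyword):
    """Return (day, tweet count) pairs for one keyword, ordered by day."""
    return conn.execute(
        "SELECT substr(post_date, 1, 10) AS day, COUNT(*) "
        "FROM tweets WHERE keyword = ? GROUP BY day ORDER BY day",
        (keyword,),
    ).fetchall()
```

A keyword whose daily counts are concentrated in a handful of days would show the seasonal pattern described above.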

Because of the processing speed limitations of R, this project will only look at 10,000 rows of data for efficiency. More data can be analyzed if time is not a constraint on the project. Various attributes are collected from the data gathered; the following are the focus of this project:

  • User name
  • Post date
  • Location
  • Tweet content
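The row cap and the four attributes above can be sketched as a single query. This assumes a table named tweets with columns user_name, post_date, location and content; these names and the load_sample helper are hypothetical, since the project's schema is not shown.

```python
import sqlite3

ROW_LIMIT = 10_000  # the cap stated in the text
COLUMNS = ("user_name", "post_date", "location", "content")  # assumed column names


def load_sample(conn, limit=ROW_LIMIT):
    """Return at most `limit` tweets as dicts holding the four attributes."""
    query = (
        "SELECT " + ", ".join(COLUMNS) + " "
        "FROM tweets ORDER BY post_date LIMIT ?"
    )
    return [dict(zip(COLUMNS, row)) for row in conn.execute(query, (limit,))]
```

Ordering by post date before applying the limit keeps the sample contiguous in time, which may matter for forecasting; a random sample would be a different design choice.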

Data Cleansing Methodology

Upon data exploration, the following data cleansing steps are proposed:

  1. Extract tweets from SQLite
  2. Convert tweets to a data frame
  3. Convert tweets to corpus
  4. Change tweets to lower case
  5. Remove punctuation
  6. Remove stopwords
  7. Stem words to retrieve their radicals
  8. Remove links from tweets
  9. Remove retweets and mentions
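The text-level steps above (4 to 9) can be sketched in Python with the standard library only. This is an illustration under several assumptions, not the project's R implementation: the stopword list is a tiny sample, naive_stem is a crude stand-in for a real stemmer such as Porter's, and links, mentions and the "RT" marker are stripped before punctuation, because removing punctuation first would mangle URLs and the "@" sign.

```python
import re
import string

# Tiny illustrative stopword list; a real run would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "is", "are", "in", "of", "to", "for", "on"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
RETWEET_RE = re.compile(r"\brt\b")  # applied after lower-casing
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def naive_stem(word):
    """Crude suffix stripper, a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def clean_tweet(text):
    """Return the cleaned, stemmed tokens of one tweet."""
    text = text.lower()                    # step 4: lower case
    text = URL_RE.sub(" ", text)           # step 8: remove links
    text = MENTION_RE.sub(" ", text)       # step 9: remove mentions
    text = RETWEET_RE.sub(" ", text)       # step 9: remove the retweet marker
    text = text.translate(PUNCT_TABLE)     # step 5: remove punctuation
    tokens = [t for t in text.split() if t not in STOPWORDS]  # step 6
    return [naive_stem(t) for t in tokens]                    # step 7


print(clean_tweet("RT @bob: Bombing reported in #Gaza http://t.co/abc123 !!"))
# prints ['bomb', 'report', 'gaza']
```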

Data Exploration Findings

Data exploration to find patterns produced several findings:

  1. “@..” can be used to identify the relationship between users, classified as one user “mentioning” another user in a post
  2. “RT” indicates that a user has retweeted another user's content, which can be used to identify influencers
  3. Location is not recommended as a selection feature because users are polarized in their preferences: some tend to disable location tracking on their devices while others do not. Hence, separating users by location is likely to result in only the same group of users being analyzed
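The first two findings can be sketched as a small edge extractor. It assumes the common convention that a retweet begins with "RT @user"; tweets that do not follow it would be missed. The function names and the (source, target, kind) tuple layout are hypothetical.

```python
import re
from collections import Counter

RETWEET_RE = re.compile(r"^RT @(\w+)")  # assumes retweets start with "RT @user"
MENTION_RE = re.compile(r"@(\w+)")


def extract_edges(user_name, content):
    """Return (source, target, kind) tuples for one tweet."""
    edges = []
    retweeted = None
    match = RETWEET_RE.match(content)
    if match:
        retweeted = match.group(1)
        edges.append((user_name, retweeted, "retweet"))
    for target in MENTION_RE.findall(content):
        # Skip the retweeted account so it is not also counted as a mention.
        if target != retweeted:
            edges.append((user_name, target, "mention"))
    return edges


def retweet_counts(tweets):
    """Count how often each user is retweeted, as a rough influence signal."""
    counts = Counter()
    for user_name, content in tweets:
        for _, target, kind in extract_edges(user_name, content):
            if kind == "retweet":
                counts[target] += 1
    return counts
```

Users with the highest retweet counts would be candidate influencers; the mention edges could feed a separate user-to-user relationship graph.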



Project Approach

The delivery and development of the analytics project follow an agile, iterative implementation approach. Hence, frequent communication with clients and teaching staff, to gather input for model development and refinement, will be emphasized at various stages.

[Figure: Fap6.png]

Forecasting Approach
