AY1516 T2 Team AP Analysis

From Analytics Practicum

Revision as of 17:45, 25 February 2016

Initial Dataset Provided by SGAG

During our initial meeting, SGAG provided several data files containing information about their social media accounts. On closer inspection, we realised that the data were largely aggregated. Even if we loaded them into graph visualisation tools such as Gephi or Graphviz to analyse SGAG's social network, the resulting graph would not accurately represent that network. In addition, the Tweet Activity Metrics did not show how individual users engaged with each post, which left the data of little use for our analysis.

Hence, we decided to retrieve the data from Twitter ourselves, in an attempt to visualise SGAG's social network at the level of individual users rather than as aggregates. We did this through Twitter's public API, which let us tailor data collection to our needs.
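
Because the aim of per-user data is a graph that a tool like Gephi can load, a minimal sketch of that last step may help. It assumes, hypothetically, that the retweeter lists have already been gathered into a Python dict mapping each post ID to the IDs of the users who retweeted it. It models the network as a star: one weighted edge from each retweeting user to SGAG, where the weight counts how many SGAG posts that user retweeted. The names build_edge_list and to_gephi_csv are illustrative, not taken from the project; the Source/Target/Weight header follows the column names Gephi's spreadsheet import recognises for edge tables.

```python
import csv
import io
from collections import Counter


def build_edge_list(retweeters_by_post, account="SGAG"):
    """Turn {post_id: [retweeter_user_id, ...]} into weighted (source, target, weight) edges.

    Hypothetical helper: each edge points from a retweeting user to the account,
    weighted by how many of the account's posts that user retweeted.
    """
    weights = Counter()
    for user_ids in retweeters_by_post.values():
        # set() guards against the same user being listed twice for one post.
        for user_id in set(user_ids):
            weights[user_id] += 1
    return [(str(user_id), account, count) for user_id, count in sorted(weights.items())]


def to_gephi_csv(edges):
    """Serialise edges as CSV with the Source,Target,Weight header Gephi's import recognises."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    writer.writerows(edges)
    return buffer.getvalue()
```

For example, build_edge_list({101: [1, 2], 102: [2, 3]}) yields ("1", "SGAG", 1), ("2", "SGAG", 2) and ("3", "SGAG", 1), since user 2 retweeted both posts. A bipartite user-to-post graph would keep more detail; the star form is simply the smallest graph that still distinguishes individual users.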

Twitter API Exploration

As explained in the previous section, our initial dataset was insufficient for our analysis. We therefore decided to retrieve more detailed data directly from Twitter through its publicly exposed APIs. We researched Python libraries suitable for data crawling, exploring wrapper libraries such as Tweepy and python-twitter, and settled on Tweepy given its large community support and ease of use.

Methodology

Our exploratory analysis examines the behaviour of Twitter users at the level of individual posts: which types of posts tend to be retweeted, and what kinds of Twitter users are most likely to retweet them. To collect the relevant data, we used the user_timeline() and retweeters() methods. The user_timeline() method collects all SGAG posts; for each of these posts, retweeters() is then called to retrieve the list of users who retweeted it.
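
The two-step collection loop above can be sketched as follows. This is a minimal sketch, not the team's actual crawler: fetch_timeline and fetch_retweeters are hypothetical helper names, and `api` is assumed to behave like a Tweepy 3.x `tweepy.API` object, whose user_timeline() accepts screen_name, count and max_id, and whose retweeters() takes a status ID and returns a list of user IDs. The timeline is walked backwards by setting max_id just below the oldest ID already seen. Note that Twitter's v1.1 endpoints capped user_timeline at the 3,200 most recent tweets and returned at most 100 retweeter IDs per post, so the sketch can only reach data within those limits.

```python
def fetch_timeline(api, screen_name, page_size=200):
    """Collect an account's posts, newest first, by paging backwards with max_id.

    Assumes `api` exposes a Tweepy-3.x-style user_timeline(screen_name=..., count=...,
    max_id=...) that returns status objects carrying an integer `.id`.
    """
    posts = []
    max_id = None
    while True:
        params = {"screen_name": screen_name, "count": page_size}
        if max_id is not None:
            params["max_id"] = max_id
        page = api.user_timeline(**params)
        if not page:
            # An empty page means no older posts are reachable.
            break
        posts.extend(page)
        # max_id is inclusive, so request IDs strictly below the oldest one seen.
        max_id = min(status.id for status in page) - 1
    return posts


def fetch_retweeters(api, posts):
    """Map each post ID to the user IDs returned by the assumed api.retweeters(id)."""
    return {status.id: list(api.retweeters(status.id)) for status in posts}
```

With a real client, `api` would be a `tweepy.API` built from a `tweepy.OAuthHandler`, ideally with `wait_on_rate_limit=True` so the loop sleeps through rate-limit windows instead of failing. In Tweepy 4.x, `retweeters()` was renamed `get_retweeter_ids()`, so fetch_retweeters would need that one-name change.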