AY1516 T2 Team AP Data

From Analytics Practicum


Dataset provided by SGAG

Currently, SGAG gauges the reception of its posts using only Facebook Page Insights and SocialBakers, and much of the data it has access to has not been analysed at a deeper level.

SGAG has provided us with social media metric data extracted from its platforms, namely Facebook, Twitter and YouTube. This gives us the following datasets, which present a generic, aggregated representation of SGAG's followers:

  • Unique visitors, by day and month
  • Post level insights: Total Impressions, Reach, Feedback
  • Engagement Insights: Likes, Viewed, Commented

These datasets do not directly help us map out SGAG's social network, so we will have to crawl additional data through each social media platform's API.

Crawling

Initial exploration with NodeXL

Initially, we decided to map out the social networks of SGAG's main platforms in order: Facebook first, then Twitter. Among the tools we considered, we explored NodeXL first because of its ease of data retrieval. However, due to the restrictions imposed on the free version of the tool, we decided to explore other options. Our initial exploratory plans with NodeXL are documented below.

We will have to crawl the data through the Twitter API. Using NodeXL, we are able to extract SGAG's Twitter social network data.

This gives us the following information:

  • Followed/ Following relationship represented by Edges
  • Names of Twitter accounts associated with SGAG and their followers
  • Interactions with SGAG's posts (Favourites, Retweets and Replies)
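The relationships above can be held as a simple edge list before they are loaded into a graph tool. A minimal sketch (the account names are invented for illustration):

```python
from collections import Counter

# Each edge records one relationship or interaction in the Twitter network:
# (source account, target account, type of tie).
edges = [
    ("user_a", "SGAG", "follows"),
    ("user_b", "SGAG", "follows"),
    ("user_a", "SGAG", "retweet"),
    ("user_c", "user_a", "reply"),
]

# Count how many ties of each type point at SGAG.
tie_counts = Counter(kind for _, target, kind in edges if target == "SGAG")
print(tie_counts)  # Counter({'follows': 2, 'retweet': 1})
```

Keeping the tie type on each edge lets the same list drive both the follower network and the interaction (favourite/retweet/reply) network later.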

Due to the Twitter API's querying limits, we will have to spend some time requesting data. We have arranged to complete this within one week.
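One way to stay under the querying limit is to pace requests with a fixed delay between pages. The sketch below assumes a hypothetical `fetch_page` callable standing in for a real Twitter API call; library clients such as Tweepy can also wait out rate limits automatically.

```python
import time

def crawl_with_backoff(fetch_page, pages, delay_seconds=60):
    """Request pages one at a time, pausing between requests so the
    crawl stays under the API's rate limit. `fetch_page` is a
    hypothetical callable standing in for a real API request."""
    results = []
    for page in pages:
        results.append(fetch_page(page))
        time.sleep(delay_seconds)  # crude spacing between requests
    return results
```

With a 60-second delay and the Twitter API's per-15-minute windows, a crawl of SGAG's full follower network plausibly stretches over days, which is why a week was budgeted.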

After successfully crawling the data, we will load it into Gephi and begin our visualisation.
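Gephi's spreadsheet importer accepts a CSV edge list with `Source` and `Target` columns, so the crawled network only needs to be written out in that shape. A minimal sketch, with made-up edges and an assumed output filename:

```python
import csv

# Hypothetical crawled edges: (follower, followed) pairs.
edges = [("user_a", "SGAG"), ("user_b", "SGAG"), ("user_b", "user_a")]

# Gephi recognises a CSV edge list with Source/Target headers.
with open("sgag_edges.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Source", "Target"])
    writer.writerows(edges)
```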

Here is an example of an expected network visualisation for a social media platform.

Expectedvis.png


Settling on the Facebook and Twitter APIs using Python

Merging data

The Tweet ID that SGAG provides for each tweet will be mapped to the crawled data above and used to plot networks linking each tweet with its retweets, replies, likes and other interactions.
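The mapping is a join on Tweet ID between SGAG's per-tweet records and the crawled interactions. A minimal sketch with invented IDs and fields:

```python
# SGAG-provided tweet metrics, keyed by Tweet ID (values are illustrative).
sgag_tweets = {
    "700000000000000001": {"text": "meme A", "likes": 120},
    "700000000000000002": {"text": "meme B", "likes": 45},
}

# Crawled interactions referencing the same Tweet IDs.
crawled = [
    {"tweet_id": "700000000000000001", "user": "user_a", "action": "retweet"},
    {"tweet_id": "700000000000000002", "user": "user_b", "action": "reply"},
]

# Join the two sources on Tweet ID: one record per interaction,
# carrying both the interaction and the tweet's own metrics.
merged = [
    {**row, **sgag_tweets[row["tweet_id"]]}
    for row in crawled
    if row["tweet_id"] in sgag_tweets
]
```

Each merged record can then become an edge (user, tweet) in the interaction network, weighted or coloured by the action type.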

NodeXL provides easy importing of Twitter network data. The imported data will then be prepared and cleaned by merging duplicate edges to reduce noise and by grouping nodes with a clustering algorithm. Metrics and graphs of the network will also be generated.
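The duplicate-edge merge is the same operation whether NodeXL or our own Python pipeline performs it: repeated (source, target) pairs collapse into one edge whose weight is the repeat count. A stdlib sketch with invented edges:

```python
from collections import Counter

# Raw edge list with duplicates (e.g. a user who retweeted SGAG twice).
raw_edges = [
    ("user_a", "SGAG"),
    ("user_a", "SGAG"),
    ("user_b", "SGAG"),
]

# Collapse duplicates into single weighted edges, mirroring the
# "merge duplicate edges" cleaning step described above.
weighted_edges = [
    {"source": s, "target": t, "weight": w}
    for (s, t), w in Counter(raw_edges).items()
]
```

The weight column then feeds directly into edge thickness in the visualisation, so repeated interactions stand out instead of cluttering the graph.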

Storing data

Our data comes from multiple sources (Twitter and Facebook), and one consideration is the ease of retrieving the extracted SGAG network data after it is stored. As such, storage in a relational database such as MySQL is preferred, since it supports exports in various file formats. Furthermore, data stored this way can be easily manipulated and accessed by external software for visualisation and further analysis.
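A single edges table with a platform column is enough to hold both networks side by side. The sketch below uses Python's built-in SQLite purely as a stand-in for MySQL: the schema and queries carry over, only the connection setup differs.

```python
import sqlite3

# In-memory SQLite stands in for a MySQL server here.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE edges (
        source   TEXT NOT NULL,
        target   TEXT NOT NULL,
        platform TEXT NOT NULL,   -- 'twitter' or 'facebook'
        weight   INTEGER DEFAULT 1
    )
""")
conn.executemany(
    "INSERT INTO edges (source, target, platform) VALUES (?, ?, ?)",
    [("user_a", "SGAG", "twitter"), ("user_b", "SGAG", "facebook")],
)
conn.commit()

# Retrieval for later export or visualisation is a plain SELECT.
twitter_edges = conn.execute(
    "SELECT source, target FROM edges WHERE platform = 'twitter'"
).fetchall()
```

Filtering by platform at query time is what lets one database serve both the Twitter and Facebook analyses without duplicating the pipeline.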