Difference between revisions of "AY1516 T2 Team AP Data"

From Analytics Practicum
Line 77: Line 77:
 
==<div style="background: #232AE8; line-height: 0.3em; font-family:helvetica;  border-left: #6C7A89 solid 15px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;font-size:15px;"><font color= "#ffffff"><strong>Storing data</strong></font></div></div>==
 
  
Our data comes from multiple sources (Twitter and Instagram), and one consideration is the ease of data retrieval after storing the SGAG network data extracted. As such, data storage in a relational database such as MySQL is preferred due to its support of various file format exports. Furthermore, data stored via this method can be easily manipulated and accessed for visualisations and further analysis through external software.   
+
Our data comes from multiple sources (Twitter and Facebook), and one consideration is the ease of data retrieval after storing the SGAG network data extracted. As such, data storage in a relational database such as MySQL is preferred due to its support of various file format exports. Furthermore, data stored via this method can be easily manipulated and accessed for visualisations and further analysis through external software.   
  
 
<!--------------- Body End ---------------------->
 

Revision as of 20:27, 17 April 2016


Dataset provided by SGAG

Currently, SGAG relies only on Facebook Page Insights and SocialBakers to gauge the reception of its posts, and much of the data it has access to has not been analysed in depth.

SGAG has provided us with social media metric data extracted from its platforms, namely Facebook, Twitter and YouTube. This gives us the following datasets, which present a generic, aggregated representation of SGAG's followers:

  • Unique visitors, by day and month
  • Post level insights: Total Impressions, Reach, Feedback
  • Engagement Insights: Likes, Viewed, Commented

These datasets do not directly help us map out SGAG's social network, so we will have to crawl for additional network data using the API of each social media platform.

Crawling

Initially, we planned to map out the social networks for SGAG's main platforms: Facebook, Twitter and Instagram. However, because little user data can be extracted from Facebook, we decided to focus on Twitter and Instagram first, where social network data can be extracted much more easily.

We will have to crawl the data through the Twitter and Instagram APIs. Using NodeXL, we are able to extract SGAG's Twitter social network data.

This gives us the following information:

  • Follower/following relationships, represented by edges
  • Names of Twitter accounts associated with SGAG and their followers
  • Interactions with SGAG's posts (Favourites, Retweets and Replies)
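As a rough illustration of how follower/following relationships can be represented as edges, the sketch below stores each relationship as a directed (follower, followed) pair. The account names and helper functions are hypothetical and are not taken from the crawled data or from NodeXL.

```python
from collections import defaultdict

# Hypothetical sample edges as (follower, followed) pairs; names are made up.
edges = [
    ("alice", "SGAG"),
    ("bob", "SGAG"),
    ("SGAG", "alice"),
]


def build_following(edges):
    """Map each account to the set of accounts it follows."""
    following = defaultdict(set)
    for follower, followed in edges:
        following[follower].add(followed)
    return following


def followers_of(edges, account):
    """Return the set of accounts that follow the given account."""
    return {follower for follower, followed in edges if followed == account}


following = build_following(edges)
sgag_followers = followers_of(edges, "SGAG")
```

Keeping the edges directed preserves the difference between an account that follows SGAG and an account that SGAG follows.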

Because the Twitter and Instagram APIs limit the number of queries, requesting the data will take some time. We have arranged to complete this within one week.

After successfully crawling the data, we will load it into Gephi and begin our visualisation.
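To get a feel for how a query limit translates into crawling time, here is a minimal estimate, assuming the limit takes the form of a fixed number of requests per time window. The function name and the numbers in the example are placeholders, not the actual limits of either API.

```python
import math


def crawl_time_minutes(total_requests, requests_per_window, window_minutes):
    """Estimate the waiting time needed to issue all requests under a windowed limit.

    Assumes each window's quota can be used as soon as the window opens and
    ignores the duration of the requests themselves.
    """
    if total_requests <= 0:
        return 0
    windows = math.ceil(total_requests / requests_per_window)
    # Only the windows before the last one cost a full wait.
    return (windows - 1) * window_minutes


# Placeholder figures: 100 requests, 15 requests allowed per 15-minute window.
estimate = crawl_time_minutes(100, 15, 15)
```

With these placeholder figures the requests span seven windows, so the estimate is six full waits of 15 minutes each.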

Here is an example of an expected network visualisation for a social media platform.

Expectedvis.png

Merging data

The Tweet ID that SGAG provides for each tweet will be matched against the crawled data above and used to plot networks that link each tweet with its retweets, replies, likes, etc.

NodeXL provides easy importing of Twitter network data. The imported data will then be prepared and cleaned by merging duplicate edges to reduce noise and by grouping nodes with a clustering algorithm. Metrics and graphs of the network will also be generated.
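The matching on Tweet ID can be sketched as a simple keyed join. The record layout, field names and values below are assumptions made for illustration; the real SGAG dataset and crawled data may be structured differently.

```python
# Hypothetical records; field names and values are illustrative only.
sgag_metrics = {
    "1001": {"impressions": 5000},
    "1002": {"impressions": 1200},
}
crawled_interactions = [
    {"tweet_id": "1001", "user": "alice", "type": "retweet"},
    {"tweet_id": "1001", "user": "bob", "type": "reply"},
    {"tweet_id": "9999", "user": "carol", "type": "retweet"},  # no SGAG record
]


def merge_by_tweet_id(metrics, interactions):
    """Attach crawled interactions to the SGAG record with the same Tweet ID."""
    merged = {tid: {**m, "interactions": []} for tid, m in metrics.items()}
    unmatched = []
    for item in interactions:
        tid = item["tweet_id"]
        if tid in merged:
            merged[tid]["interactions"].append((item["user"], item["type"]))
        else:
            # Keep interactions without a matching SGAG record for inspection.
            unmatched.append(item)
    return merged, unmatched


merged, unmatched = merge_by_tweet_id(sgag_metrics, crawled_interactions)
```

Returning the unmatched interactions separately makes it easy to check how much of the crawled data could not be linked to a tweet in the SGAG dataset.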
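The idea behind merging duplicate edges can be illustrated in plain Python: repeated (source, target) pairs collapse into a single edge that carries a count. This is only a sketch of the concept and is not NodeXL's own implementation; the function name and sample edges are hypothetical.

```python
from collections import Counter


def merge_duplicate_edges(edges):
    """Collapse repeated directed edges into (source, target, weight) triples."""
    counts = Counter(edges)
    return [(source, target, weight) for (source, target), weight in counts.items()]


# Hypothetical sample: "a" interacts with "b" twice, "b" with "a" once.
raw_edges = [("a", "b"), ("a", "b"), ("b", "a")]
weighted_edges = merge_duplicate_edges(raw_edges)
```

Edges are treated as directed here, so ("a", "b") and ("b", "a") remain separate after merging.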

Storing data

Our data comes from multiple sources (Twitter and Facebook), so one consideration is how easily the extracted SGAG network data can be retrieved once stored. As such, we prefer to store the data in a relational database such as MySQL, which supports exports to various file formats. Furthermore, data stored this way can be easily accessed and manipulated by external software for visualisation and further analysis.
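A minimal sketch of the store-then-export workflow is shown below. To keep it self-contained it uses Python's built-in sqlite3 module as a stand-in for MySQL; the table name, columns and sample rows are assumptions for illustration and do not reflect the team's actual schema.

```python
import csv
import io
import sqlite3


def store_edges(conn, edges):
    """Create the (hypothetical) edges table if needed and insert the rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS edges ("
        "source TEXT NOT NULL, target TEXT NOT NULL, platform TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO edges VALUES (?, ?, ?)", edges)
    conn.commit()


def export_csv(conn):
    """Return the edges table as CSV text for use in external software."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["source", "target", "platform"])
    writer.writerows(
        conn.execute(
            "SELECT source, target, platform FROM edges ORDER BY source, target"
        )
    )
    return buffer.getvalue()


# In-memory database with made-up rows from two platforms.
conn = sqlite3.connect(":memory:")
store_edges(conn, [("alice", "SGAG", "twitter"), ("bob", "SGAG", "facebook")])
csv_text = export_csv(conn)
```

Storing the platform alongside each edge keeps data from the different sources in one table while still allowing per-platform queries and exports.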