AY1516 T2 Team AP Data

Dataset provided by SGAG

Currently, SGAG only uses the insights provided by Facebook Page Insights and SocialBakers to gauge the reception of its posts, and much of the data it has access to has not been analysed on a deeper level.

SGAG has provided us with social media metric data extracted from its platforms, namely Facebook, Twitter and YouTube. This gives us the following datasets, which present a generic, aggregated view of SGAG's followers:

  • Unique visitors, by day and month
  • Post-level insights: Total Impressions, Reach, Feedback
  • Engagement insights: Likes, Viewed, Commented

These aggregated metrics do not directly help us map out SGAG's social network, so we will have to crawl for more data using each social media platform's API.

Crawling

Initially, we thought of mapping out the social networks for SGAG's main platforms: Facebook, Twitter and Instagram. However, because user data is largely inaccessible through Facebook, we decided to focus on Twitter and Instagram first, since we can extract social network data from them much more easily.

We will have to crawl the data through the Twitter and Instagram APIs. Using NodeXL, we can extract SGAG's Twitter social network data.

This gives us the following information:

  • Followed/following relationships, represented by edges
  • Names of Twitter accounts associated with SGAG and their followers
  • Interactions with SGAG's posts (Favourites, Retweets and Replies)

Due to the rate limits on the Twitter and Instagram APIs, we will have to spend some time requesting the data. We have set aside one week for this. A rough sketch of such a crawl is shown below.
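
As an illustration only, a crawl of SGAG's Twitter follower edges could look like the following Python sketch, assuming the tweepy library (v3). The handle SGAG_SG, the placeholder credentials and the output filename are our assumptions; NodeXL performs an equivalent crawl through its GUI.

  import csv
  import tweepy

  # Placeholder credentials; real values come from a Twitter developer account.
  auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
  auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")

  # wait_on_rate_limit tells tweepy to sleep until the rate-limit window
  # resets, which is why the crawl is expected to take time.
  api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

  with open("sgag_follower_edges.csv", "w", newline="") as f:
      writer = csv.writer(f)
      writer.writerow(["source", "target"])  # follower -> followed
      # Cursor pages through the follower list within the API's paging limits.
      for follower in tweepy.Cursor(api.followers, screen_name="SGAG_SG").items():
          writer.writerow([follower.screen_name, "SGAG_SG"])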

After successfully crawling the data, we will load it into Gephi and begin our visualisation.
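
For instance, the crawled edge list can be converted into a GraphML file that Gephi opens directly. This is a minimal sketch using the networkx library; the filenames carry over from the previous snippet and are assumptions.

  import csv
  import networkx as nx

  # Build a directed graph from the crawled follower edges.
  g = nx.DiGraph()
  with open("sgag_follower_edges.csv") as f:
      for row in csv.DictReader(f):
          g.add_edge(row["source"], row["target"])

  # GraphML preserves node and edge attributes and is read natively by Gephi.
  nx.write_graphml(g, "sgag_twitter_network.graphml")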

Here is an example of an expected network visualisation for a social media platform.

[Image: Expectedvis.png]

Merging data

The Tweet ID that SGAG provides for each tweet will be mapped to the crawled data above, and used to plot networks that link each tweet to its retweets, replies, favourites and so on.
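
As a sketch of that mapping, assuming both the SGAG export and the crawled interactions carry a tweet_id column (the file and column names below are illustrative, not SGAG's actual schema):

  import pandas as pd

  # SGAG's per-tweet metrics, keyed by Tweet ID.
  sgag_tweets = pd.read_csv("sgag_tweet_metrics.csv")     # columns: tweet_id, impressions, ...
  # Crawled interactions: who retweeted, replied to or favourited which tweet.
  interactions = pd.read_csv("crawled_interactions.csv")  # columns: tweet_id, user, interaction_type

  # An inner join keeps only tweets present in both sources, giving one row
  # per (tweet, interacting user) pair that can be plotted as an edge.
  merged = sgag_tweets.merge(interactions, on="tweet_id", how="inner")
  merged.to_csv("tweet_interaction_edges.csv", index=False)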

NodeXL provides easy importing of Twitter network data. The imported data will then be prepared and cleaned in two ways: duplicate edges will be merged to reduce data noise, and nodes will be grouped via a cluster algorithm. Metrics and graphs of the network will also be generated. The sketch below walks through these steps.
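
The same preparation can be reproduced in code. The sketch below uses networkx, with label propagation standing in for whichever cluster algorithm NodeXL applies; the input filename is the assumed output of the earlier snippet.

  import networkx as nx
  from networkx.algorithms.community import label_propagation_communities

  g = nx.read_graphml("sgag_twitter_network.graphml")

  # Merge duplicate edges: collapse the crawled graph into a simple
  # undirected graph, counting repeats as an edge weight.
  simple = nx.Graph()
  for u, v in g.edges():
      if simple.has_edge(u, v):
          simple[u][v]["weight"] += 1
      else:
          simple.add_edge(u, v, weight=1)

  # Group nodes into clusters (communities).
  clusters = list(label_propagation_communities(simple))

  # A couple of the standard network metrics NodeXL also reports.
  print("nodes:", simple.number_of_nodes(), "edges:", simple.number_of_edges())
  print("clusters:", len(clusters))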

Storing data

Our data needs to be saved in a convenient format so that we can use it as input for other analytic programs.

An option for fast querying is storing the data in a database. This approach provides easy export to other formats that can work with analytic software, and access from both a GUI and code.

Another option is to store the data in flat files for easy transport between systems. However, flat files reduce accessibility, since our code and programs would need to parse the files again each time.

With these pros and cons in mind, we will proceed with the database approach initially, and make changes as the project continues. A sketch of a starting schema is shown below.
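
As a starting point, here is a lightweight sketch of that database, using SQLite for illustration. The two-table schema is our assumption of the minimum needed, and can be swapped for a client-server database later without changing the overall approach.

  import sqlite3

  conn = sqlite3.connect("sgag_social.db")
  conn.executescript("""
  CREATE TABLE IF NOT EXISTS nodes (
      screen_name TEXT PRIMARY KEY,
      platform    TEXT NOT NULL   -- 'twitter' or 'instagram'
  );
  CREATE TABLE IF NOT EXISTS edges (
      source TEXT NOT NULL REFERENCES nodes(screen_name),
      target TEXT NOT NULL REFERENCES nodes(screen_name),
      kind   TEXT NOT NULL        -- 'follows', 'retweet', 'reply', 'favourite'
  );
  """)
  conn.commit()

  # Both a GUI client and our analysis code can query the same file,
  # and exporting to CSV for other tools is a single SELECT away.
  for (n,) in conn.execute("SELECT COUNT(*) FROM edges"):
      print("edges stored:", n)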