Difference between revisions of "ANLY482 Team wiki: 2015T2 TeamROLL Project Overview"
Revision as of 18:19, 28 February 2016
Overview
However, with limited resources, SGAG could not conduct a comprehensive analysis of the big data available to them or harness it fully. This project aims to uncover valuable insights into SGAG’s content attributes in order to achieve audience growth. Using data gathered from SGAG’s Facebook page for the year 2015, the team will first conduct exploratory data analysis to identify overall performance trends. Next, the team will perform cluster analysis, followed by sentiment analysis, topic analysis and content analysis. Lastly, the team will build a regression model that incorporates the findings from these analyses in order to predict better-performing future posts. With the insights gained, the team will provide recommendations that enable data-driven content creation, allowing SGAG to achieve their aim of greater growth.
About SGAG
Project Motivation
- What are the characteristics of a “great” post? SGAG has so far thrived on an intuitive understanding of their customers' content preferences. However, SGAG does not have a concrete picture of which attributes they can work on to make a specific post a "great" one.
- What is audience sentiment on "viral" posts? Are audiences reacting in a positive or negative manner? SGAG is concerned that "viral" posts become popular because they receive a lot of "hate", which goes against their content philosophy of making people "laugh", a positive emotion. Currently, they have no easy visibility into this aspect.
SGAG hopes this project can use its rich pool of historical data to derive insights into the concerns posed above, so that SGAG can formulate a more relevant content creation strategy.
Project Objective
The final goal of this project is to offer useful insights for SGAG to formulate a better content creation strategy moving forward. SGAG measures the effectiveness of their content strategy and, at a more granular level, of each individual post as "growth", defined as an increase in 1) number of fans, 2) audience reach, and 3) engagement with audience members. This last indicator is measured by the number of times audience members perform actions such as “likes”, “comments”, “shares”, “retweets” or clicks on links to find out more about the content SGAG has to offer. To reach this goal, we attempt to answer the two main challenges posed by SGAG in a concrete, data-driven manner by performing an in-depth analysis of SGAG's historical data. More specifically, we attempt to address the following analysis requirements:
- To understand whether a post is popular in a “positive” or “negative” manner
- To assess the role of content layout and design in improving the popularity of posts
- To develop a list of common topics and understand the role of topic selection in affecting the popularity of posts
Data collection and description
Our two main datasets are: Facebook Insights Data Export - SGAG - Page Level, and Facebook Insights Data Export - SGAG - Post Level. The datasets are sponsored by SGAG and extracted from the Facebook Insights tool. A year's worth of data, covering 2015, was extracted. Although SGAG also obtained similar data for the same time period from Twitter through Twitter Analytics, Twitter data is not the focus of our project at present.
Facebook Insights Data Export - SGAG - Page Level
This dataset captures key performance indicators of SGAG at the page level. These include variables such as lifetime total likes, new likes, unlikes, number of engaged users, reach, organic reach, number of clicks on content, and amount of negative feedback, reported at the daily level or aggregated into weekly and 28-day measures. This dataset also captures demographic information about SGAG's customers: their age and gender, as well as their location by country and city.
Facebook Insights Data Export - SGAG - Post Level
This dataset similarly captures key metrics of SGAG, but at the post level. Many variables found in the page-level dataset also appear here, measured per post. We propose that this dataset be the main basis of analysis for this project, with the page-level dataset used for supporting analysis.
Work Scope
Our proposed work scope will focus on the main content distribution channel SGAG currently uses, which is Facebook. This is where SGAG garners the most reach and engagement from their target audience. We will conduct our analysis on historical Facebook data for the year 2015, which is recent enough to remain relevant. A step-by-step breakdown of our proposed scope of analysis is as follows:
- Data Collection - Collect from SGAG the Facebook data for the year 2015 to be analysed
- Data Preparation - Clean and transform data into a readable CSV for upload
- Exploratory Data Analysis - Identify overall performance trends
- Cluster Analysis - Perform segmentation of Facebook posts based on their performance in terms of total reach and engagement level (likes, shares, comments)
- Sentiment Analysis - Identify differing sentiments based on posts and clusters
- Topic Analysis - Generate and identify topics based on posts and clusters
- Content Analysis - Identify key design attributes based on posts and clusters
- Regression Modelling - Build a regression model that includes success factors derived from the analysis, to aid in predicting better-performing future posts
Proposed Methodology
Data Collection
Download performance metrics for January to December 2015, at both Page and Post level, for SGAG's Facebook page from Facebook Insights. Conduct data crawling to retrieve the comments posted in response to the content published within the same time period.
Data Preparation
Combine the monthly performance datasets with the "response" dataset crawled from the SGAG Facebook page, then select the relevant variables for analysis. The final working dataset will then be transformed into a readable CSV for upload.
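As a rough illustration of this preparation step, the sketch below stacks monthly post-level exports, keeps a subset of columns, attaches a count of crawled comments per post, and writes the result as CSV text. It uses only the Python standard library. The column names (`post_id`, `reach`, and so on), the helper names, and the sample rows are all assumptions made for illustration; the real Facebook Insights export headers will differ.

```python
import csv
import io

# Hypothetical column names; the real export headers may differ.
KEEP = ["post_id", "posted", "reach", "likes", "comments", "shares"]


def combine_exports(monthly_csv_texts, keep=KEEP):
    """Stack monthly post-level exports and keep only the selected columns."""
    rows = []
    for text in monthly_csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            rows.append({col: row[col] for col in keep})
    return rows


def attach_comment_counts(rows, crawled_comments):
    """Add the number of crawled comment texts per post (0 if none were crawled)."""
    counts = {}
    for post_id, _text in crawled_comments:
        counts[post_id] = counts.get(post_id, 0) + 1
    return [dict(row, crawled_comments=counts.get(row["post_id"], 0)) for row in rows]


def to_csv(rows):
    """Serialise the working dataset as CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Made-up sample data standing in for two monthly exports and the crawled responses.
jan = "post_id,posted,reach,likes,comments,shares,extra\np1,2015-01-03,1000,50,5,2,x\n"
feb = "post_id,posted,reach,likes,comments,shares,extra\np2,2015-02-10,2000,80,9,4,y\n"
crawled = [("p1", "haha so true"), ("p1", "lol"), ("p2", "not funny")]

working = attach_comment_counts(combine_exports([jan, feb]), crawled)
print(to_csv(working))
```

In practice the monthly exports would be read from files rather than inline strings, and the selection of "relevant variables" would follow from the analysis requirements rather than a fixed list.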
Exploratory Data Analysis
Conduct overall performance analysis on the dataset to identify general trends. For instance, trends of interest could include seasonality in customer engagement across the year, differing engagement levels across age groups, the average number of likes, shares and comments per post, and how these indicators are distributed across all posts.
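Two of the summaries mentioned above, engagement by month and the distribution of a single indicator, could be computed along the following lines. This is a minimal sketch with made-up rows; the field names and the definition of engagement as likes plus comments plus shares are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean, median

# Made-up post-level rows; field names are hypothetical.
posts = [
    {"posted": "2015-01-03", "likes": 50, "comments": 5, "shares": 2},
    {"posted": "2015-01-20", "likes": 70, "comments": 7, "shares": 3},
    {"posted": "2015-02-10", "likes": 200, "comments": 30, "shares": 10},
]


def monthly_mean_engagement(posts):
    """Average engagement (likes + comments + shares) per post, grouped by month."""
    by_month = defaultdict(list)
    for p in posts:
        month = p["posted"][:7]  # "YYYY-MM" prefix of an ISO date string
        by_month[month].append(p["likes"] + p["comments"] + p["shares"])
    return {m: mean(v) for m, v in sorted(by_month.items())}


def summarise(posts, metric):
    """Basic distribution summary of one indicator across all posts."""
    values = [p[metric] for p in posts]
    return {
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
    }


print(monthly_mean_engagement(posts))
print(summarise(posts, "likes"))
```

A gap between mean and median, as in the sample likes, is the kind of skew that a few very popular posts would produce.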
Cluster Analysis
Based on the performance metrics of reach and engagement level (likes, shares, comments), we will conduct cluster/segmentation analysis on the dataset to identify clusters of posts that perform differently on these metrics, for instance top-performing posts, debatable posts, and so on. We propose using software tools such as SAS Enterprise Guide to aid in the analysis.
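The proposal names SAS Enterprise Guide and does not fix a clustering algorithm. Purely to illustrate what segmenting posts on reach and engagement involves, the sketch below swaps in a plain k-means (Lloyd's algorithm) in standard-library Python, with min-max scaling so that reach does not dominate the distance. The choice of k-means, the deterministic initialisation from the first k points, and the sample figures are all assumptions.

```python
import math


def min_max_scale(rows):
    """Rescale every column to [0, 1] so no single metric dominates the distance."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]


def kmeans(points, k, iterations=50):
    """Lloyd's k-means; initial centroids are the first k points (illustration only)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Made-up posts as (reach, likes, shares, comments).
posts = [
    (1000, 50, 2, 5),
    (90000, 6000, 900, 400),
    (1200, 60, 3, 4),
    (85000, 5500, 800, 450),
    (1100, 40, 1, 6),
]

labels, centroids = kmeans(min_max_scale(posts), k=2)
print(labels)
```

With real data the number of clusters would need to be chosen and justified, and a random or smarter initialisation would replace the first-k shortcut used here.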
Sentiment Analysis
Based on the performance clusters derived above, we use Sentiment Analysis to dig deeper and uncover how customers' sentiment relates to the performance of different posts. We will conduct text mining and sentiment analysis to discover "happiness" or "hate" levels on different types of posts, taking reference from previous studies of sentiment analysis on social media. We propose using the Text Mining module on SAS Enterprise Miner to aid in the analysis.
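One simple way to picture "happiness" versus "hate" levels is a lexicon-based score per comment, aggregated per post. The sketch below is not the SAS Enterprise Miner workflow proposed above; it is a stand-in in plain Python, and the tiny word lists are invented for illustration and far too small for real use.

```python
import re

# Hypothetical, deliberately tiny lexicons; a real study would use a vetted one.
POSITIVE = {"haha", "lol", "funny", "love", "good", "best"}
NEGATIVE = {"hate", "stupid", "lame", "boring", "bad", "worst"}


def comment_score(text):
    """Positive-word count minus negative-word count for one comment."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def post_sentiment(comments):
    """Share of positive, negative and neutral comments on one post."""
    scores = [comment_score(c) for c in comments]
    n = len(scores)
    if n == 0:
        return {"positive": 0.0, "negative": 0.0, "neutral": 0.0}
    return {
        "positive": sum(s > 0 for s in scores) / n,
        "negative": sum(s < 0 for s in scores) / n,
        "neutral": sum(s == 0 for s in scores) / n,
    }


print(post_sentiment(["haha so funny", "I hate this", "ok", "lol best one"]))
```

Comparing these shares across the performance clusters would show whether highly viral posts skew negative, which is the concern SGAG raised. A word-count approach like this misses sarcasm, Singlish expressions and emoji, so it should be read as a baseline only.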
Topic Analysis
Based on the performance clusters derived above, we use Topic Analysis to uncover popular themes and topics that customers are interested in, and their impact on performance. Some examples of themes/topics include National Service Stories, Government Policies Stories, Funny Viral Stories, Working Life Stories, etc. SGAG is also interested in understanding how different topics appeal to different age groups, and whether any overarching topics appeal strongly across all age groups. We will use a mixed method comprising text and topic mining on "responses" to generate possible topics, supplemented by sampling and manual theme coding to discover any lesser-known topics. We propose using the Text Mining module on SAS Enterprise Miner to aid in the analysis.
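As a very rough stand-in for the text-mining half of this mixed method, the sketch below counts how many comments mention each term and surfaces the most common ones as candidate topic words for manual theme coding. It is an assumption-laden simplification rather than the SAS text-mining process: the stopword list and sample comments are invented, and real topic mining would go well beyond term counts.

```python
import re
from collections import Counter

# Hypothetical stopword list for illustration only.
STOPWORDS = {"the", "a", "is", "so", "this", "i", "to", "of", "and", "in", "my", "me", "for", "on"}


def top_terms(comments, n=3):
    """Return the n terms that appear in the most comments, with their counts."""
    counts = Counter()
    for text in comments:
        # Count each term once per comment so one long comment cannot dominate.
        terms = set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS
        counts.update(terms)
    return counts.most_common(n)


# Made-up comments from one performance cluster.
comments = [
    "ns book out is the best",
    "my ns days",
    "ns reservist again",
    "reservist in camp",
]

print(top_terms(comments, n=2))
```

Terms surfaced this way would only be prompts for the manual coding step, where a human decides whether, say, "ns" and "reservist" belong under a National Service theme.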
Content Analysis
Based on the performance clusters derived above, we use Content Analysis to uncover the effects of key design attributes on performance. Some key design attributes include: 1) the number of picture frames used, 2) the kinds of characters used (for instance, common SGAG characters, foreign celebrities, local celebrities, political figures, etc.), and 3) the number of words used. We propose using sampling techniques to identify a representative sample on which to perform content analysis, since most of SGAG's content is pictorial and is likely to require manual observation and recording of the design attributes to be analysed.
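The proposal does not specify a sampling technique. One plausible option is a stratified random sample that draws the same fraction of posts from every performance cluster, so that each cluster is represented in the manual coding. The sketch below assumes that approach; the `cluster` field, the 20% fraction and the cluster names are illustrative.

```python
import random
from collections import defaultdict


def stratified_sample(posts, fraction, seed=0):
    """Draw the same fraction of posts from each cluster (at least one per cluster)."""
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    by_cluster = defaultdict(list)
    for p in posts:
        by_cluster[p["cluster"]].append(p)
    sample = []
    for cluster in sorted(by_cluster):
        members = by_cluster[cluster]
        size = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, size))
    return sample


# Made-up posts: 10 in a "top" cluster and 30 in an "average" cluster.
posts = [{"post_id": f"t{i}", "cluster": "top"} for i in range(10)]
posts += [{"post_id": f"a{i}", "cluster": "average"} for i in range(30)]

sample = stratified_sample(posts, fraction=0.2)
print(len(sample))
```

Each sampled post would then be inspected by hand to record attributes such as the number of frames, the kinds of characters and the word count.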
Regression Modelling
Lastly, based on the performance insights derived from the analyses above, we propose to use a multiple linear regression model to assess the overall effect of the identified success factors on performance. This model would enable SGAG and the team to understand whether these factors are sufficient to answer the question "what makes a great post?", or whether further studies are required to uncover more factors that improve performance. The model could also serve as a useful scoring tool to gauge whether future content generated by the creative team is likely to meet SGAG's target performance levels. We propose using SPSS or SAS Enterprise Guide to aid in this analysis.
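To make the idea of a multiple linear regression used as a scoring tool concrete, the sketch below fits ordinary least squares by solving the normal equations in plain Python and then scores a new post. SPSS or SAS would do this fitting in the proposed workflow. The two predictors (number of frames and share of positive comments), the response, and the data are invented, and the data is constructed to follow an exact linear rule so the recovered coefficients are easy to check.

```python
def fit_ols(X, y):
    """Ordinary least squares with an intercept, solving (A^T A) b = A^T y."""
    A = [[1.0] + list(row) for row in X]  # prepend the intercept column
    n = len(A[0])
    # Augmented normal-equation matrix [A^T A | A^T y].
    M = [
        [sum(r[i] * r[j] for r in A) for j in range(n)]
        + [sum(r[i] * t for r, t in zip(A, y))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; coefficients are not unique")
        M[col], M[pivot] = M[pivot], M[col]
        pivot_value = M[col][col]
        M[col] = [v / pivot_value for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


def predict(coeffs, row):
    """Score one post: intercept plus the weighted sum of its predictors."""
    return coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], row))


# Hypothetical predictors per post: (number of frames, positive-comment score).
X = [(1, 0), (2, 1), (3, 0), (4, 1), (5, 3)]
# Made-up response generated as 10 + 2 * frames + 5 * score.
y = [12, 19, 16, 23, 35]

coeffs = fit_ols(X, y)
print(coeffs)
print(predict(coeffs, (6, 2)))
```

On real data the fit would not be exact, so goodness-of-fit measures and residual checks would be needed before the model is used to score new content.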