ANLY482 AY2016-17 T1 Group1: HOME/Interim

From Analytics Practicum
Revision as of 22:42, 16 October 2016 by Xiuming.hoe.2013


INTERIM PROGRESS

Overview

SGAG is one of Singapore’s leading local humour content creators, with the motto “to make readers laugh at least 5 times a day, 365 days a year”. To live up to this motto, SGAG focuses on creating engaging and interesting content in its daily posts. SGAG creates two types of content across its social media platforms: paid advertisements and organic posts.

Over the years, many other local players such as SMRT Feedback and TheSmartLocal have joined SGAG in generating humorous content on social media. As such, SGAG needs to constantly improve its content generation strategy to maintain its competitive advantage. Through this project, SGAG would like to find out the factors affecting the performance of its Facebook posts, the characteristics of a great Facebook post, and the performance of its branded Facebook posts and Facebook video posts.

To perform the analysis, we gathered one year's worth of data, from August 2015 to August 2016. This includes data extracted from SGAG’s Facebook Insights and additional data on advertised posts collected from SGAG. Our exploratory data analysis has been constructive to SGAG thus far; going forward, we will attempt further analysis using techniques such as cluster analysis and latent model analysis.

Data Integration and Filtering

Extracted Table

Some columns in the original dataset were extracted into a new table because their original format does not support comparison analysis.

Group 1 Table1 1.png

The figure above is a snippet of how lifetime likes by gender and age were stored in the original Page Level dataset. Lifetime Likes by Gender and Age stores aggregated demographic data about the unique Facebook users who like SGAG's Page, based on the age and gender information they provide in their user profiles. The original format does not allow us to compare the daily change in likes for each gender across age groups, or to see the likes gained each day.

Hence, we extracted the data into the following table and calculated the differences in daily likes in order to achieve our objective:

ANLY482 Group1 Table1 2.png
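The reshaping and differencing step described above can be sketched as follows. This is a minimal illustration rather than the team's actual procedure: the column naming (`"F.18-24"`, `"M.18-24"`), the sample figures and the function name `to_long_with_daily_change` are all assumptions made for the example.

```python
from datetime import date

# Hypothetical page-level rows: one dict per day, holding cumulative
# lifetime likes keyed by "<gender>.<age group>" (naming is an assumption).
page_rows = [
    {"date": date(2016, 8, 1), "F.18-24": 1000, "M.18-24": 1200},
    {"date": date(2016, 8, 2), "F.18-24": 1015, "M.18-24": 1190},
    {"date": date(2016, 8, 3), "F.18-24": 1040, "M.18-24": 1230},
]


def to_long_with_daily_change(rows):
    """Reshape wide lifetime likes into long rows with a daily difference."""
    rows = sorted(rows, key=lambda r: r["date"])
    previous = {}  # last seen lifetime likes per gender/age segment
    long_rows = []
    for row in rows:
        for segment, likes in row.items():
            if segment == "date":
                continue
            gender, age_group = segment.split(".", 1)
            prior = previous.get(segment)
            long_rows.append({
                "date": row["date"],
                "gender": gender,
                "age_group": age_group,
                "lifetime_likes": likes,
                # None on the first day because there is no earlier value.
                "daily_change": None if prior is None else likes - prior,
            })
            previous[segment] = likes
    return long_rows


long_table = to_long_with_daily_change(page_rows)
```

Each output row then carries one date, one gender and one age group, so daily gains or losses can be compared across segments directly.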

Challenges

The first challenge the team encountered was manually reconciling the advertisers’ data with the data on the advertised posts, because the advertiser information and each advertiser's relevant posts were only available through a Dropbox link stored in a separate Microsoft Excel spreadsheet.

The second challenge was manually identifying the source of each video post. A video posted on SGAG’s page can be generated in-house (by SGAG) or be a shared video (from other Facebook users or the public). The source is identified from keywords in the post message and from the characters who appear in the video. For instance, a video post is considered a shared video if its post message contains phrases such as “credit to” or “submitted by” <name>. On the other hand, a video is considered in-house if any of the SGAG characters (e.g. Xiao Ming, Sue-Ann) appear in it. When a video has none of these characteristics, our team has to confirm its source with our sponsor, Mr. Karl.
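The identification rules above can be expressed as a small helper. This is only a sketch of the manual rules, not an automated method the team used. The function name, the `characters_seen` input (a list of characters tagged by a human reviewer) and the choice to check keywords before characters are assumptions.

```python
# Phrases and characters taken from the rules described above.
SHARED_KEYWORDS = ("credit to", "submitted by")
SGAG_CHARACTERS = {"xiao ming", "sue-ann"}


def classify_video_source(post_message, characters_seen=()):
    """Return 'shared', 'in-house' or 'confirm with sponsor' for a video post.

    characters_seen is a hypothetical list of character names that a
    reviewer noted while watching the video.
    """
    message = (post_message or "").lower()
    # Assumption: a crediting phrase takes precedence over character tags.
    if any(keyword in message for keyword in SHARED_KEYWORDS):
        return "shared"
    if any(name.lower() in SGAG_CHARACTERS for name in characters_seen):
        return "in-house"
    # Neither rule applies, so the source must be confirmed manually.
    return "confirm with sponsor"
```

Videos that fall through to the last branch are the ones that still need confirmation from the sponsor.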

Choice of Key Measurements

In our analysis, measurements such as Reach, Engagement, Impressions, Likes, Unlikes, Comments, Shares, Negative Feedback and Video Views of various lengths have been chosen as the key performance indicators. Measurements such as Lifetime Post Paid Reach and Lifetime Post Total Reach will not be used as performance measurements because there are no paid posts in SGAG’s dataset: organic reach is therefore always equal to total reach, and paid reach is always 0. As these measurements would be redundant in our analysis, we have excluded them from our analytical dataset.
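A quick check of this redundancy before dropping the columns could look like the sketch below. The field names (`organic_reach`, `paid_reach`, `total_reach`), the sample values and both helper functions are hypothetical and stand in for the corresponding Facebook Insights columns.

```python
def paid_reach_is_redundant(posts):
    """True when paid reach is always 0 and organic reach equals total reach."""
    return all(
        p["paid_reach"] == 0 and p["organic_reach"] == p["total_reach"]
        for p in posts
    )


def drop_columns(posts, columns):
    """Return copies of the post records without the given columns."""
    return [{k: v for k, v in p.items() if k not in columns} for p in posts]


# Illustrative records only; the numbers are made up.
posts = [
    {"post_id": "a", "organic_reach": 5000, "paid_reach": 0, "total_reach": 5000},
    {"post_id": "b", "organic_reach": 7200, "paid_reach": 0, "total_reach": 7200},
]

# Only drop the columns once the redundancy has been confirmed.
if paid_reach_is_redundant(posts):
    posts = drop_columns(posts, {"paid_reach", "total_reach"})
```

Checking before dropping guards against silently discarding information if a later data extract does contain paid reach.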

Data Cleaning and Exploration

Issues

Several problems, such as duplicated data, missing values and outliers, were found in the dataset collected from SGAG. As these issues could affect the results of our analysis, we will handle them before performing the analysis.

1. Missing Values:
After examining the dataset, we found a few missing values in the page level dataset and none in the post and video level datasets. As the columns with missing values in the page level dataset contain measures, such as lifetime likes and daily demographics data, that are critical to evaluating SGAG’s overall daily performance, the affected dates (26 January 2016, 28 & 29 August 2016) will be removed from our subsequent page level analysis.


2. Duplicate Values:
No duplicates were found in the page level data. However, a handful of duplicated rows and duplicated post messages were found in both the post and video level datasets. Some of the common issues are as follows:
i. Same Post Message with Different Content
ANLY482 Group1 Figure2 1.png
Figure 2.1 shows two posts that have exactly the same post message. After looking into the posts, we found that their content is different, and we will therefore retain such posts in our further analysis.
ii. Identical Rows
ANLY482 Group1 Figure2 2.png
As seen in Figure 2.2, columns such as Post ID, Permalink and Post Message are the same across the two rows. In such cases, we will remove one of the rows from our dataset.
iii. Cover Photo and Timeline Photo Update
Besides the common duplication issues mentioned above, we also discovered that updates such as cover photo and profile picture updates result in duplicated post messages. As these posts are only updates to SGAG’s Facebook profile, they are not an indicator of SGAG’s performance and will thus be removed from our further study.


3. Outliers:
An outlier in this project is defined as a post or date whose performance is significantly better or worse than that of the average SGAG Facebook post. Some posts go viral and perform exceptionally well because of special events such as the launch of the Pokémon game in Singapore. Although these posts generated high reach and engagement for SGAG, they are dominant and could skew our findings. Consequently, these posts will be excluded from our study. Figures 2.3, 2.4 and 2.5 below show examples of outliers in the page, post and video level datasets respectively.
ANLY482 Group1 Figure2 3.png
ANLY482 Group1 Figure2 4.png
ANLY482 Group1 Figure2 5.png
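The three cleaning steps above can be sketched in code as follows. This is an illustration under stated assumptions, not the team's actual procedure: the field names (`post_id`, `message`), the marker phrases used to spot profile updates, and in particular the 1.5 × IQR rule for flagging outliers are all assumptions, since the text does not state how "significantly better or worse" was quantified.

```python
from statistics import quantiles

# Assumed phrases that identify profile update posts in the post message.
PROFILE_UPDATE_MARKERS = ("cover photo", "profile picture")


def drop_incomplete_days(page_rows, required_columns):
    """Keep only the days on which every required measure is present."""
    return [
        row for row in page_rows
        if all(row.get(col) is not None for col in required_columns)
    ]


def dedupe_posts(post_rows):
    """Drop profile updates and repeated Post IDs.

    Posts that share a message but have different Post IDs are kept,
    because their content can differ.
    """
    seen_ids = set()
    kept = []
    for row in post_rows:
        message = (row.get("message") or "").lower()
        if any(marker in message for marker in PROFILE_UPDATE_MARKERS):
            continue  # profile update, not a performance indicator
        if row["post_id"] in seen_ids:
            continue  # identical row already kept
        seen_ids.add(row["post_id"])
        kept.append(row)
    return kept


def flag_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (an assumed rule)."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [value < low or value > high for value in values]
```

Flagged posts or dates would still be reviewed by hand before exclusion, since a viral post tied to a special event is identified by context as well as by its numbers.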

Exploration

Our team started off by looking at the changes in SGAG’s audience base from August 2015 to August 2016. Lifetime Total Likes is used to assess the growth or decline in their audience.

Maximum Monthly Total Likes
ANLY482 Group1 Figure3 1.png
As seen in Figure 3.1, the number of SGAG Facebook Page fans grew consistently from August 2015 to August 2016, with a rapid increase of over 50 thousand fan likes in August 2016.
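A monthly maximum of Lifetime Total Likes, as plotted in Figure 3.1, could be derived from the daily page level data along these lines. The function name and the input shape (pairs of date and lifetime total likes) are assumptions for illustration.

```python
from datetime import date


def max_lifetime_likes_by_month(daily_rows):
    """Map (year, month) to the highest Lifetime Total Likes seen that month.

    daily_rows is an iterable of (date, lifetime_total_likes) pairs.
    """
    monthly = {}
    for day, likes in daily_rows:
        key = (day.year, day.month)
        # Start from the current value the first time a month is seen.
        monthly[key] = max(likes, monthly.get(key, likes))
    return monthly
```

For example, calling it with `[(date(2016, 7, 30), 400), (date(2016, 7, 31), 410), (date(2016, 8, 1), 415)]` (made-up numbers) gives one entry for July and one for August.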


Changes in Daily New Likes and Unlikes
ANLY482 Group1 Figure3 2.png
Spikes in SGAG Facebook Page likes occurred on 4 February, 31 May, 21 & 22 June, and 1, 20, 25 and 26 August 2016.
ANLY482 Group1 Figure3 3.png
Figure 3.3 shows that the dates of the spikes in likes and unlikes overlap. Although likes and unlikes follow a similar trend, the number of unlikes is small relative to the likes gained. For example, on 4 February 2016 the SGAG Facebook Page gained 2,319 likes and received 106 unlikes. On the overlapping dates, unlikes amounted to only 4-5% of the likes gained. We will look further into the posts published on those dates to identify the types of post that attract the most audience or turn the audience away.
A “Like” from a new fan indicates their interest in receiving SGAG’s posts in their newsfeed. As there are different target groups, we will examine how well the SGAG Facebook Page reaches its fans. To do so, we will look into the demographics of SGAG’s fans to find out which gender and age groups form the larger audience base.
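The share of unlikes quoted for 4 February 2016 can be reproduced with a short calculation. The helper name `unlike_ratio` is hypothetical; the two figures are the ones reported above.

```python
def unlike_ratio(likes_gained, unlikes):
    """Unlikes as a fraction of likes gained; None if no likes were gained."""
    if likes_gained == 0:
        return None
    return unlikes / likes_gained


# Figures reported for 4 February 2016.
ratio = unlike_ratio(2319, 106)
print(f"{ratio:.1%}")  # prints 4.6%
```

This falls within the 4-5% range observed on the overlapping dates.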


Changes in Monthly Likes by Gender and Age Group
ANLY482 Group1 Figure3 4.png
ANLY482 Group1 Figure3 5.png
As seen in Figure 3.4 and Figure 3.5, the 18-24 age group continues to be SGAG’s largest audience for both males and females, followed by the 25-34 age group. These 2 groups are noticeably more responsive to SGAG’s posts, whereas interest among the other age groups has improved only slightly or remained unchanged. A likely explanation is that teens aged 13-17 are more active on other social media such as Instagram and Snapchat, while middle-aged adults and the elderly are less active on social media. These findings were highlighted to SGAG, which commented that its posts are targeted more at these 2 age groups.