Group04 Project Findings

From Analytics Practicum

Revision as of 10:15, 10 January 2018




Data

Data Scraping

To fulfil the objectives mentioned, we scraped data from the platforms on which SGAG has a strong presence, namely Facebook and YouTube.

Facebook Posts

We used Facebook's Graph API to scrape 3,806 SGAG Facebook posts. A sample of the content can be seen below:

status_id            378167172198277_1975245405823771
status_message       We all know someone who takes a lot of sick leaves
status_type          photo
status_link          https://www.facebook.com/sgag.sg/photos/a.378177495530578.106131.378167172198277/1975244575823854/?type=3
status_published     29/11/17 6:00
num_reactions        279
num_comments         16
num_shares           73
num_likes            197
num_loves            0
num_wows             1
num_hahas            80
num_sads             1
num_angrys           0

In addition, we would create a new feature, the number of positive reactions, defined as the sum of the number of ‘likes’, ‘loves’, ‘wows’ and ‘hahas’ a post receives.
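A minimal sketch of how this feature could be derived, assuming the scraped posts are stored as CSV with the column names shown in the sample above (the inline CSV stands in for the real export). Using the sample post, 197 likes + 0 loves + 1 wow + 80 hahas gives 278 positive reactions, which is the 279 total reactions minus the single ‘sad’.

```python
import csv
import io

# Assumption: posts are exported as CSV using the column names from the sample row.
POSITIVE_COLUMNS = ("num_likes", "num_loves", "num_wows", "num_hahas")


def positive_reactions(row):
    """Sum of likes, loves, wows and hahas for one post (CSV values arrive as strings)."""
    return sum(int(row[col]) for col in POSITIVE_COLUMNS)


# Stand-in for the real export: only the sample post, with a subset of columns.
SAMPLE_CSV = """status_id,num_reactions,num_likes,num_loves,num_wows,num_hahas,num_sads,num_angrys
378167172198277_1975245405823771,279,197,0,1,80,1,0
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for row in rows:
    row["num_positive_reactions"] = positive_reactions(row)

print(rows[0]["num_positive_reactions"])  # 278
```

Keeping the four column names in one tuple makes it easy to revisit the definition later (for example, if ‘wows’ turn out to be ambiguous in sentiment).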

Facebook Comments

Next, we scraped 21,940 SGAG Facebook comments and a sample of the content can be seen below:

comment_id           1975245405823771_1980256198656025
status_id            378167172198277_1975245405823771
parent_id            (empty)
comment_message      "Boss, I just got into an accident and broke my arm, fractured a rib, and I might have internal bleeding" Boss: "Ok la. So what time you coming into the office later?"
comment_author       Leorenzo Joseph
comment_published    29/11/17 6:06
comment_likes        5

YouTube

Using an online web scraper, we scraped the first 10,000 comments on Night Owl Cinematics' and TheSmartLocal's YouTube videos. We also scraped the first 1,000 comments on SGAG's YouTube videos. A sample of the content can be seen below:

id                   UgzkUyiEd5tMlCq4Nwh4AaABAg
user                 Frentzen
date                 29 minutes ago
commentText          Single better
likes                0
hasReplies           FALSE
numberOfReplies      0

Data Cleaning

In progress


Understanding Data

SGAG Facebook Posts

  • The number of comments is drastically lower than the number of reactions, which suggests that consumers engage only minimally with the content.
  • Consumers leave few negative reactions, which is consistent with the company's mission of generating positive content.
  • Content generates more reactions when SGAG rides on current hype, e.g. Pokemon Go.
  • User engagement is higher for videos than for memes, probably because videos carry more information and content.

SGAG Facebook Comments

  • Authors of well-liked comments (those with a high number of likes) are generally social influencers with high degree centrality.
  • Commentators who reply most often to comments on SGAG's Facebook posts are generally quite dispersed. However, a few notable commentators reply more than the rest, and identifying them would be crucial for SGAG in determining the strength of its network.

YouTube Comments

  • In terms of the number of replies, SGAG's video performance is comparable to that of TheSmartLocal and Night Owl Cinematics.
  • However, SGAG falls short in the number of likes its comments receive. TheSmartLocal has the highest number of likes, followed by Night Owl Cinematics.
  • SGAG is relatively less active in commenting on, and replying to comments on, its YouTube videos.


Methodologies

Predicting performance of content

Dashboarding

To help SGAG better predict the performance of its content, we would first need to help SGAG understand its current performance. To do so, we would create a summary dashboard that clearly presents key performance indicators, as follows:

  • We would perform sentiment analysis on the first 1,000 SGAG Facebook comments and report summary statistics of the resulting sentiment scores.
  • Using SGAG’s Facebook comments, we would analyse SGAG’s network and report degree centrality measures. This would be done via a two-degree egocentric directed network, with the number of “likes” each comment receives as the weight of each edge.
  • Finally, we would report summary statistics (measures of central tendency) for the number of likes each comment receives and for the number of comments, shares, reactions and positive reactions on SGAG’s Facebook posts. We would also bin the posting times of comments and posts into categories and examine the corresponding performance of each time bin.
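The network measure in the second point can be sketched in plain Python. This is a minimal illustration under several assumptions that the plan does not spell out: nodes are comment authors plus SGAG as the ego; a reply creates an edge from the replier to the author of the parent comment (matched via `parent_id`); top-level comments point at SGAG; edge weight is the total likes on those comments; and the two-degree neighbourhood ignores edge direction, much like NetworkX's `ego_graph(..., radius=2, undirected=True)`. The authors and like counts below are hypothetical.

```python
from collections import defaultdict, deque

EGO = "SGAG"  # assumption: the page itself is the ego node


def build_edges(comments):
    """comments: (comment_id, parent_id, author, likes) tuples.

    Edge replier -> replied-to author; top-level comments (parent_id None) point at EGO.
    Weight = total likes on the comments forming that edge.
    """
    author_of = {cid: author for cid, _, author, _ in comments}
    weights = defaultdict(int)
    for cid, parent_id, author, likes in comments:
        target = EGO if parent_id is None else author_of.get(parent_id)
        if target is None or target == author:
            continue  # parent comment not scraped, or author replying to themselves
        weights[(author, target)] += likes
    return dict(weights)


def ego_network(edges, ego, radius=2):
    """Nodes within `radius` hops of the ego, ignoring edge direction (breadth-first search)."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    dist = {ego: 0}
    queue = deque([ego])
    while queue:
        node = queue.popleft()
        if dist[node] == radius:
            continue  # do not expand beyond the radius
        for neighbour in adjacency[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return set(dist)


def degree_centrality(edges, nodes):
    """In/out degree normalised by (n - 1), plus like-weighted in-degree, within `nodes`."""
    n = len(nodes)
    scale = 1 / (n - 1) if n > 1 else 0.0
    in_deg, out_deg, weighted_in = defaultdict(int), defaultdict(int), defaultdict(int)
    for (u, v), w in edges.items():
        if u in nodes and v in nodes:
            out_deg[u] += 1
            in_deg[v] += 1
            weighted_in[v] += w
    return {
        node: {"in": in_deg[node] * scale, "out": out_deg[node] * scale,
               "weighted_in": weighted_in[node]}
        for node in nodes
    }


# Hypothetical comment thread: (comment_id, parent_id, author, likes)
COMMENTS = [
    ("c1", None, "Alice", 5),
    ("c2", "c1", "Bob", 2),
    ("c3", "c1", "Carol", 0),
    ("c4", "c2", "Dan", 1),
    ("c5", "c4", "Eve", 3),
]
edges = build_edges(COMMENTS)
nodes = ego_network(edges, EGO)  # Dan and Eve fall outside two degrees
cent = degree_centrality(edges, nodes)
print(sorted(nodes), cent["Alice"])
```

On real data, NetworkX (`DiGraph`, `ego_graph`, `in_degree_centrality`) would be the more idiomatic choice; the hand-rolled version simply keeps every step visible.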


Document clustering

We would cluster the scraped comments using k-means document clustering, then perform topic modelling within each cluster to better understand what distinguishes the clusters. Next, we would examine the distribution of the number of positive reactions in each cluster. Finally, we would use ANOVA or z-tests to determine whether the clusters differ significantly in their number of positive reactions.

This would show SGAG which kinds of content generate the most positive reactions, and whether the content it produces is having the desired effect on its consumers.
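The ANOVA step compares the between-cluster variance of positive reactions with the within-cluster variance. A minimal sketch of the one-way ANOVA F statistic is shown below; the cluster values are hypothetical, and a p-value would additionally need the F distribution (for example via `scipy.stats.f_oneway`, which returns both the statistic and the p-value).

```python
from statistics import fmean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA across groups."""
    values = [x for g in groups for x in g]
    k, n = len(groups), len(values)
    grand_mean = fmean(values)
    group_means = [fmean(g) for g in groups]
    # Variation of cluster means around the grand mean, weighted by cluster size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of individual values around their own cluster mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n - k
    if ss_within == 0:
        return float("inf"), df_between, df_within
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical positive-reaction counts for posts in three comment clusters.
f, df_b, df_w = one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f, df_b, df_w)  # 27.0 2 6
```

A large F means the clusters' mean positive reactions differ by much more than the spread within each cluster would explain.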


Overall topic modelling and understanding performance of specific topics

Next, we would perform topic modelling on the scraped comments via Latent Dirichlet Allocation (LDA). We would then pick prominent and relevant topics from the LDA models and zoom in on comments that discuss those topics. Within each of these topics, we would perform sentiment scoring and compute summary statistics of the sentiment scores. We would repeat this analysis for the Facebook posts that make up the top 10% by user engagement, to understand what drives the performance of top-performing posts.

Such an analysis would allow SGAG to better understand which aspects of its content are doing well and which are not. We would then be able to recommend which topics SGAG's content team should focus on and create high-level guidelines to drive the performance of its content.
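Two supporting steps of this analysis, selecting the top 10% of posts and summarising sentiment per topic, can be sketched as follows. Assumptions: engagement is taken as `num_reactions + num_comments + num_shares` (the plan does not define it), topic labels come from an LDA model fitted elsewhere, and sentiment scores come from whichever scorer is chosen; the topics and scores below are hypothetical.

```python
import statistics
from collections import defaultdict


def engagement(post):
    # Assumption: engagement = reactions + comments + shares.
    return post["num_reactions"] + post["num_comments"] + post["num_shares"]


def top_decile(posts):
    """Top 10% of posts by engagement (rounded up, at least one); ties broken arbitrarily."""
    k = max(1, (len(posts) + 9) // 10)  # integer ceiling of len/10
    return sorted(posts, key=engagement, reverse=True)[:k]


def sentiment_summary(scored):
    """scored: iterable of (topic, sentiment_score) pairs -> per-topic summary statistics."""
    by_topic = defaultdict(list)
    for topic, score in scored:
        by_topic[topic].append(score)
    return {
        topic: {
            "n": len(scores),
            "mean": statistics.fmean(scores),
            "median": statistics.median(scores),
            "stdev": statistics.stdev(scores) if len(scores) > 1 else 0.0,
        }
        for topic, scores in by_topic.items()
    }


# Hypothetical (topic, sentiment score) pairs for comments on top-decile posts.
summary = sentiment_summary([("work", 0.5), ("work", -0.1), ("food", 0.8)])
print(summary["work"])
```

The integer ceiling avoids floating-point surprises such as `0.1 * 30` evaluating slightly above 3.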


Multi-linear regression analysis

In progress.


K-means clustering analysis

We would cluster the performance of Facebook posts via k-means clustering. The variables considered for the clustering process would be as follows: status_type, status_published, num_reactions, num_likes, num_loves, num_wows, num_hahas, num_sads, num_angrys, number of positive reactions, num_comments and num_shares. Next, we would profile the resulting clusters using z-scores to give each cluster a meaningful interpretation and derive business recommendations for SGAG.

The aim of this clustering exercise is twofold: to help SGAG better understand its customers and posts, and to produce relevant business recommendations that drive the performance of future SGAG content.
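A minimal sketch of k-means followed by z-score profiling, in plain Python. Assumptions: only numeric columns are used (status_type and status_published would first need encoding, e.g. one-hot categories or time-of-day bins), features are standardised before clustering so that large counts such as reactions do not dominate, and the first k posts seed the centroids, which is a simplification; a production run would more likely use scikit-learn's `KMeans` with several initialisations (`n_init`). The post values are hypothetical.

```python
import statistics


def standardize(rows):
    """Z-score each column (population std); constant columns become 0."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) for c in cols]
    return [[(x - m) / s if s else 0.0 for x, m, s in zip(row, means, sds)] for row in rows]


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm. Simplification: the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [statistics.fmean(col) for col in zip(*members)]
    return labels


def zscore_profile(z, labels, names):
    """Mean z-score of each feature per cluster: above 0 means above the overall average."""
    profile = {}
    for j in sorted(set(labels)):
        members = [row for row, lab in zip(z, labels) if lab == j]
        profile[j] = {name: statistics.fmean(col) for name, col in zip(names, zip(*members))}
    return profile


FEATURES = ["num_reactions", "num_comments", "num_shares", "num_positive_reactions"]
# Hypothetical posts: three low-engagement and three high-engagement.
POSTS = [
    [10, 1, 0, 9], [12, 2, 1, 11], [11, 1, 1, 10],
    [300, 40, 80, 290], [310, 35, 75, 300], [290, 45, 85, 280],
]
z = standardize(POSTS)
labels = kmeans(z, k=2)
profile = zscore_profile(z, labels, FEATURES)
print(labels)
print(profile)
```

Reading the profile row by row (for example, "well above average on shares, average on comments") is what turns each cluster into a describable type of post.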


Proposed Deliverables

In progress.