Group04 Project Findings
Data
Data Scraping
To fulfil the objectives mentioned, we scraped data from the platforms on which SGAG has a strong presence, namely Facebook and YouTube.
Facebook Posts
We used Facebook's Graph API to scrape 3,806 SGAG Facebook posts. A sample of the content can be seen below:
status_id | status_message | status_type | status_link | status_published | num_reactions | num_comments |
---|---|---|---|---|---|---|
378167172198277_1975245405823771 | We all know someone who takes a lot of sick leaves | photo | https://www.facebook.com/sgag.sg/photos/a.378177495530578.106131.378167172198277/1975244575823854/?type=3 | 29/11/17 6:00 | 279 | 16 |
num_shares | num_likes | num_loves | num_wows | num_hahas | num_sads | num_angrys |
---|---|---|---|---|---|---|
73 | 197 | 0 | 1 | 80 | 1 | 0 |
In addition, we will create a new feature, the number of positive reactions. This is defined as the sum of the number of ‘likes’, ‘loves’, ‘wows’ and ‘hahas’.
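The definition above can be sketched in a few lines of Python. This is only an illustration: the column names come from the sample table, while the function name `num_positive_reactions` and the dictionary layout are our own assumptions rather than part of the scraper's output.

```python
# Reaction columns counted as positive, named as in the scraped post table.
POSITIVE_COLUMNS = ("num_likes", "num_loves", "num_wows", "num_hahas")


def num_positive_reactions(post):
    """Sum the four reaction counts that we treat as positive (hypothetical helper)."""
    return sum(int(post[column]) for column in POSITIVE_COLUMNS)


# Reaction counts of the sample post shown in the table above.
sample_post = {
    "num_reactions": 279,
    "num_likes": 197,
    "num_loves": 0,
    "num_wows": 1,
    "num_hahas": 80,
    "num_sads": 1,
    "num_angrys": 0,
}

positive = num_positive_reactions(sample_post)
print(positive)  # 278
```

For the sample post, the remaining reaction (279 total minus 278 positive) is the single ‘sad’ reaction.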
Facebook Comments
Next, we scraped 21,940 SGAG Facebook comments and a sample of the content can be seen below:
comment_id | status_id | parent_id | comment_message | comment_author | comment_published | comment_likes |
---|---|---|---|---|---|---|
1975245405823771_1980256198656025 | 378167172198277_1975245405823771 | | "Boss, I just got into an accident and broke my arm, fractured a rib, and I might have internal bleeding" Boss: "Ok la. So what time you coming into the office later?" | Leorenzo Joseph | 29/11/17 6:06 | 5 |
YouTube
Using an online web scraper, we scraped the first 10,000 comments for Night Owl Cinematics' and TheSmartLocal's YouTube videos. In addition, we scraped the first 1,000 comments for SGAG's YouTube videos. A sample of the content can be seen below:
id | user | date | commentText | likes | hasReplies | numberOfReplies |
---|---|---|---|---|---|---|
UgzkUyiEd5tMlCq4Nwh4AaABAg | Frentzen | 29 minutes ago | Single better | 0 | FALSE | 0 |
Data Cleaning
In progress
Understanding Data
SGAG Facebook Posts
- The number of comments is far lower than the number of reactions, which suggests that consumers engage only minimally with the content beyond reacting.
- Consumers generate a small number of negative reactions, which is consistent with the company's mission - to generate positive content.
- Content generates more reactions when it rides on a current trend, e.g. Pokemon Go.
- User engagement is higher for videos than for memes, probably because videos carry more information and content.
SGAG Facebook Comments
- Authors of well-liked comments (those with a high number of likes) are generally social influencers with high degree centrality.
- Generally, the commenters who reply most often to comments on SGAG's Facebook posts are quite dispersed. However, a few notable commenters reply more often than the rest, and identifying them would be crucial for SGAG in determining its network strength.
YouTube SGAG & Competitors
- SGAG's video performance in terms of the number of replies is comparable to that of TheSmartLocal and Night Owl Cinematics.
- However, SGAG falls short in terms of the number of likes received by comments. TheSmartLocal has the highest number of likes, followed by Night Owl Cinematics.
- SGAG is relatively less active in commenting on and replying to comments on its YouTube videos.
Methodologies
Predicting performance of content
Dashboarding
To allow SGAG to better predict the performance of its content, we first need to help SGAG understand its current performance. To do so, we will create a summary page/dashboard that clearly summarizes the key performance indicators. They are as follows:
- We will perform sentiment analysis on the first 1,000 SGAG Facebook comments and report summary statistics of the sentiment scores.
- Using SGAG’s Facebook comments, we will analyse SGAG’s network and provide degree centrality measures. This will be done via a 2-degree egocentric directed network, with the number of “likes” each comment receives as the weight (edge attribute) of each edge.
- Finally, we will provide summary statistics for central tendency measures. The features summarized are the number of likes each comment receives, the timing of comment posts (as a categorical variable) and, for SGAG’s Facebook posts, the number of comments, shares, reactions and positive reactions. We will also bin the timing of content posts into a categorical variable to understand the corresponding performance.
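The degree centrality step in the list above could look roughly like the sketch below. It rests on assumptions of ours: an edge runs from the author of a reply to the author of the parent comment, the reply's `comment_likes` is used as the edge weight, self-replies are ignored, and the restriction to a 2-degree ego network around SGAG is left out. The four authors and their comments are made-up illustrative rows, not scraped data.

```python
from collections import defaultdict


def build_reply_edges(comments):
    """Return (replier, parent_author, weight) edges from comment rows."""
    author_of = {c["comment_id"]: c["comment_author"] for c in comments}
    edges = []
    for c in comments:
        parent = c["parent_id"]
        if parent and parent in author_of:
            source, target = c["comment_author"], author_of[parent]
            if source != target:  # assumption: ignore replies to oneself
                edges.append((source, target, c["comment_likes"]))
    return edges


def degree_centrality(edges):
    """Normalised in/out degree (distinct neighbours / (n - 1)) plus weighted in-degree."""
    nodes = {node for source, target, _ in edges for node in (source, target)}
    predecessors = defaultdict(set)
    successors = defaultdict(set)
    weighted_in = defaultdict(int)
    for source, target, weight in edges:
        successors[source].add(target)
        predecessors[target].add(source)
        weighted_in[target] += weight
    scale = max(len(nodes) - 1, 1)
    return {
        node: {
            "in_degree": len(predecessors[node]) / scale,
            "out_degree": len(successors[node]) / scale,
            "weighted_in": weighted_in[node],
        }
        for node in nodes
    }


# Hypothetical comment rows using the column names of the comments table.
toy_comments = [
    {"comment_id": "c1", "parent_id": "", "comment_author": "A", "comment_likes": 5},
    {"comment_id": "c2", "parent_id": "c1", "comment_author": "B", "comment_likes": 2},
    {"comment_id": "c3", "parent_id": "c1", "comment_author": "C", "comment_likes": 1},
    {"comment_id": "c4", "parent_id": "c1", "comment_author": "A", "comment_likes": 0},
    {"comment_id": "c5", "parent_id": "", "comment_author": "D", "comment_likes": 4},
    {"comment_id": "c6", "parent_id": "c5", "comment_author": "B", "comment_likes": 3},
]

centrality = degree_centrality(build_reply_edges(toy_comments))
for author in sorted(centrality):
    print(author, centrality[author])
```

In this toy network, author A has the highest in-degree (two distinct repliers out of three other authors), while B has the highest out-degree. A graph library such as NetworkX could replace the hand-written counting once the edge definition is settled.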
Document Clustering
We will cluster the scraped comments using k-means document clustering. We will then perform topic modelling within each cluster to better understand the different clusters. Next, we will note the distribution of the number of positive reactions in each cluster. Finally, we will use ANOVA or a z-test to determine whether the clusters differ in the number of positive reactions.
Through this, SGAG will be able to know what kind of content generates the most positive reactions. This will also allow SGAG to understand whether the generated content is having the desired effect on its consumers.
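As an illustration of the final comparison step, the sketch below computes the one-way ANOVA F statistic by hand. It assumes the clustering has already assigned each item to a cluster; the three lists of positive-reaction counts are hypothetical numbers chosen for easy arithmetic, and the function name `one_way_anova_f` is our own.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F statistic, between-group df, within-group df) for a one-way ANOVA."""
    k = len(groups)
    n = sum(len(group) for group in groups)
    grand_mean = mean(value for group in groups for value in group)
    # Variation of the cluster means around the grand mean.
    ss_between = sum(len(group) * (mean(group) - grand_mean) ** 2 for group in groups)
    # Variation of individual values around their own cluster mean.
    ss_within = sum((value - mean(group)) ** 2 for group in groups for value in group)
    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical positive-reaction counts for three clusters (not real data).
cluster_reactions = [
    [1, 2, 3],
    [2, 3, 4],
    [5, 6, 7],
]

f_stat, df_between, df_within = one_way_anova_f(cluster_reactions)
print(f_stat, df_between, df_within)
```

A large F statistic relative to the F distribution with the reported degrees of freedom indicates that at least one cluster differs in its mean number of positive reactions. In practice, a library routine such as `scipy.stats.f_oneway` would also return the p-value.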
In progress.
Proposed Deliverables
In progress.