Difference between revisions of "ANLY482 AY2016-17 T1 Group1: PROJECT FINDINGS/Post"


Revision as of 11:14, 29 November 2016


POST LEVEL ANALYSIS

Reach by Post Type

ANLY482 Group1 Figure4 4.png

From Figure 4.4 above, we can see that, on average, video posts performed notably better than the other media types. This observation holds for most industries even when we drill down to the level of individual advertisers’ industries (Appendix 1).

Comparison of the Performance of Paid and Unpaid Posts

ANLY482 Group1 Figure4 5.png

Looking at the reach of paid and unpaid posts, unpaid posts generally perform better: their performance is 14.93% better than that of paid posts. This may be because unpaid posts tend to be more relatable and humorous. Hence, SGAG may consider crafting paid posts in a similar manner to help generate more reach for them.

Reach of Paid Post by Industry

ANLY482 Group1 Figure4 6.png
ANLY482 Group1 Figure4 7.png

From Figure 4.6, the top 3 best-performing advertiser industries are Gaming, FMCG and Real Estate. However, upon further investigation, some advertiser industries do not place many advertisements with SGAG. For instance, the top-performing industry, Gaming, contains only 1 advertiser. Such industries may therefore show high performance simply because of their small number of advertisers and advertisements. To better gauge the performance of the advertiser industries, we excluded industries comprising only 1 advertiser. As seen in Figure 4.7, the result is markedly different: the top 3 best-performing industries become FMCG, Entertainment and F&B.
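
The exclusion step described above can be sketched in a few lines of Python. This is only an illustration, not the tool used in the project: the record layout (industry, advertiser, reach), the function name and the sample values are all made up for demonstration.

```python
from collections import defaultdict
from statistics import mean


def average_reach_by_industry(posts, min_advertisers=2):
    """Average reach per industry, keeping only industries that have
    at least `min_advertisers` distinct advertisers."""
    advertisers = defaultdict(set)
    reaches = defaultdict(list)
    for industry, advertiser, reach in posts:
        advertisers[industry].add(advertiser)
        reaches[industry].append(reach)
    return {
        industry: mean(values)
        for industry, values in reaches.items()
        if len(advertisers[industry]) >= min_advertisers
    }


# Made-up records for illustration: (industry, advertiser, reach)
sample_posts = [
    ("Gaming", "Advertiser A", 900),
    ("FMCG", "Advertiser B", 500),
    ("FMCG", "Advertiser C", 700),
]

# Industries with a single advertiser (here, Gaming) drop out of the ranking.
ranking = sorted(
    average_reach_by_industry(sample_posts).items(),
    key=lambda item: item[1],
    reverse=True,
)
```

Setting `min_advertisers=1` reproduces the unfiltered view, which makes it easy to compare the two rankings side by side.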

Top Posts with Most Reach

ANLY482 Group1 Figure4 8.png
ANLY482 Group1 Figure4 9.png

Based on our discussion with SGAG, Reach is an important factor in deciding the performance of a post. Higher Reach indicates that the post is seen by a larger audience base. Hence, examining the posts with the most reach shows us which types of post attract the largest audience. Figure 4.8 illustrates the top reach generated by SGAG’s Facebook posts over the past year. The top-performing post reached over 4 million users, which is 8 times the total number of likes of the SGAG Facebook page. Meanwhile, Figure 4.9 shows the types of posts that generated the highest reach. These posts relate to current events or trends, as well as to the pride of Singapore.

Top Posts with Most Engagements

ANLY482 Group1 Figure4 10.png
ANLY482 Group1 Figure4 11.png

Posts with the most engagements are posts that generate discussion amongst the audience. They spark the audience’s interest, so that people keep interacting with them and share them with their friends. Engagements allow a post to reach the friends of the people interacting with it and thus can potentially generate higher reach. Hence, the engagements of a post are also an important indicator of its performance. Figure 4.10 depicts the posts with the most engagements, while Figure 4.11 shows examples of such posts. While the 3 posts shown in Figure 4.11 cover different topics, it is noteworthy that all 3 are humorous posts.

Posts with Most Negative Feedbacks

ANLY482 Group1 Figure4 12.png
ANLY482 Group1 Figure4 13.png

As seen in Figure 4.12, the amount of negative feedback received from SGAG’s audience is small: most posts received fewer than 100 negative feedbacks, and the post with the most received 237. While this figure is small compared to the engagements and reach that SGAG’s posts generate, posts with negative feedback show SGAG the types of post that people dislike and can help in deciding what kind of Facebook post to craft in the future.

From Figure 4.13, the post with the most negative feedback features a picture that is perceived as indecent, although it is actually a cartoon dog character. This post may have generated the highest negative feedback of the past year because its preview picture was deemed inappropriate for a social media platform like Facebook, which children of any age can easily access. It is important to note that the next two posts with the most negative feedback are also posts with relatively high reach and engagements. Popular posts appear frequently on people’s timelines, and some users may find them annoying and repetitive and therefore hide them from their timelines. As for the last post in Figure 4.13, while most people find it funny, others may see the video as disrespectful towards our Prime Minister, Mr Lee Hsien Loong, which results in higher negative feedback for this post.


EVALUATION OF FACTORS AFFECTING FACEBOOK POST KPI USING MULTIPLE LINEAR REGRESSION

Problem Definition

In measuring the performance of SGAG’s Facebook posts, social media content creators commonly use the reach of a post, defined as the total number of users who have seen it. Nevertheless, it is challenging for them to determine, by human judgement alone, the various components that may affect this KPI. The following section therefore applies multiple linear regression analysis to examine the factors influencing the KPI.

Analysis Methodology

As mentioned previously, a multiple linear regression can be developed using the Fit Model platform of JMP Pro by selecting the response variable (Y), which is the post’s reach (Lifetime Organic Post Reach), and the explanatory variables (X), which comprise various continuous variables such as Main Engagement and Negative Feedbacks. Categorical variables will be added once multicollinearity has been eliminated from the model.

ANLY482 Group1 Figure2.png
Figure 2 - Initial Post KPI model variables selection
ANLY482 Group1 Figure3.png
Figure 3 - Initial Fit Model Result

As seen from Figure 3, 97% of the variability of the response variable can be explained by the explanatory variables. A high R-Square indicates that the model fits the data closely. Nonetheless, multicollinearity must first be examined in order to identify the true factors affecting the KPI.
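
For reference, R-Square is one minus the ratio of the residual sum of squares to the total sum of squares. The following is a minimal sketch of that calculation in Python; the function name and inputs are illustrative and are not taken from the JMP report.

```python
def r_squared(actual, predicted):
    """Share of the variability in `actual` explained by `predicted`."""
    mean_actual = sum(actual) / len(actual)
    # Variation left unexplained by the model
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total variation of the response around its mean
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_residual / ss_total
```

A model that always predicts the mean of the response scores 0, and a model that reproduces every observation scores 1.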

ANLY482 Group1 Figure4.png
Figure 4 - Initial Parameter Estimates Report

In Figure 4 above, the highlighted variables, such as Lifetime Engaged Users and Hide Clicks per thousand user, have a VIF larger than 8. These variables will be re-selected using variable clustering analysis to eliminate the multicollinearity.
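
The VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on all the other predictors. As a minimal sketch, the special case of exactly two predictors needs only their Pearson correlation, because R² is then the squared correlation. The function names and sample values below are illustrative; JMP computes the general multi-predictor case.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors when the model has exactly two.

    With two predictors, the R-squared of one regressed on the other
    equals their squared correlation.
    """
    r = pearson(x1, x2)
    return 1.0 / (1.0 - r * r)


# Made-up values: two nearly collinear predictors exceed the cut-off of 8.
flagged = vif_two_predictors([1, 2, 3, 4], [1, 2, 3, 5]) > 8
```

Uncorrelated predictors give a VIF of 1, and the value grows without bound as the correlation approaches ±1.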

ANLY482 Group1 Figure5.png
Figure 5 - Variable Clustering-Variables Selection
ANLY482 Group1 Figure6.png
Figure 6 - Result of Variable Clustering

As shown in Figure 6, two clusters were formed, and their representative variables are Main Engagement and No of negative feedback per thousand users. Hence, the other variables are removed, and the model is re-run and evaluated further.

ANLY482 Group1 Figure7.png

From Figure 7, the remaining independent variables are those with VIF < 8. Having eliminated multicollinearity, we now filter insignificant factors out of the model. As explained previously, variables with a p-value (Prob>|t|) larger than 0.05 are considered insignificant. The highlighted factor (Post Message Length) has a p-value of 0.18 and is therefore removed to improve the model. Afterwards, categorical variables such as type and public holiday are also added to the model.

ANLY482 Group1 Figure9.png
ANLY482 Group1 Figure10.png

The final model and its reports can be seen in Figures 9 and 10. In this model, 89.5% of the variability of the response variable (Post Reach) can be explained by the independent variables (Main Engagement, Number of Negative Feedbacks per thousand user, Hide All Clicks per thousand user, Unlike Page per thousand user, Type). The VIFs of the explanatory variables are all less than 8, indicating no problematic multicollinearity, and the p-values of all variables are less than 0.0001, indicating that it is a strong model.

The equation representing the relationship between the dependent variable and the independent variables can be written as follows:

Post Reach = 6.60 (intercept) + 0.27 ( Log[Main Engagement] ) + 0.05 ( Log[No of negative feedback per thousand user] ) - 0.17 ( Log[Hide All Clicks per thousand user] ) - 0.58 ( Log[Unlike Page per thousand user] )+ Match( Type )("Link" → -0.05, "Photo" → - 0.16, "Video" → 0.21)
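
The equation can be evaluated with a short Python sketch. Two assumptions are made here that the text does not state explicitly: Log is taken to be the natural logarithm, and the fitted response is taken to be on the log scale, so the linear predictor is exponentiated to obtain reach in original units. The dictionary keys and function names are illustrative.

```python
import math

INTERCEPT = 6.60
# Coefficients on the log-transformed continuous variables
COEFFICIENTS = {
    "main_engagement": 0.27,
    "negative_feedback_per_thousand": 0.05,
    "hide_all_clicks_per_thousand": -0.17,
    "unlike_page_per_thousand": -0.58,
}
# Additive effect of each post type, i.e. the Match( Type ) term
TYPE_EFFECTS = {"Link": -0.05, "Photo": -0.16, "Video": 0.21}


def linear_predictor(inputs, post_type):
    """Right-hand side of the equation; every input must be positive."""
    total = INTERCEPT + TYPE_EFFECTS[post_type]
    for name, coefficient in COEFFICIENTS.items():
        total += coefficient * math.log(inputs[name])
    return total


def predicted_reach(inputs, post_type):
    """Reach in original units, ASSUMING the response was fitted as log(reach)."""
    return math.exp(linear_predictor(inputs, post_type))
```

With every continuous input set to 1, all the log terms vanish and the linear predictor reduces to the intercept plus the type effect, which is a quick way to check the coefficients against the report.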

ANLY482 Group1 Figure11.png

To further examine the effects of the independent variables on the dependent variable, the equation, whose values were previously transformed by the logarithm function, is converted back to the original scale using the exponential function. The profiler in Figure 11 shows the degree of linearity between each independent variable and the dependent variable, as well as how sensitive the response variable is to adjustments of the explanatory variables.
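
The sensitivity that the profiler shows can also be read directly from the coefficients. If both the response and a predictor are log-transformed (an assumption about the response, as the text does not state it explicitly), multiplying the predictor by a factor k multiplies the predicted reach by k raised to the coefficient, with all other variables held fixed. A minimal sketch, with an illustrative function name:

```python
def reach_multiplier(coefficient, factor):
    """Multiplicative change in predicted reach when one log-transformed
    predictor is scaled by `factor`, holding all other variables fixed.

    Assumes the response was also fitted on the log scale.
    """
    return factor ** coefficient


# Doubling Main Engagement (coefficient 0.27) raises predicted reach
# by roughly a fifth under this assumption.
doubling_main_engagement = reach_multiplier(0.27, 2)

# Doubling Unlike Page per thousand user (coefficient -0.58) lowers it.
doubling_unlike_page = reach_multiplier(-0.58, 2)
```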