JAR v.IS Project Findings

From Analytics Practicum
Revision as of 16:05, 23 April 2017 by Albertb.2013 (talk | contribs)

Multiple Linear Regression Model

What makes a good Facebook post? This section outlines the explanatory model built on the article dataset from Facebook Insights.

Response / Dependent Variables

We use “Total Engagement” as the response/dependent variable. “Total Engagement” for each post is the sum of the total number of reactions (like, love, wow, haha, angry, sad), comments and shares of that post as of the data retrieval date. Reactions are similar to the ‘likes’ on Facebook, but provide the additional option of reacting with five animated emoji rather than a simple ‘like’ reaction.
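
The definition above can be sketched in a few lines of Python. This is only an illustration: the field names (`like`, `comments`, `shares`, and so on) and the example counts are assumptions made for the sketch, not the actual column names or values in the Facebook Insights export.

```python
# Reaction types listed in the definition of Total Engagement.
REACTION_TYPES = ("like", "love", "wow", "haha", "angry", "sad")


def total_engagement(post: dict) -> int:
    """Sum reactions, comments and shares for one post.

    `post` is a hypothetical dict of counts; missing keys count as 0.
    """
    reactions = sum(post.get(name, 0) for name in REACTION_TYPES)
    return reactions + post.get("comments", 0) + post.get("shares", 0)


# Made-up counts, purely for illustration.
example_post = {
    "like": 120, "love": 15, "wow": 3, "haha": 7, "angry": 1, "sad": 0,
    "comments": 24, "shares": 10,
}

print(total_engagement(example_post))  # 146 reactions + 24 + 10 = 180
```

Because the counts are taken as of the data retrieval date, older posts have had more time to accumulate engagement than recent ones.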


Other possible response variables include the comment sentiment score measures and the individual engagement metrics, but they were ruled out for reasons such as their non-normal distributions and their limited utility for our sponsor.

Explanatory / Independent Variables

Article Dataset Metadata for Analysis

| Header | Description |
|---|---|
| Post Message Sentiment | Crawled Variable: Sentiment score calculated by a Python script (in PyCharm) using the AFINN sentiment words and emoji package |
| Article Text Sentiment | Derived Variable: Sentiment score calculated by a Python script (in PyCharm) using the AFINN sentiment words and emoji package |
| Number of Images | Crawled Variable: Number of images in the article |
| Number of Videos | Crawled Variable: Number of videos in the article |
| Number of Links | Crawled Variable: Number of embedded links in the article |
| Number of syllables | Crawled Variable: Number of syllables within the text |
| Word count | Crawled Variable: Total word count |
| Sentence count | Crawled Variable: Total sentence count |
| Words per Sentence | Crawled Variable: Number of words per sentence in the body of text |
| Flesch reading ease | Crawled Variable: Readability index value of Flesch Reading Ease |
| Flesch kincaid grade | Crawled Variable: Readability index value of Flesch-Kincaid Grade |
| Gunning fog | Crawled Variable: Readability index value of Gunning Fog |
| Smog index | Crawled Variable: Readability index value of SMOG Index |
| Automated readability index | Crawled Variable: Readability index value of Automated Readability Index |
| Coleman liau index | Crawled Variable: Readability index value of Coleman-Liau Index |
| Linsear write formula | Crawled Variable: Readability index value of Linsear Write Formula |
| Dale chall readability score | Crawled Variable: Readability index value of Dale-Chall Readability Score |
| Difficult words count | Crawled Variable: Total count of difficult words |
| Article Category | Crawled Variable: Category of the article; categorical, 9 levels |
| Day of Week | Derived Variable: Day of the week taken from the (adjusted) posted column of the article; categorical, 7 levels |
| Time Interval (Hour) | Derived Variable: Time interval of the article, derived by recursively splitting the hour from the time of day column so that the intervals coincide with morning, afternoon, evening and night; categorical, 4 levels |
| Article Authors | Crawled Variable: Author of the article. Authors who wrote fewer than 9 articles are collectively grouped into "Others"; categorical, 20 levels |

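
The three categorical variables at the end of the metadata (Day of Week, Time Interval and Article Authors) could be prepared along the following lines. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the hour boundaries in `time_interval` are placeholders, since the source obtains its four intervals by recursive splitting and does not list the actual cut points. Only the "fewer than 9 articles" threshold comes from the metadata.

```python
from collections import Counter
from datetime import datetime

DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")


def day_of_week(posted: datetime) -> str:
    """Map the (adjusted) posted timestamp to one of 7 day levels."""
    return DAY_NAMES[posted.weekday()]  # weekday(): Monday is 0


def time_interval(hour: int) -> str:
    """Bin an hour (0-23) into 4 levels.

    The boundaries below are ASSUMED for illustration; the project's
    real cut points came from recursive splitting and are not given.
    """
    if 6 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    if 18 <= hour < 22:
        return "evening"
    return "night"


def group_rare_authors(authors, min_articles=9, other_label="Others"):
    """Replace authors with fewer than `min_articles` articles by one label."""
    counts = Counter(authors)
    return [a if counts[a] >= min_articles else other_label for a in authors]


# Made-up example values.
posted = datetime(2017, 4, 23, 16, 5)
print(day_of_week(posted), time_interval(posted.hour))

authors = ["Author A"] * 9 + ["Author B"] * 8
print(Counter(group_rare_authors(authors)))
```

Collapsing infrequent authors into a single level keeps the number of dummy variables in the regression manageable and avoids estimating coefficients from only a handful of articles.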
Data Transformation / Excluding Outliers

Bivariate Fit

Multi-collinearity

Stepwise Regression

Evaluation of Model Fit

Model Assumptions

Interpretation and Managerial Insights