ANLY482 Team wiki: 2015T2 TeamROLL Data Analysis


Data Collection

Comparing the number of followers across SGAG’s social media platforms (Facebook, Twitter, Instagram and YouTube), we noted that the majority of their followers (53.2%) are on Facebook. We therefore decided to narrow our project scope to SGAG’s Facebook data.
Subsequently, the team gathered SGAG’s lifetime data (2012 to 2016) from Facebook Insights. For our analysis, however, we selected only the page and post data for 2015. Since the raw export included a large number of variables, the team kept only the variables relevant to our analysis.

Data Cleaning Log

Teamroll log1.png


Page Data

Teamroll clean1.png

Attribute Selection
We discussed the scope of our analysis with our project sponsor, which resulted in the following attributes being selected for analysis:

Teamroll clean2.png

We added a column titled ‘Engaged Users’ to capture the proportion of users who engaged with the page out of the number of users who saw content published by the page. This allows us to identify the days and posts that generated higher engagement.
Other attributes were removed for the following reasons:
1) Focus on net rather than organic metrics
2) Paid metrics were redundant, as no posts were paid for
3) Check-in metrics were redundant, since SGAG is not a hotel or destination services business
4) Video, link and status post formats were omitted, as the current analysis is limited to pictorial posts
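The ‘Engaged Users’ ratio described above can be sketched as follows. This is a minimal illustration, not the team's Excel formula: the column names "Daily Engaged Users" and "Daily Total Reach" are hypothetical placeholders for whatever the Insights export actually calls them, and each row is assumed to hold one day of page data.

```python
from typing import Any, Dict, List, Optional

# Hypothetical column names; replace with the actual Insights headers.
ENGAGED = "Daily Engaged Users"
REACH = "Daily Total Reach"


def add_engagement_rate(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return copies of the daily rows with an 'Engagement Rate' column:
    engaged users divided by users reached (None when reach is zero)."""
    result = []
    for row in rows:
        reach = row[REACH]
        rate: Optional[float] = row[ENGAGED] / reach if reach else None
        result.append({**row, "Engagement Rate": rate})
    return result


def top_days(rows: List[Dict[str, Any]], n: int = 3) -> List[Dict[str, Any]]:
    """Return the n rows with the highest engagement rate, skipping days
    without any reach."""
    scored = [r for r in rows if r["Engagement Rate"] is not None]
    return sorted(scored, key=lambda r: r["Engagement Rate"], reverse=True)[:n]
```

Using a ratio rather than the raw engaged-user count keeps days with very different reach comparable.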

Merge Excel worksheets into a single file
Using Microsoft Excel, we merged the various exported Excel files into a single worksheet.
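For readers reproducing this merge outside Excel, here is a minimal sketch. It assumes, hypothetically, that each batch was saved as CSV with an identical header, that the "Date" column uses ISO YYYY-MM-DD format, and that consecutive export windows may overlap.

```python
import csv
from typing import Dict, Iterable, List


def merge_batches(batches: Iterable[Iterable[str]], key: str = "Date") -> List[Dict[str, str]]:
    """Concatenate CSV batches that share one header into a single table.

    Each batch is any iterable of CSV lines (e.g. an open file). If a key
    value appears in more than one batch, the first occurrence is kept, so
    overlapping export windows do not create duplicate days.
    """
    merged: Dict[str, Dict[str, str]] = {}
    for batch in batches:
        for row in csv.DictReader(batch):
            merged.setdefault(row[key], row)
    # ISO dates (YYYY-MM-DD) sort chronologically as plain strings.
    return [merged[k] for k in sorted(merged)]
```

With real files, pass open handles such as `open("batch1.csv", newline="")`; the file name is illustrative.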

Post Data

Similarly, the post-level data was exported as batches of Excel files. Because of the larger number of observations, preparing the post-level data was more complicated.

Teamroll clean3.png

The post-level data extracted from Facebook Insights consisted of several data sheets, each recording a different aspect of post performance. Individual posts are identified across these sheets by a unique Post ID.
Attribute Selection
Based on our business and analytical objectives, we narrowed down the large number of tabs and columns to the following:
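Combining these sheets amounts to a join on Post ID. The sketch below assumes each sheet has already been loaded as a list of row dictionaries; the metric column names used in the test are hypothetical. It behaves like an outer join, so a post missing from one sheet simply lacks that sheet's columns, which later surfaces as a missing value.

```python
from typing import Any, Dict, Iterable, List


def join_on_post_id(sheets: Iterable[List[Dict[str, Any]]],
                    key: str = "Post ID") -> Dict[str, Dict[str, Any]]:
    """Merge per-metric sheets into one record per post, keyed by Post ID.

    Works like an outer join: every post seen in any sheet gets a record,
    and columns from later sheets are added to (or overwrite) that record.
    """
    posts: Dict[str, Dict[str, Any]] = {}
    for sheet in sheets:
        for row in sheet:
            posts.setdefault(row[key], {}).update(row)
    return posts
```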

Teamroll clean4.png

As with the page-level attribute selection, the other attributes were removed from the analysis after discussion with our project sponsor. The main reasons were:
1) Focus on net rather than organic metrics
2) Paid metrics were redundant, as no posts were paid for
3) Video formats were omitted, as the current analysis is limited to pictorial posts
4) Only direct engagement response metrics, namely the number of "likes" and lifetime negative feedback, were retained to identify weak and strong performing posts

Column Creation
To implement the tagging framework, we added a “Tags” column specifying the topics related to each post.
For design attributes, again in consultation with SGAG, we identified three main areas of design:
1) Character used: Animals, Local Characters, Foreign Celebs, Troll faces/Memes, Movie Characters, Politicians
2) Number of frames, coded as "1", "2", "3" or ">3", since most posts were designed to fit within three frames or fewer
3) Number of description lines within the picture, coded as "1", "2", "3" or ">3"
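The design coding scheme can be expressed as a small validation helper. This is a sketch under two assumptions the scheme does not state: each post carries exactly one character type, and every post has at least one frame and one description line (the scheme has no "0" category, so counts below 1 are rejected). The names `bucket_count`, `DesignAttributes` and `code_design` are illustrative.

```python
from dataclasses import dataclass

# Character categories agreed with SGAG (from the list above).
CHARACTER_TYPES = {
    "Animals", "Local Characters", "Foreign Celebs",
    "Troll faces/Memes", "Movie Characters", "Politicians",
}


def bucket_count(n: int) -> str:
    """Code a frame or description-line count as "1", "2", "3" or ">3"."""
    if n < 1:
        raise ValueError("count must be at least 1")
    return str(n) if n <= 3 else ">3"


@dataclass(frozen=True)
class DesignAttributes:
    character: str
    frames: str
    lines: str


def code_design(character: str, n_frames: int, n_lines: int) -> DesignAttributes:
    """Validate the character type and bucket both counts."""
    if character not in CHARACTER_TYPES:
        raise ValueError(f"unknown character type: {character}")
    return DesignAttributes(character, bucket_count(n_frames), bucket_count(n_lines))
```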

Recode Date Posted to Singapore Time
The team noticed that the attribute "Posted", which records the date and time each post was released, contained a large number of posts released between 0000h and 0600h daily. This is unusual, as these are sleeping hours in Singapore, and it would make little sense for posts to be released at that time. Through online research, we found that Facebook Insights records "Posted" in Pacific Time rather than local time. We therefore recoded "Posted" forward by 16 hours (the offset between Pacific Standard Time, UTC-8, and Singapore Time, UTC+8) to match local Singapore time. In Excel, this was done by adding 16/24 ≈ 0.667 of a day to the recorded date-time value. After the conversion, the majority of posts fell within the expected release window of 0900h to 2100h local time.
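A fixed 16-hour shift matches Pacific Standard Time only. US Pacific Time observes daylight saving time (UTC-7) from the second Sunday of March to the first Sunday of November, when the gap to Singapore is 15 hours, so posts from those months would land one hour late under a flat shift. Assuming the export uses Pacific clock time including daylight saving (not verified here), a DST-aware conversion can be sketched with the standard library alone:

```python
from datetime import datetime, timedelta


def _nth_sunday(year: int, month: int, n: int) -> datetime:
    """Midnight on the n-th Sunday of the given month."""
    first = datetime(year, month, 1)
    # weekday(): Monday == 0 ... Sunday == 6
    first_sunday = first + timedelta(days=(6 - first.weekday()) % 7)
    return first_sunday + timedelta(weeks=n - 1)


def is_pacific_dst(local: datetime) -> bool:
    """US rule in force since 2007: daylight time runs from 02:00 on the
    second Sunday of March to 02:00 on the first Sunday of November
    (local clock time). The repeated 01:00-02:00 hour in November is
    ambiguous; this sketch treats it as daylight time."""
    start = _nth_sunday(local.year, 3, 2).replace(hour=2)
    end = _nth_sunday(local.year, 11, 1).replace(hour=2)
    return start <= local < end


def pacific_to_sgt(local: datetime) -> datetime:
    """Convert a naive Pacific clock time to naive Singapore time (UTC+8).
    PST (UTC-8) is 16 hours behind SGT; PDT (UTC-7) is 15 hours behind."""
    return local + timedelta(hours=15 if is_pacific_dst(local) else 16)
```

The encoded rule is the post-2007 US rule, which covers 2015. Where the IANA time-zone database is available, `zoneinfo.ZoneInfo("America/Los_Angeles")` (Python 3.9+) handles these transitions without a hand-written rule.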

Check for missing entries
The team examined the data for missing values and found some observations with missing values for performance attributes. The cause of these missing values is unknown, though we suspect another limitation in how Facebook Insights retrieves attributes for specific post types, such as link posts. However, these observations are few, comprising around 2% of our dataset, so we omitted them from our analysis. With the above processes in place, our completed Post-Level analytical data cube was ready for data analysis.
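The omission step can be sketched as below. The required performance columns are passed in by the caller (the names in the test are hypothetical), and a value counts as missing if the column is absent, None, or blank. The helper also returns the share of rows dropped, the figure to compare against the roughly 2% noted above.

```python
from typing import Any, Dict, List, Sequence, Tuple


def drop_incomplete(rows: List[Dict[str, Any]],
                    required: Sequence[str]) -> Tuple[List[Dict[str, Any]], float]:
    """Drop rows with a missing (absent, None or blank) value in any
    required column; return the kept rows and the share of rows dropped."""

    def present(row: Dict[str, Any], col: str) -> bool:
        value = row.get(col)
        return value is not None and str(value).strip() != ""

    kept = [r for r in rows if all(present(r, c) for c in required)]
    dropped_share = 1 - len(kept) / len(rows) if rows else 0.0
    return kept, dropped_share
```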

Topic Modelling

We used SAS Enterprise Miner's Text Mining tools to analyse the topics present in SGAG's posts throughout 2015. The topic-mining steps are shown in the SAS EM diagram below:

Teamroll clean5.png

First, we parsed the text tags, then applied text filtering and, lastly, text topic modelling. The Text Topic nodes were used in two passes: first to identify overarching topics, and then to build user-defined sub-topics. Examples of the latter include "Commerce"-related topics such as "scoot" and "carousell".
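SAS EM lets analysts define their own topics as term lists in the Text Topic node. As a rough, hypothetical stand-in (plain keyword matching, not SAS's weighted scoring), a post can be assigned to a user-defined topic when any of its tags appears in that topic's term list. The "Commerce" terms come from the text; the function name is illustrative.

```python
from typing import AbstractSet, Dict, Iterable, List

# Term lists are stored in lower case so matching is case-insensitive.
USER_TOPICS: Dict[str, AbstractSet[str]] = {"Commerce": {"scoot", "carousell"}}


def assign_user_topics(tags: Iterable[str],
                       user_topics: Dict[str, AbstractSet[str]]) -> List[str]:
    """Return, in sorted order, the user-defined topics whose term list
    shares at least one term with the post's (lower-cased) tags."""
    terms = {t.strip().lower() for t in tags}
    return sorted(name for name, words in user_topics.items() if terms & words)
```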

The interactive topic viewer was used for a brief check on the posts' topic classification. The posts and their identified topics were then saved to an Excel spreadsheet. Topics with fewer than 50 posts were removed from the list, as we considered them largely insignificant. However, we retained some of these topics, namely "National Service", "Police" and "Foreign Talents/Worker", as they were more relevant to SGAG's content creation framework.
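The frequency cut-off and its exceptions can be sketched as follows, assuming the saved spreadsheet has been flattened to one topic label per post-topic pair. The threshold and the retained topics come from the text above; the function name is illustrative.

```python
from collections import Counter
from typing import AbstractSet, Dict, Iterable

MIN_POSTS = 50
RETAINED = frozenset({"National Service", "Police", "Foreign Talents/Worker"})


def select_topics(post_topics: Iterable[str], min_posts: int = MIN_POSTS,
                  keep: AbstractSet[str] = RETAINED) -> Dict[str, int]:
    """Count posts per topic and keep topics with at least min_posts posts,
    plus any topic explicitly retained for relevance."""
    counts = Counter(post_topics)
    return {t: n for t, n in counts.items() if n >= min_posts or t in keep}
```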