AY1516 T2 Team SkyTrek Methodology

From Analytics Practicum

Overview

The following table lists the analytical methods we propose to use to achieve our objectives for this practicum.

Objective Analytical Method(s)
Identify the different web content factors that affect content performance in order to differentiate between high and low performing content
  • Multiple Linear Regression on Article Characteristics
Facilitate the content planning process by way of an interactive dashboard
  • Data Visualization
  • Google Trends Analysis
  • Content Themes Analysis

Multiple Linear Regression on Article Characteristics

Based on the merged dataset comprising attributes from Google Analytics and article attributes scraped directly from the new articles, we will perform multiple linear regression (MLR) to determine the key attributes affecting the number of unique page views.

We will be exploring the following independent variables in predicting the number of unique page views:

Independent Variable Intuition for Selection

No. of words (stopwords removed)

This measure serves as an indicator of the length of the article. Recognising that readers have a limited attention span, it would be interesting to explore the effect of a lengthy article on its popularity.

No. of outbound link references

Outbound links typically direct readers to more in-depth content. An article with more links might be indicative of more meaningful content, which might translate to greater popularity and better reception amongst its readers.

No. of images
No. of videos

Images and videos make for a more interactive experience for the reader. They might be important determinants of how well an article is received.

No. of article shares

The intuition is that people share articles that are useful and impactful, so the number of article shares is expected to correlate positively with the number of unique page views. Assessing its importance would also indicate how important social media is as a publicity platform in comparison to other platforms.

Bounce rate
(Percentage of sessions starting with the page (out of all tracked Skyscanner pages) in which the reader leaves after viewing only that page, i.e. single-page sessions)

Exit %
(Percentage of sessions involving the page where the reader leaves after reading the page)

Readers arriving at Skyscanner’s news pages are expected to be browsing for information related to a particular destination or for related travel content. Since Skyscanner articles are light (bite-sized) reads, we would expect readers to continue browsing other relevant articles via the recommendation engine or the outbound links within the articles themselves. Nevertheless, there is bound to be a point where readers finally exit the site. Hence, we expect to see average bounce rate and exit % values across most articles. Articles with particularly high values would serve as useful negative examples for future study.


Average time on page

Time spent on a page is expected to be indicative of interest levels in an article and possibly of the number of unique page views. It would be interesting to validate whether time spent is a predictor of unique page views. If so, we could also study articles with long average times to identify good articles.
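The bounce rate and exit % definitions in the table above can be illustrated with a small sketch. It assumes each session is available as an ordered list of page paths, which is a hypothetical input format rather than the format of the Google Analytics export; the function name and the sample paths are likewise made up for illustration.

```python
from collections import defaultdict


def bounce_and_exit(sessions):
    """Return {page: (bounce_rate, exit_pct)} in percent.

    sessions: list of sessions, each an ordered list of page paths
    (an assumed input format, used here only for illustration).
    """
    entrances = defaultdict(int)  # sessions that start on the page
    bounces = defaultdict(int)    # single-page sessions starting on the page
    involved = defaultdict(int)   # sessions that include the page at all
    exits = defaultdict(int)      # sessions that end on the page

    for pages in sessions:
        if not pages:
            continue
        entrances[pages[0]] += 1
        if len(pages) == 1:
            bounces[pages[0]] += 1
        for page in set(pages):
            involved[page] += 1
        exits[pages[-1]] += 1

    result = {}
    for page in involved:
        # Bounce rate is undefined for pages that never start a session.
        if entrances[page]:
            bounce = 100.0 * bounces[page] / entrances[page]
        else:
            bounce = None
        exit_pct = 100.0 * exits[page] / involved[page]
        result[page] = (bounce, exit_pct)
    return result


# Made-up sessions for three hypothetical article paths.
sample_sessions = [
    ["/a"],
    ["/a", "/b"],
    ["/b", "/a"],
    ["/b"],
    ["/a", "/b", "/c"],
]
rates = bounce_and_exit(sample_sessions)
```

Note that the denominators differ: bounce rate divides by the sessions that start on the page, while exit % divides by all sessions that include the page.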

Understanding the key independent variables that influence the number of unique page views will help in creating content that is more likely to receive higher page views.
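As a rough illustration of the regression step, the sketch below fits an ordinary least squares model by solving the normal equations in plain Python. It is not the tooling used in this practicum; the function names, the choice of three predictors and all numbers are assumptions made for the example, and the target is synthetic so that the fitted coefficients can be checked against a known rule.

```python
def fit_mlr(X, y):
    """Ordinary least squares fit; returns [intercept, b1, ..., bk]."""
    # Prepend a constant 1.0 to every row so the first coefficient is the intercept.
    rows = [[1.0] + [float(v) for v in row] for row in X]
    k = len(rows[0])
    # Build the normal equations (A^T A) beta = A^T y as an augmented matrix.
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; cannot fit")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def predict(beta, row):
    """Predicted unique page views for one article's predictor values."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))


# Hypothetical rows: [words, outbound links, images] per article (made-up numbers).
X = [[300, 2, 1], [500, 5, 3], [800, 1, 0], [450, 4, 2], [650, 3, 5]]
# Synthetic target built from a known linear rule so the fit can be checked.
y = [50 + 0.1 * words + 20 * links + 30 * images for words, links, images in X]
beta = fit_mlr(X, y)
```

On real data the coefficients would be read together with their significance and the model's goodness of fit, which a statistics package reports and this sketch does not.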


Google Trends Analysis

In planning the content for the upcoming quarter, the content management team typically uses Google Trends to understand consumer trends in both similar past quarters and the present. They also consider the present context of festivities and events. A word cloud of the Google Trends terms relevant to each quarter will help incorporate these trends into the content planning process.

This will tie in with our exploration of seasonality and the effect of external events on content readership. While Google Trends does not have an API, the data can be scraped through manipulation of the URL. This trend data will be aggregated into a word cloud and placed side by side with the quarterly patterns of the different Google Analytics metrics in order to gain a better understanding of seasonality.
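One way the aggregation step could look is sketched below, assuming the scraped trend data has already been reduced to (date, term, interest) records. That record layout, the function names and the sample terms are illustrative assumptions, and the scraping itself is not shown. Each term's weight is its total interest in the quarter divided by the quarter's largest total, so it can be mapped directly to a font size in a word cloud.

```python
from collections import defaultdict
from datetime import date


def quarter_of(day):
    """Label a date with its calendar quarter, e.g. 2015-Q1."""
    return f"{day.year}-Q{(day.month - 1) // 3 + 1}"


def term_weights_by_quarter(records):
    """Aggregate interest per term within each quarter and scale to 0..1."""
    totals = defaultdict(lambda: defaultdict(float))
    for day, term, interest in records:
        # Lower-case terms so differently capitalised spellings are merged.
        totals[quarter_of(day)][term.lower()] += interest

    weights = {}
    for quarter, terms in totals.items():
        peak = max(terms.values())
        weights[quarter] = {
            term: (value / peak if peak else 0.0) for term, value in terms.items()
        }
    return weights


# Made-up records for illustration only.
sample_records = [
    (date(2015, 1, 10), "Chinese New Year", 80),
    (date(2015, 2, 5), "chinese new year", 100),
    (date(2015, 3, 1), "cherry blossom", 90),
    (date(2015, 12, 20), "Christmas markets", 70),
]
cloud_weights = term_weights_by_quarter(sample_records)
```

Because the weights are scaled within each quarter, they show which terms dominate a quarter but are not comparable across quarters.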

SkyTrek ga trend.png

Content Themes Analysis

Skyscanner has identified 7 content themes that articles typically belong to. As the team operates with a lean workforce, it would be helpful to identify which of the 7 content themes reaps the greatest yield. Here, we define yield by the metrics Google Analytics tracks: the number of unique page views, bounce rate and exit %, and the average time spent on page. This will be done using SAS Text Miner.

Text Miner can generate a number of topics. Each topic is associated with a set of representative keywords derived from the corpus of articles given to the algorithm, and each article has a probability of belonging to each topic. We will tag each article with the topic that has the highest probability. We will then manually examine the keywords representative of each topic and classify the topics according to the 7 content themes. Having classified the articles into the 7 content themes, we can analyse them against the Google Analytics metrics, thereby identifying popular content themes as areas of focus.
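The tagging and aggregation steps described above can be sketched as follows. The sketch assumes the topic probabilities have already been exported from Text Miner into a dictionary per article; the field names, topic labels and theme names are placeholders, since the actual themes and the export format are not given here.

```python
from collections import defaultdict


def dominant_topic(topic_probs):
    """Return the topic with the highest probability for one article."""
    return max(topic_probs, key=topic_probs.get)


def average_metric_by_theme(articles, topic_to_theme, metric):
    """Average a Google Analytics metric over the articles in each theme."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for article in articles:
        # Tag the article with its most probable topic, then map topic to theme.
        theme = topic_to_theme[dominant_topic(article["topic_probs"])]
        totals[theme] += article[metric]
        counts[theme] += 1
    return {theme: totals[theme] / counts[theme] for theme in totals}


# Placeholder mapping, standing in for the manual topic-to-theme classification.
topic_to_theme = {"topic_1": "theme_A", "topic_2": "theme_A", "topic_3": "theme_B"}

# Made-up articles with hypothetical field names.
articles = [
    {"topic_probs": {"topic_1": 0.7, "topic_3": 0.3}, "unique_page_views": 1200},
    {"topic_probs": {"topic_2": 0.6, "topic_3": 0.4}, "unique_page_views": 800},
    {"topic_probs": {"topic_1": 0.2, "topic_3": 0.8}, "unique_page_views": 500},
]
views_by_theme = average_metric_by_theme(articles, topic_to_theme, "unique_page_views")
```

The same function can be reused for bounce rate, exit % or average time on page by passing a different metric field.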

Data Visualization

Unique Page Views Exploration

SkyTrek unique pgview 1.png

SkyTrek unique pgview 2.png

Heat Map of Traffic Source (Country Specific New Page)