ANLY482 AY2016-17 T1 Group1: PROJECT OVERVIEW/ Methodology
As mentioned in our proposal, Cluster Analysis and Sentiment Analysis were originally chosen as the main methodologies for analyzing the dataset. However, due to time constraints, our team will focus on understanding the behaviour of popular Facebook posts and the factors that contribute to their popularity. We will therefore not attempt to interpret the sentiments of SGAG’s Facebook comments and Twitter tweets through Sentiment Analysis and Text Mining. Going forward, our team will use two grouping techniques, Cluster Analysis and Latent Analysis, to understand the behaviour of SGAG’s Facebook posts.
Our team will use K-Means Clustering to identify the characteristics of SGAG’s Facebook posts that perform similarly. Firstly, Cluster Analysis gives us a more flexible way of grouping SGAG’s Facebook posts. It groups the posts on several attributes at once, so the resulting grouping is more comprehensive than one based on a single performance indicator.
Secondly, K-Means Clustering gives us the flexibility to experiment with different K-Values, which allows us to find the number of clusters that best describes the performance of SGAG’s Facebook posts. In this Cluster Analysis, our team will examine the behaviour of SGAG’s Facebook posts at both the general post level and the specific video level. Thereafter, we will examine the factors affecting the performance of each cluster.
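The experiment with different K-Values can be sketched roughly as follows. This is only a minimal illustration in plain Python, not our actual workflow. The engagement figures (likes, comments, shares) are invented and are not SGAG data, and the function names are our own. The attributes are standardized first so that no single attribute dominates the Euclidean distance.

```python
import random

# Invented example rows: [likes, comments, shares] for nine hypothetical posts.
posts = [
    [120, 10, 5], [150, 12, 8], [130, 9, 6],
    [900, 80, 60], [950, 95, 70], [1000, 85, 65],
    [5000, 400, 900], [5200, 450, 950], [4800, 420, 870],
]

def standardize(rows):
    """Rescale every column to mean 0 and (population) standard deviation 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = []
    for c, m in zip(cols, means):
        s = (sum((v - m) ** 2 for v in c) / len(c)) ** 0.5
        stds.append(s if s > 0 else 1.0)  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]

def kmeans(points, k, iters=100, seed=0):
    """Basic K-Means; returns (labels, within-cluster sum of squares)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    for _ in range(iters):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old centroid if a cluster ends up empty.
            updated.append([sum(c) / len(members) for c in zip(*members)]
                           if members else centroids[j])
        if updated == centroids:  # converged
            break
        centroids = updated
    labels = assign(points, centroids)
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, inertia

scaled = standardize(posts)
inertia_by_k = {k: kmeans(scaled, k)[1] for k in range(1, 6)}
for k, inertia in inertia_by_k.items():
    print(f"K={k}: within-cluster sum of squares = {inertia:.3f}")
```

A common heuristic is to pick the K-Value after which the within-cluster sum of squares stops falling sharply. In practice, a statistical package or a library such as scikit-learn's `KMeans` would likely replace the hand-written loop.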
Although Cluster Analysis is useful for grouping SGAG’s Facebook posts, it also has a limitation. K-Means Clustering relies on distances and cluster means, so it can only accommodate continuous variables. However, our dataset contains several categorical attributes that may be useful in grouping SGAG’s Facebook posts or videos. Our team will therefore also use Latent Analysis, which allows us to use the categorical variables in our dataset to describe the groupings of SGAG’s Facebook posts.
While Cluster Analysis finds clusters using a distance measure, such as the Euclidean distance between two objects, Latent Analysis fits a model that describes the distribution of the dataset and assesses the probability of each object belonging to each group. It is a more top-down approach than Cluster Analysis. In addition, Latent Analysis captures more of the uncertainty in grouping the posts: rather than assigning each post to a single group, it gives us the probability of each post belonging to each group.
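To illustrate the idea of probabilistic group membership, the sketch below fits a simple latent class model to categorical attributes using the EM algorithm. It assumes that Latent Analysis here means a latent class model in which the attributes are independent within each group. That assumption, the attribute names, the toy data and the function name are all ours and purely illustrative.

```python
import random

# Invented categorical data for twelve hypothetical posts, coded as integers:
# (post type: 0=photo, 1=video, 2=link;
#  time slot: 0=morning, 1=afternoon, 2=evening;
#  has hashtag: 0=no, 1=yes)
posts = [
    (0, 0, 1), (0, 0, 1), (0, 1, 1), (0, 0, 0),
    (1, 2, 1), (1, 2, 0), (1, 2, 1), (1, 1, 1),
    (2, 1, 0), (2, 1, 0), (2, 0, 0), (2, 1, 1),
]
n_levels = [3, 3, 2]  # number of categories for each attribute

def fit_latent_classes(data, n_levels, n_classes, iters=200, seed=0, alpha=1e-3):
    """EM for a latent class model with independent categorical attributes.

    Returns (class weights, per-class category probabilities, memberships),
    where memberships[i][c] is the probability that row i belongs to class c.
    """
    rng = random.Random(seed)
    n = len(data)
    # Start from random soft memberships that sum to 1 for each row.
    resp = []
    for _ in range(n):
        raw = [rng.random() + 1e-9 for _ in range(n_classes)]
        total = sum(raw)
        resp.append([r / total for r in raw])
    for _ in range(iters):
        # M-step: re-estimate class weights and category probabilities.
        weights = [sum(r[c] for r in resp) / n for c in range(n_classes)]
        theta = []
        for c in range(n_classes):
            per_attr = []
            for v, levels in enumerate(n_levels):
                counts = [alpha] * levels  # small smoothing avoids zero probabilities
                for row, r in zip(data, resp):
                    counts[row[v]] += r[c]
                total = sum(counts)
                per_attr.append([cnt / total for cnt in counts])
            theta.append(per_attr)
        # E-step: recompute each row's membership probabilities.
        resp = []
        for row in data:
            joint = []
            for c in range(n_classes):
                p = weights[c]
                for v, level in enumerate(row):
                    p *= theta[c][v][level]
                joint.append(p)
            total = sum(joint)
            resp.append([j / total for j in joint])
    return weights, theta, resp

weights, theta, memberships = fit_latent_classes(posts, n_levels, n_classes=3)
for row, probs in zip(posts, memberships):
    print(row, [round(p, 3) for p in probs])
```

Each printed row shows a post's probability of belonging to each group rather than a single hard label, which is the main difference from the K-Means output.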
For each dataset, our team will first examine which variables are useful for grouping its objects. We will then group the objects using either Cluster Analysis or Latent Analysis. With the groups obtained from these two techniques, our team will be able to assist SGAG in reviewing the performance of the different types of posts and improving the quality of their posts.