Difference between revisions of "ANLY482 Team wiki: 2015T2 TeamROLL Project Overview/Description"

From Analytics Practicum
 

Revision as of 23:43, 6 March 2016



Topic Modeling

Topic modeling aims to sieve out prominent themes from our topic tags. The team will draw on the work of Lai & To, Social Media Content Analysis, A Grounded Approach (Lai & To, 2015), to identify techniques for refining our topics for content analysis from SGAG's perspective.
With some of the suggested methodology in mind, we will meet Prof. Kam shortly to discuss the methods used and to seek further advice on how they can be applied to our project. A key technique likely to be used is text mining, with SAS Enterprise Miner as the main software tool.
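
As a rough illustration of what "sieving out prominent themes" from topic tags could look like, the sketch below counts how many posts carry each normalized tag. It is written in Python purely for illustration; the team's stated tool is SAS Enterprise Miner. The function name `prominent_themes` and the sample tags are hypothetical and are not taken from the SGAG dataset.

```python
from collections import Counter
import re


def prominent_themes(tagged_posts, top_n=3):
    """Return the top_n most frequent normalized tags as (tag, post_count) pairs."""
    counts = Counter()
    for tags in tagged_posts:
        # Normalize case and whitespace so "Food", "food " and "FOOD" match,
        # and use a set so each tag is counted at most once per post.
        normalized = {re.sub(r"\s+", " ", tag.strip().lower()) for tag in tags}
        normalized.discard("")
        counts.update(normalized)
    # Sort by count (descending), then alphabetically so ties are deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Hypothetical topic tags for four posts (illustrative only).
sample_posts = [
    ["Food", "humour"],
    ["food ", "Transport"],
    ["Humour", "school"],
    ["FOOD", "school", "humour"],
]

themes = prominent_themes(sample_posts)
print(themes)
```

A simple frequency count like this is only a starting point; a fuller text mining workflow would also need to merge synonymous tags and handle misspellings.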

Cluster Analysis

The team expects to use cluster analysis to segment and profile content posts according to their performance indicators. Preliminary attempts to segment the data with k-means clustering gave less than ideal results: they frequently produced one large supercluster and multiple small clusters. We have therefore decided to return to deeper exploratory analysis to better understand the dynamics of the performance indicators in our dataset, and to address the distribution of these indicators before k-means clustering is attempted again. At the same time, we may explore other clustering methods, such as nearest neighbour or Ward's method, to identify outliers or anomalies in the dataset. The team also aims to meet Prof. Kam shortly to seek clarification on these methods so as to execute them better. The main analysis tool for this step will be SAS Enterprise Guide or JMP.
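
The supercluster behaviour described above can be reproduced on a small scale. The sketch below runs a minimal one-dimensional k-means on a hypothetical, heavily right-skewed indicator, first on the raw values and then after a log transform. It is written in Python for illustration only (the team's stated tools are SAS Enterprise Guide or JMP). The data values, the function names, the quantile-based initialization and the choice of a log transform are all assumptions; the source does not say which transformation, if any, the team will use.

```python
import math


def kmeans_1d(values, k=3, max_iter=100):
    """Minimal 1-D k-means (Lloyd's algorithm) with deterministic initial centres."""
    ordered = sorted(values)
    n = len(ordered)
    # Assumption: start from evenly spaced positions in the sorted data.
    centres = [ordered[(i * (n - 1)) // (k - 1)] for i in range(k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each value goes to its nearest centre.
        new_labels = [
            min(range(k), key=lambda j: abs(v - centres[j])) for v in values
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:
                centres[j] = sum(members) / len(members)
    return labels


def cluster_sizes(labels, k=3):
    """Number of points assigned to each cluster index."""
    return [labels.count(j) for j in range(k)]


# Hypothetical per-post indicator (e.g. a count) with a long right tail.
raw = [3, 5, 8, 12, 20, 35, 60, 150, 900, 12000]
logged = [math.log1p(v) for v in raw]

raw_sizes = cluster_sizes(kmeans_1d(raw))
log_sizes = cluster_sizes(kmeans_1d(logged))
print("raw:", raw_sizes)
print("log:", log_sizes)
```

On this toy data the raw values collapse into one dominant cluster with the extreme posts isolated on their own, while the log-transformed values split more evenly. With several indicators on different scales, standardizing each indicator before clustering would usually be needed as well.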

Content Analysis and Regression Modeling

With the two analysis steps above completed, the team will use their findings as inputs for content analysis and regression modeling. Further discussion is required to clarify how content analysis and regression modeling are to be carried out and what further data preparation they require. However, the team currently expects that cluster profiles will reflect certain prominent topics that contribute to the performance of posts.