Difference between revisions of "IS428 2018 19T1 Group11 Proposal"

From Visual Analytics for Business Intelligence
 
(12 intermediate revisions by 2 users not shown)
Line 3: Line 3:
  
 
<!--MAIN HEADER -->
 
<!--MAIN HEADER -->
{|style="background-color:#009D3B;" width="100%" cellspacing="0" cellpadding="0" valign="top" border="0"  |
+
<span class="mw-ui-button {{#switch: {{{color|white}}} }}">
 +
[[Project_Groups|{{{Clickable Button|Back to Project Home}}}]]
 +
</span><noinclude>
  
| style="font-family:Century Gothic; font-size:100%; solid #009D3B ; border-bottom:0px solid #009D3B ; background:#009D3B ; text-align:center;" width="16.6%" |
+
{|style="background-color:#009D3B; color:#009D3B; padding: 10 0 10 0;" width="100%" cellspacing="0" cellpadding="0" valign="top" border="0" |
[[IS428_2018_19T1_Group11_Home | <font color="#fff"><b>HOME PAGE</b></font>]]
 
| &nbsp;
 
  
| style="font-family:Century Gothic; font-size:100%; solid #009D3B ; border-bottom:0px solid #009D3B ; background:#009D3B ; text-align:center;" width="16.6%" |  
+
| style="padding:0.2em; font-size:100%; background-color:#009D3B; text-align:center; color:#F5F5F5" width="16.6%" |  
[[IS428_2018_19T1_Group11_Team | <font color="#fff"><b>TEAM</b></font>]]
+
[[IS428_2018_19T1_Group11_Team|<font color="#fff" face="Century Gothic"><b>HOME</b></font>]]
| &nbsp;
+
| style="background:none;" width="1%" | &nbsp;
  
| style="font-family:Century Gothic; font-size:100%; solid #009D3B  ; border-bottom:0px solid #009D3B  ; background:#bcbbbb; text-align:center;" width="16.6%" |  
+
| style="padding:0.2em; font-size:100%; background-color:#bcbbbb; text-align:center; color:#F5F5F5" width="16.6%" |  
<font color="#000"><b>PROPOSAL</b></font>
+
<font color="#000" face="Century Gothic"><b>PROPOSAL</b></font>
| &nbsp;
+
| style="background:none;" width="1%" | &nbsp;
  
| style="font-family:Century Gothic; font-size:100%; solid #009D3B ; border:0px solid #009D3B background:#009D3B ;  background:#009D3B ; text-align:center;" width="16.6%" |
+
| style="padding:0.2em; font-size:100%; background-color:#009D3B; text-align:center; color:#F5F5F5" width="16.6%" |  
[[IS428_2018_19T1_Group11_Poster | <font color="#fff"><b>POSTER</b></font>]]
+
[[IS428_2018_19T1_Group11_Poster|<font color="#fff" face="Century Gothic"><b>POSTER</b></font>]]
| &nbsp;
+
| style="background:none;" width="1%" | &nbsp;
  
| style="font-family:Century Gothic; font-size:100%; solid #009D3B ; border:0px solid #f48024 background:#009D3B ;  background:#009D3B ; text-align:center;" width="16.6%" |
+
| style="padding:0.2em; font-size:100%; background-color:#009D3B; text-align:center; color:#F5F5F5" width="16.6%" |  
[[IS428_2018_19T1_Group11_Application | <font color="#FFFFFF"><b>APPLICATION</b></font>]]
+
[[IS428_2018_19T1_Group11_Application|<font color="#fff" face="Century Gothic"><b>APPLICATION</b></font>]]
| &nbsp;
+
| style="background:none;" width="1%" | &nbsp;
  
| style="font-family:Century Gothic; font-size:100%; solid #009D3B  ; border:0px solid #009D3B  background:#009D3B ;  background:#009D3B ; text-align:center;" width="16.6%" |
+
| style="padding:0.2em; font-size:100%; background-color:#009D3B; text-align:center; color:#F5F5F5" width="16.6%" |  
[[IS428_2018_19T1_Group11_Report | <font color="#FFFFFF"><b>RESEARCH PAPER</b></font>]]
+
[[IS428_2018_19T1_Group11_Report|<font color="#fff" face="Century Gothic"><b>REPORT</b></font>]]
| &nbsp;
+
| style="background:none;" width="1%" | &nbsp;
 
|}
 
|}
  
[[IS428_2018_19T1_Group11_Proposal | <font color="#000"><b>Version 1</b></font>]] | [[IS428_2018_19T1_Group11_Proposal-version2 | <font color="#000">Version 2</font>]]
+
[[Grab C.H.A | <font color="#000"><b>Version 1</b></font>]] | [[IS428_2018_19T1_Group11_Proposal-version2 | <font color="#000">Version 2</font>]]
 
<br>
 
<br>
  
<div style="background: #009D3B ; margin-top: 40px; font-weight: bold; line-height: 0.3em;letter-spacing:-0.08em;font-size:20px"><font color=#009D3B face="Century Gothic">INTRODUCTION</font></div>
+
<div style="background: #009D3B ; margin-top: 40px; font-weight: bold; line-height: 0.3em;letter-spacing:-0.08em;font-size:20px"><font color=#009D3B face="Century Gothic">PROBLEM AND MOTIVATION</font></div>
 
+
<br>
Write something here
+
Despite Grab’s strong presence in the Southeast Asian rideshare market following its acquisition of Uber’s Southeast Asian operations in March 2018, the growing number of competitors in the fields in which Grab operates gives the company an incentive to adopt non-traditional methods to improve its business operations. <br>
 
 
</div>
 
  
<div style="background: #009D3B ; margin-top: 40px; font-weight: bold; line-height: 0.3em;letter-spacing:-0.08em;font-size:20px"><font color=#009D3B face="Century Gothic">MOTIVATION</font></div>
+
While meeting the bottom line is important, a major determinant of the company’s success is public perception and Grab’s positioning in its markets. As such, this project aims to create a systematic method that Grab can use to understand public sentiment towards its products and its company image. <br>
  
Paragraph 1
+
The NLP algorithm used by Grab to describe the public’s sentiments follows the Latent Dirichlet Allocation (LDA) model, a generative statistical model that allows sets of observations to be explained by unobserved groups, or topics, that explain how some parts of the data are similar. With this model, Grab hopes to identify latent topics of interest and understand the public’s perception of the topic. Grab can therefore make use of this information to address any inadequacies in their business practices, create more effective marketing campaigns and improve their business operations to build a stronger overall brand image.
 
 
Paragraph 2
 
 
 
Sub Paragraph 2.1
 
  
 
</div>
 
</div>
  
 
<div style="background: #009D3B ; margin-top: 40px; font-weight: bold; line-height: 0.3em;letter-spacing:-0.08em;font-size:20px"><font color=#009D3B face="Century Gothic">OBJECTIVES</font></div>
 
<div style="background: #009D3B ; margin-top: 40px; font-weight: bold; line-height: 0.3em;letter-spacing:-0.08em;font-size:20px"><font color=#009D3B face="Century Gothic">OBJECTIVES</font></div>
 +
<br>
 +
The objective of the visualization is to bridge the gap between the analytics and business teams. While the findings from the LDA model might be intuitive for the analytics team, business users may find them difficult to internalize. Hence, we aim to create a scalable way to present these findings to business users in an easy-to-understand format.
  
Paragraph 1
+
As mentioned before, we hope that the LDA model will give the business teams a sense of how the community currently perceives Grab’s products. In the model, the user chooses the total number of topics that he or she believes strikes an ideal balance between granularity and actionability (i.e. too many topics may result in low actionability, while too few topics may not give enough cause for action). As the ideal number of topics is often subjective, our proposed visualization will let users adjust the number of topics based on what they believe is ideal. We will also include a coherence score plot that can guide users towards the ideal number of topics.
 
 
# Number 1
 
# Number 2
 
  
 
</div>
 
</div>
  
 
<div style="background: #009D3B ; margin-top: 40px; font-weight: bold; line-height: 0.3em;letter-spacing:-0.08em;font-size:20px"><font color=#009D3B face="Century Gothic">DATA SOURCE</font></div>
 
<div style="background: #009D3B ; margin-top: 40px; font-weight: bold; line-height: 0.3em;letter-spacing:-0.08em;font-size:20px"><font color=#009D3B face="Century Gothic">DATA SOURCE</font></div>
 
+
<br>
"Where our data came from"
+
<b> Data Source </b><br>
 
+
The data was obtained by web scraping platforms such as Instagram, Twitter, Reddit and Google Play. <br> The data set consists of 9,000 scraped comments. <br><br>
<b> Data Source </b> <br>
 
Data used is obtained from web scraping of various social media platforms such as Instagram, Twiiter, Reddit and Google Playstore. <br> The data set consists of 9000 comments that were scraped and collected. <br><br>
 
  
 
<b> Data Attributes </b> <br>
 
<b> Data Attributes </b> <br>
 
The following is a snapshot of the data collected, and a description of the data attributes: <br>
 
The following is a snapshot of the data collected, and a description of the data attributes: <br>
[[File:Metadata.jpg|thumb|alt=Alt text| Figure 1: Comments Dataset|center|upright=2.35]] <br>
+
[[File:Metadata1.png|thumb|alt=Alt text| Figure 1: Comments Dataset|left|upright=2.35]] <br>
  
[[File:Intertopic.jpg|thumb|alt=Alt text| Figure 2: Inter-topic Distance Map split into Clusters based on Coherence Values|center|upright=2.35]] <br>
+
{| class="wikitable" style="background-color:#FFFFFF;" width="60%"
 
 
{| class="wikitable" style="background-color:#FFFFFF;" style="margin: auto;" width="60%"
 
 
! Data Attributes
 
! Data Attributes
 
! Description of attributes
 
! Description of attributes
 
|-
 
|-
 
| style="text-align: center;" |Document
 
| style="text-align: center;" |Document
| Comments scraped may consist of more than a sentence each. They are hence separated and identified by documents. Hence, a document represents a sentence of comment.  
+
| Comments scraped may consist of more than a sentence each. They are separated and identified by documents. Hence, a document represents a sentence of comment.  
 +
 
 
|-
 
|-
 
| style="text-align: center;" | Dominant_Topic
 
| style="text-align: center;" | Dominant_Topic
| Dominant topic refers to the cluster that the topic will most likely be sorted into.
+
| Dominant topic refers to the topic that the document will most likely be sorted into.
 
|-
 
|-
 
| style="text-align: center;" | Topic_Perc_Contrib
 
| style="text-align: center;" | Topic_Perc_Contrib
| The probability that the comment will be found in the cluster amongst all other comments with similar keywords.
+
| The probability that the comment will be found in the topic amongst all other comments with similar keywords.
 
|-
 
|-
 
| style="text-align: center;" | Keywords
 
| style="text-align: center;" | Keywords
| Keywords that can be found in the specific cluster.
+
| Keywords that belong in each of the topics.
 
|-
 
|-
 
| style="text-align: center;" | Text
 
| style="text-align: center;" | Text
Line 91: Line 81:
 
|-
 
|-
 
| style="text-align: center;" | Original_Comment
 
| style="text-align: center;" | Original_Comment
| Original comment sentence that was scraped.
+
| Original comment.
 +
 
 
|-
 
|-
| style="text-align: center;" | Date
+
| style="text-align: center;" | Comment_Date
 
| Date that the comment was posted.
 
| Date that the comment was posted.
 
|}
 
|}
Line 102: Line 93:
  
 
<div style="background: #009D3B ; margin-top: 40px; font-weight: bold; line-height: 0.3em;letter-spacing:-0.08em;font-size:20px"><font color=#009D3B face="Century Gothic">BACKGROUND SURVEY OF RELATED WORKS</font></div>
 
<div style="background: #009D3B ; margin-top: 40px; font-weight: bold; line-height: 0.3em;letter-spacing:-0.08em;font-size:20px"><font color=#009D3B face="Century Gothic">BACKGROUND SURVEY OF RELATED WORKS</font></div>
 +
<br>
 
{| class="wikitable" style="background-color:#FFFFFF;" width="80%"
 
{| class="wikitable" style="background-color:#FFFFFF;" width="80%"
 
! style="width:50%" | Related Works
 
! style="width:50%" | Related Works
 
! What We Can Learn
 
! What We Can Learn
 
|-
 
|-
| style="text-align: center;" | <b> Grab Traffic Trends </b>
+
| style="text-align: center;" | <b> Tweet Sentiment Visualisation  </b><br>
"Insert Image here"
+
[[File:Tweet sentiment visualisation.png|center|400px]]<br>
Source: "Insert source here"
+
Source: https://www.csc2.ncsu.edu/faculty/healey/tweet_viz/tweet_app/
 +
|
 +
* Using a sentiment dictionary to estimate the sentiment of each tweet
 +
* Scaling sentiments from negative to positive, where blue is unpleasant and green is pleasant
 +
* Shades of tweets are dependent on whether the tweet is an active tweet or sedate tweet
 +
|-
 +
| style="text-align: center;" | <b> Tweet topic cluster visualisation </b>
 +
[[File:Tweet topic cluster visualisation.png|400px|center]]
 +
Source: https://www.csc2.ncsu.edu/faculty/healey/tweet_viz/tweet_app/
 
|
 
|
* Learning 1
+
* Common words are grouped into topic clusters
* Learning 2
+
* Keywords in a cluster indicates the topic
* Learning 3
+
* At the same time, words that are not categorised in a topic are separated
 
|-
 
|-
| style="text-align: center;" | <b> Another Graph </b>
+
| style="text-align: center;" | <b> Tweet sentiment across time </b>
"Insert Image here"
+
[[File:Tweet sentiment across time.png|400px|center]]
Source: "Insert source here"
+
Source: https://www.csc2.ncsu.edu/faculty/healey/tweet_viz/tweet_app/
 
|
 
|
* Learning 1
+
* Shows sentiment of words over time using a time series line graph
* Learning 2
+
* Can be used to observe the change in topic interest over time
* Learning 3
+
* Idea can be adapted to comments in every cluster of comments
 
|}
 
|}
 
</div>
 
</div>
  
 
<div style="background: #009D3B ; margin-top: 40px; font-weight: bold; line-height: 0.3em;letter-spacing:-0.08em;font-size:20px"><font color=#009D3B face="Century Gothic">STORYBOARD</font></div>
 
<div style="background: #009D3B ; margin-top: 40px; font-weight: bold; line-height: 0.3em;letter-spacing:-0.08em;font-size:20px"><font color=#009D3B face="Century Gothic">STORYBOARD</font></div>
 +
<br>
 
{| class="wikitable" style="background-color:#FFFFFF;" width="80%"
 
{| class="wikitable" style="background-color:#FFFFFF;" width="80%"
 
! style="width:50%" | Sketches
 
! style="width:50%" | Sketches
 
! Description of Approach
 
! Description of Approach
 
|-
 
|-
| style="text-align: center;" | <b> Sketch 1 </b>
+
| style="text-align: center;" | <b> Visualisation 1: User modelling interface </b>
"Insert Image here"
+
[[File:Visualisation 1 - User modelling interface.png|400px|center]]
Source: "Insert source here"
+
|
 +
Idea: <br>
 +
Users are able to switch between the number of clusters they wish to output using coherence value as a guide. <br>
 +
 
 +
* Coherence score chart
 +
* Cluster count (for user input)
 +
* Relative size of clusters to each other
 +
* Top 5 words per cluster
 +
* Suggested cluster topic based on keyword that appears most frequently
 +
|-
 +
| style="text-align: center;" | <b> Visualisation 2: Topic Cluster Visualisation<br></b> Option 1
 +
<br>
 +
[[File:Topic Cluster Visualisation - option 1.png|400px|center]]
 +
Inspired by: https://www.csc2.ncsu.edu/faculty/healey/tweet_viz/
 +
|
 +
Idea: <br>
 +
Show how topics change in discussion frequency and sentiment over time. Also observe birth and death of topics. <br>
 +
 
 +
* Grab service filter
 +
* Sentiment legend*
 +
* Topic frequency legend
 +
* Date scroll bar
 +
* Selected Topic
 +
* Actual comments (comments highlighted based on selected topic)
 +
* Details of selected topic
 +
 
 +
<nowiki>*</nowiki>subject to data availability
 +
 
 +
|-
 +
| style="text-align: center;" | <b> Visualisation 2: Topic Cluster Visualisation<br></b> Option 2
 +
[[File:Topic Cluster Visualisation - option 2.png|400px|center]]
 +
[[File:Topic Cluster Visualisation - option 2.1.png|400px|center]]
 
|
 
|
* Approach 1
+
Idea: <br>
* Approach 2
+
Increased granularity compared to Option 1, to show distribution of individual comment sentiments within each topic cluster. <br>
* Approach 3
+
 
 +
* Same features as above
 +
 
 
|-
 
|-
| style="text-align: center;" | <b> Sketch 2 </b>
+
| style="text-align: center;" | <b> Visualisation 2: Topic Cluster Visualisation <br></b> Option 3
"Insert Image here"
+
[[File:Topic Cluster Visualisation - option 3.png|400px|center]]
Source: "Insert source here"
+
Inspired by: http://lcs.ios.ac.cn/~shil/paper/VISA_VINCI.pdf
 
|
 
|
* Approach 1
+
Idea: <br>
* Approach 2
+
Allow snapshot view of changes in topic frequency and sentiment across the whole time period. <br>
* Approach 3
+
 
 +
* Grab service filter
 +
* Date scroll bar
 +
* Sentiment legend*
 +
* Selected Topic
 +
* Actual comments (comments highlighted based on selected topic
 +
* Details of selected topic
 +
 
 +
<nowiki>*</nowiki>subject to data availability
 
|}
 
|}
 
</div>
 
</div>
  
<div style="background: #009D3B ; margin-top: 40px; font-weight: bold; line-height: 0.3em;letter-spacing:-0.08em;font-size:20px"><font color=#009D3B face="Century Gothic">PROPOSED VISUALISATION</font></div>
+
<div style="background: #009D3B ; margin-top: 40px; font-weight: bold; line-height: 0.3em;letter-spacing:-0.08em;font-size:20px"><font color=#009D3B face="Century Gothic">KEY TECHNICAL CHALLENGES</font></div>
???? Write what???? halps
+
<br>
 +
{| class="wikitable" style="background-color:#FFFFFF;" width="80%"
 +
! style="width:50%" | Key challenges
 +
! Proposed solution to overcome challenges
 +
|-
 +
| style="text-align: center;" | Data cleaning and ensuring good data quality
 +
|
 +
# Take time to understand and explore sponsored data
 +
# Understand procedures taken by sponsor to process data before sending it to us
 +
# Validate quality of data before proceeding
 +
|-
 +
| style="text-align: center;" | Lack of experience using relevant tools for analysis such as RShiny
 +
|
 +
# Attend the R workshop conducted by Prof. Kam
 +
# Independent learning through online tutorials such as DataCamp
 +
# Seek advice from seniors and ask on online forums
 +
|-
 +
| style="text-align: center;" | Determining the most effective way to visualise the data
 +
|
 +
# Brainstorm and design storyboard before implementing anything
 +
# Consult sponsor on requirements
 +
# Consult Prof. Kam on visualisation direction
 +
# Research on similar visualizations to determine what the best approach is
 +
|}
 +
 
 
</div>
 
</div>
  
<div style="background: #009D3B ; margin-top: 40px; font-weight: bold; line-height: 0.3em;letter-spacing:-0.08em;font-size:20px"><font color=#009D3B face="Century Gothic">KEY TECHNICAL CHALLENGES</font></div>
+
<div style="background: #009D3B ; margin-top: 40px; font-weight: bold; line-height: 0.3em;letter-spacing:-0.08em;font-size:20px"><font color=#009D3B face="Century Gothic">PROJECT TIMELINE</font></div>
* Challenge 1
+
<br>
* Challenge 2
+
[[File:Gantt Chart.png|1500px]]
* Challenge 3
 
 
</div>
 
</div>
  
<div style="background: #009D3B ; margin-top: 40px; font-weight: bold; line-height: 0.3em;letter-spacing:-0.08em;font-size:20px"><font color=#009D3B face="Century Gothic">PROJECT TIMELINE</font></div>
+
<div style="background: #009D3B ; margin-top: 40px; font-weight: bold; line-height: 0.3em;letter-spacing:-0.08em;font-size:20px"><font color=#009D3B face="Century Gothic">TECHNOLOGIES AND TOOLS</font></div>
"Insert Gantt Chart"
+
<br>
 +
{| class="wikitable" style="background-color:#FFFFFF;" width="80%"
 +
! style="width:50%" | Technology and Tools
 +
! Explanation
 +
|-
 +
| style="text-align: center;" | [[File:Rshiny.jpg|100px|center]]
 +
| We will primarily be using Shiny for our visualization. Shiny is an open source R package that provides an elegant and powerful web framework for building web applications using R
 +
|-
 +
| style="text-align: center;" | [[File:Rstudio.jpg|90px|center]]
 +
| We will be building the machine learning model and visualization using R Studio
 +
|-
 +
| style="text-align: center;" | [[File:Photoshop.png|80px|center]]
 +
| We will be using Photoshop to design our poster
 +
|}
 
</div>
 
</div>
  
 
<div style="background: #009D3B ; margin-top: 40px; font-weight: bold; line-height: 0.3em;letter-spacing:-0.08em;font-size:20px"><font color=#009D3B face="Century Gothic">REFERENCES</font></div>
 
<div style="background: #009D3B ; margin-top: 40px; font-weight: bold; line-height: 0.3em;letter-spacing:-0.08em;font-size:20px"><font color=#009D3B face="Century Gothic">REFERENCES</font></div>
"Insert Links and Description"
+
<br>
 +
https://www.csc2.ncsu.edu/faculty/healey/tweet_viz/ <br>
 +
https://rstudio-pubs-static.s3.amazonaws.com/236186_d311ea00291d42509864aa0a77d340e8.html <br>
 +
http://lcs.ios.ac.cn/~shil/paper/VISA_VINCI.pdf <br>
 +
These pages have inspired numerous alternatives to visualising topic sentiments
 
</div>
 
</div>
  
 
<div style="background: #009D3B ; margin-top: 40px; font-weight: bold; line-height: 0.3em;letter-spacing:-0.08em;font-size:20px"><font color=#009D3B face="Century Gothic">COMMENTS</font></div>
 
<div style="background: #009D3B ; margin-top: 40px; font-weight: bold; line-height: 0.3em;letter-spacing:-0.08em;font-size:20px"><font color=#009D3B face="Century Gothic">COMMENTS</font></div>
Something
+
<br>
 +
Feel free to comment and leave suggestions and feedback to help us improve our project!:D
 
</div>
 
</div>

Latest revision as of 20:01, 14 October 2018

PROBLEM AND MOTIVATION


Despite Grab’s strong presence in the Southeast Asian rideshare market following its acquisition of Uber’s Southeast Asian operations in March 2018, the growing number of competitors in the fields in which Grab operates gives the company an incentive to adopt non-traditional methods to improve its business operations.

While meeting the bottom line is important, a major determinant of the company’s success is public perception and Grab’s positioning in its markets. As such, this project aims to create a systematic method that Grab can use to understand public sentiment towards its products and its company image.

The NLP algorithm used by Grab to describe the public’s sentiments follows the Latent Dirichlet Allocation (LDA) model, a generative statistical model that allows sets of observations to be explained by unobserved groups, or topics, that explain how some parts of the data are similar. With this model, Grab hopes to identify latent topics of interest and understand the public’s perception of the topic. Grab can therefore make use of this information to address any inadequacies in their business practices, create more effective marketing campaigns and improve their business operations to build a stronger overall brand image.

OBJECTIVES


The objective of the visualization is to bridge the gap between the analytics and business teams. While the findings from the LDA model might be intuitive for the analytics team, business users may find them difficult to internalize. Hence, we aim to create a scalable way to present these findings to business users in an easy-to-understand format.

As mentioned before, we hope that the LDA model will give the business teams a sense of how the community currently perceives Grab’s products. In the model, the user chooses the total number of topics that he or she believes strikes an ideal balance between granularity and actionability (i.e. too many topics may result in low actionability, while too few topics may not give enough cause for action). As the ideal number of topics is often subjective, our proposed visualization will let users adjust the number of topics based on what they believe is ideal. We will also include a coherence score plot that can guide users towards the ideal number of topics.
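
As a rough illustration of how a coherence score could guide the choice of topic count, the sketch below computes a UMass-style coherence for one topic's top words from document co-occurrence counts. The function name, the toy documents and the word lists are hypothetical, and the proposal does not state which coherence measure the final plot will use.

```python
import math


def umass_coherence(top_words, documents):
    """UMass-style coherence: sum of log((D(w_m, w_l) + 1) / D(w_l)) over word pairs."""
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain every given word.
        return sum(1 for doc in doc_sets if all(w in doc for w in words))

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            d_l = doc_freq(top_words[l])
            if d_l == 0:
                continue  # skip words that never occur, to avoid dividing by zero
            score += math.log((doc_freq(top_words[m], top_words[l]) + 1) / d_l)
    return score


# Toy tokenised documents (hypothetical, not taken from the scraped data set).
docs = [["driver", "late"], ["driver", "late", "rude"], ["promo", "code"]]
coherent = umass_coherence(["driver", "late"], docs)     # words that co-occur
incoherent = umass_coherence(["driver", "promo"], docs)  # words that never co-occur
```

Averaging such a score over all topics for each candidate topic count gives one point per count on a coherence plot; a higher value suggests topics whose top words tend to appear together.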

DATA SOURCE


Data Source
The data was obtained by web scraping platforms such as Instagram, Twitter, Reddit and Google Play.
The data set consists of 9,000 scraped comments.

Data Attributes
The following is a snapshot of the data collected, and a description of the data attributes:

Figure 1: Comments Dataset


Attribute: Description
Document: A scraped comment may contain more than one sentence. Comments are split into sentences, and each sentence is identified as a document. Hence, a document represents one sentence of a comment.
Dominant_Topic: The topic that the document is most likely to be sorted into.
Topic_Perc_Contrib: The probability that the comment will be found in the topic amongst all other comments with similar keywords.
Keywords: Keywords that belong to each of the topics.
Text: Words in each comment after the removal of stop words (e.g. the, is, to, on).
Original_Comment: The original comment.
Comment_Date: Date on which the comment was posted.
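
To make the Dominant_Topic and Topic_Perc_Contrib attributes concrete, here is a minimal sketch. It assumes (our interpretation, not something the sponsor has confirmed) that the dominant topic is simply the highest-probability topic in a document's topic distribution and that Topic_Perc_Contrib is that probability. The function name and the numbers are hypothetical.

```python
def dominant_topic(topic_probs):
    """Return (topic_id, probability) for the most probable topic of one document."""
    topic_id = max(topic_probs, key=topic_probs.get)
    return topic_id, topic_probs[topic_id]


# Hypothetical topic distribution for a single document (values are made up).
doc_topics = {0: 0.10, 1: 0.72, 2: 0.18}
topic, contrib = dominant_topic(doc_topics)

# The two derived attributes as they would appear in one row of the data set.
record = {"Dominant_Topic": topic, "Topic_Perc_Contrib": contrib}
```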

BACKGROUND SURVEY OF RELATED WORKS


Related Works What We Can Learn
Tweet Sentiment Visualisation
Tweet sentiment visualisation.png

Source: https://www.csc2.ncsu.edu/faculty/healey/tweet_viz/tweet_app/

  • Using a sentiment dictionary to estimate the sentiment of each tweet
  • Scaling sentiments from negative to positive, where blue is unpleasant and green is pleasant
  • The shade of each tweet depends on whether it is an active or a sedate tweet
Tweet topic cluster visualisation
Tweet topic cluster visualisation.png

Source: https://www.csc2.ncsu.edu/faculty/healey/tweet_viz/tweet_app/

  • Common words are grouped into topic clusters
  • Keywords in a cluster indicate the topic
  • At the same time, words that are not categorised in a topic are separated
Tweet sentiment across time
Tweet sentiment across time.png

Source: https://www.csc2.ncsu.edu/faculty/healey/tweet_viz/tweet_app/

  • Shows sentiment of words over time using a time series line graph
  • Can be used to observe the change in topic interest over time
  • The idea can be adapted to the comments in each topic cluster
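
The first related work estimates sentiment with a sentiment dictionary. A minimal sketch of that idea follows; the four-word lexicon, its scores and the function name are hypothetical stand-ins for a real sentiment dictionary, and averaging the matched scores is only one simple way to combine them.

```python
# Hypothetical mini lexicon: word -> pleasantness score in [-1, 1].
LEXICON = {"great": 0.8, "friendly": 0.6, "late": -0.5, "rude": -0.8}


def estimate_sentiment(text, lexicon=LEXICON):
    """Average the scores of lexicon words found in the text; None if no word matches."""
    words = text.lower().split()
    scores = [lexicon[w] for w in words if w in lexicon]
    if not scores:
        return None  # no dictionary word found, so no estimate is made
    return sum(scores) / len(scores)


positive = estimate_sentiment("Great ride and friendly driver")
negative = estimate_sentiment("driver was late and rude")
unknown = estimate_sentiment("booked a car")
```

Returning None for comments without any dictionary word keeps them apart from comments that are genuinely neutral, which matters when colouring comments by sentiment.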
STORYBOARD


Sketches Description of Approach
Visualisation 1: User modelling interface
Visualisation 1 - User modelling interface.png

Idea:
Users are able to switch between the number of clusters they wish to output using coherence value as a guide.

  • Coherence score chart
  • Cluster count (for user input)
  • Relative size of clusters to each other
  • Top 5 words per cluster
  • Suggested cluster topic based on the keyword that appears most frequently
Visualisation 2: Topic Cluster Visualisation
Option 1


Topic Cluster Visualisation - option 1.png

Inspired by: https://www.csc2.ncsu.edu/faculty/healey/tweet_viz/

Idea:
Show how topics change in discussion frequency and sentiment over time. Also observe birth and death of topics.

  • Grab service filter
  • Sentiment legend*
  • Topic frequency legend
  • Date scroll bar
  • Selected Topic
  • Actual comments (comments highlighted based on selected topic)
  • Details of selected topic

*subject to data availability

Visualisation 2: Topic Cluster Visualisation
Option 2
Topic Cluster Visualisation - option 2.png
Topic Cluster Visualisation - option 2.1.png

Idea:
Increased granularity compared to Option 1, to show distribution of individual comment sentiments within each topic cluster.

  • Same features as above
Visualisation 2: Topic Cluster Visualisation
Option 3
Topic Cluster Visualisation - option 3.png

Inspired by: http://lcs.ios.ac.cn/~shil/paper/VISA_VINCI.pdf

Idea:
Allow snapshot view of changes in topic frequency and sentiment across the whole time period.

  • Grab service filter
  • Date scroll bar
  • Sentiment legend*
  • Selected Topic
  • Actual comments (comments highlighted based on selected topic)
  • Details of selected topic

*subject to data availability
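
The topic-frequency-over-time views in the storyboard need a count of documents per topic per time period. The sketch below shows one way this could be derived from the Comment_Date and Dominant_Topic attributes; the rows, the monthly granularity and the function name are assumptions made for illustration.

```python
from collections import Counter
from datetime import date

# Hypothetical rows using the Comment_Date and Dominant_Topic attributes.
rows = [
    {"Comment_Date": date(2018, 8, 3), "Dominant_Topic": 0},
    {"Comment_Date": date(2018, 8, 19), "Dominant_Topic": 0},
    {"Comment_Date": date(2018, 8, 21), "Dominant_Topic": 1},
    {"Comment_Date": date(2018, 9, 2), "Dominant_Topic": 1},
]


def topic_counts_by_month(rows):
    """Count documents per (year-month, topic) pair."""
    counts = Counter()
    for row in rows:
        month = row["Comment_Date"].strftime("%Y-%m")
        counts[(month, row["Dominant_Topic"])] += 1
    return counts


counts = topic_counts_by_month(rows)
```

A topic whose count drops to zero in later months, as topic 0 does here in September, would show up in the visualisation as the "death" of that topic.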

KEY TECHNICAL CHALLENGES


Key challenges Proposed solution to overcome challenges
Data cleaning and ensuring good data quality
  1. Take time to understand and explore sponsored data
  2. Understand procedures taken by sponsor to process data before sending it to us
  3. Validate quality of data before proceeding
Lack of experience using relevant analysis tools such as R Shiny
  1. Attend the R workshop conducted by Prof. Kam
  2. Independent learning through online tutorials such as DataCamp
  3. Seek advice from seniors and ask on online forums
Determining the most effective way to visualise the data
  1. Brainstorm and design storyboard before implementing anything
  2. Consult sponsor on requirements
  3. Consult Prof. Kam on visualisation direction
  4. Research similar visualizations to determine the best approach
PROJECT TIMELINE


Gantt Chart.png

TECHNOLOGIES AND TOOLS


Technology and Tools Explanation
Rshiny.jpg
We will primarily be using Shiny for our visualization. Shiny is an open-source R package that provides an elegant and powerful framework for building web applications using R.
Rstudio.jpg
We will be building the machine learning model and the visualization using RStudio.
Photoshop.png
We will be using Photoshop to design our poster
REFERENCES


https://www.csc2.ncsu.edu/faculty/healey/tweet_viz/
https://rstudio-pubs-static.s3.amazonaws.com/236186_d311ea00291d42509864aa0a77d340e8.html
http://lcs.ios.ac.cn/~shil/paper/VISA_VINCI.pdf
These pages inspired several of our alternatives for visualising topic sentiments.

COMMENTS


Feel free to leave comments, suggestions and feedback to help us improve our project! :D