ISSS608 2016 17T1 Group4 Report
Latest revision as of 20:20, 29 November 2016
Motivation of the application
Social networks today hold a huge amount of data. More communication and expression of views happen within these networks than ever before, and they have become an important element of society. Twitter is one such prominent directed network, where information is posted as tweets: short but influential messages. When key events occur, it is probably the first place to see the buzz, the trends, the status and the direction of the conversation. For stakeholders, this information is key.
Tweets are voluminous and text-rich. Mining this information, defining the business use case and identifying the insights required are a challenge. The 2016 US elections held the world's attention during this period, and this project analyses tweets for the top US election #tags. On the day of the 2016 US presidential election, Twitter proved to be the largest source of breaking news, with 40 million tweets.
Objective
Using visual analytics techniques and applications, we want to answer these questions:
- What were the key ideas in the tweets, namely the #tags that were mentioned most?
- To which users (@mentions in the Twitter network) did these #tags relate? For the given tweets, we compare and analyse the association between the #tags and the @mentions.
Review and critique of past works
A number of free Twitter analytics and visualization tools are available online. For social network analysis, they offer options such as analyzing one's Twitter network, visualizing followers on a map, and reporting statistics on user mentions and communication between users. However, an interactive custom application in R for understanding the association between #tags and @mentions is uncommon. Seeing the connections around #tags matters more, because #tags represent the elements within the subject of interest.
When a key political event occurs, for instance, influences within social networks can have a high impact and change the direction of the discussion. It is therefore valuable to identify the prominent users or influencers towards whom key issues are directed. An interactive visual application with elements such as network graphs can help reveal this.
Data
We downloaded 19K tweets from www.followthehashtag.com for popular US elections 2016 hashtags. Apart from the actual tweet content, the data had other attributes such as the associated #tags, @mentions, frequency of retweets, media mentions and location.
After some study, we identified the key attributes for visual analysis:
- #tags
- @mentions
- tweet content
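As an illustration of how #tags and @mentions might be pulled out of the tweet content, here is a minimal base R sketch. The sample tweets, the helper name `extract_tokens` and the regular expressions are assumptions made for this example; they are not taken from the project's code.

```r
# Hypothetical sample tweets; the real data set has about 19K rows.
tweets <- c(
  "Big night ahead #USElection @CNN",
  "#climate matters, @POTUS! #USElection",
  "No tags or mentions here"
)

# Return every match of `pattern` for each tweet, lower-cased so that
# "#USElection" and "#uselection" are counted together.
extract_tokens <- function(text, pattern) {
  lapply(regmatches(text, gregexpr(pattern, text, perl = TRUE)), tolower)
}

hashtags <- extract_tokens(tweets, "#[A-Za-z0-9_]+")
mentions <- extract_tokens(tweets, "@[A-Za-z0-9_]+")

# Frequency of each #tag across all tweets, most frequent first.
tag_counts <- sort(table(unlist(hashtags)), decreasing = TRUE)
```

A tweet with no matches yields an empty character vector, so it contributes nothing to the counts.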
Design framework
Word Cloud
Word Cloud is famously described as "the mullets of the Internet". However, it can be useful for revealing repeated themes or words. Furthermore, it is engaging and the results can be understood quickly.
For our application, we have incorporated filters so that users can explore the Word Cloud as it is being generated, allowing them to decide on the visualization that best suits their requirements. The size of each word reflects how frequently it occurs.
- Minimum Frequency: words with a frequency lower than the selected number are not displayed.
- Maximum Words: maximum number of words to be contained within the Word Cloud. While we have set the upper limit to be 1,000, it is not recommended to have more than 100 words as the Word Cloud may get too cluttered.
- Rotation: This is for users to customise the layout of the Word Cloud. From a readability perspective, it is preferred to have all words in horizontal format (i.e. rotation = 0).
We used the tm package to clean the tweet content, the SnowballC package to stem the words to their root form and the wordcloud package to create the visualisation.
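The pipeline described above could look roughly like the sketch below. It is not the application's code: the sample text, the order of the cleaning steps and the argument values are assumptions. Note that `removePunctuation` also strips the `#` and `@` characters, so keeping #tags intact would need a custom transformation.

```r
library(tm)         # text cleaning
library(SnowballC)  # stemming backend used by tm::stemDocument
library(wordcloud)  # plotting

# Hypothetical sample text standing in for the tweet content.
tweets <- c("Elections are near", "The election is today", "Vote in the elections")

corpus <- VCorpus(VectorSource(tweets))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stemDocument)  # reduce words to their stems

# Term frequencies summed over all documents, most frequent first.
tdm  <- TermDocumentMatrix(corpus)
freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)

# min.freq, max.words and rot.per are assumed to correspond to the
# Minimum Frequency, Maximum Words and Rotation filters.
wordcloud(names(freq), freq, min.freq = 1, max.words = 100, rot.per = 0)
```

After stemming, variants of the same word share one entry in `freq`, so their counts are added together before plotting.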
Network Graph
Network graph visualizations are used to represent real-world networks such as air or land transport networks, layouts of optic fiber connections, social networks and many more. Network visualization is a very useful technique for representing large-scale social networks. A good network visualization should make the connectivity and relationships between the network elements easy to interpret and understand. The key points to address when designing network graphs are:
- What are the nodes to be represented in the graph, and what do they signify?
- Which properties should be assigned to the nodes by varying color or node size, for instance degree or betweenness?
- What should an edge connecting any two nodes in the graph represent?
- Which algorithm should be used for the graph layout?
We used forceNetwork graphs from the networkD3 package to create interactive network graph visualizations. The networkD3 package uses the htmlwidgets framework to display JavaScript objects in R, and this is supported in RStudio. forceNetwork was chosen because it gives more control over the appearance of the force-directed network and can plot complicated networks.
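A minimal sketch of a `forceNetwork` call on a #tag to @user-mention edge list is shown below. The sample edges, the `tag`/`mention` grouping and the `charge` and `linkDistance` values are assumptions for illustration; the application's actual settings are not shown in this report.

```r
library(networkD3)

# Hypothetical edge list with FROM, TO and Weight columns.
edges <- data.frame(
  FROM   = c("#climate", "#climate", "#uselection"),
  TO     = c("@userA", "@userB", "@userA"),
  Weight = c(5, 2, 3),
  stringsAsFactors = FALSE
)

# Node table: one row per unique #tag or @mention, grouped by type so that
# the two kinds of node receive different colours.
node_names <- unique(c(edges$FROM, edges$TO))
nodes <- data.frame(
  name  = node_names,
  group = ifelse(startsWith(node_names, "#"), "tag", "mention"),
  stringsAsFactors = FALSE
)

# forceNetwork expects zero-based indices into the node table.
links <- data.frame(
  source = match(edges$FROM, nodes$name) - 1,
  target = match(edges$TO, nodes$name) - 1,
  value  = edges$Weight
)

graph <- forceNetwork(
  Links = links, Nodes = nodes,
  Source = "source", Target = "target", Value = "value",
  NodeID = "name", Group = "group",
  charge = -100,      # node repulsion (assumed value)
  linkDistance = 80,  # link distance (assumed value)
  opacity = 0.9, zoom = TRUE
)
```

Printing `graph` in RStudio, or returning it from a Shiny render function, displays the interactive widget; the `value` column drives the link width.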
Demonstration & User Guide
Our application is designed to analyse and explore the association of #tag with @user-mention. To showcase the functionalities of the application, we selected US Election as a theme for demonstration.
Word Cloud
How to use the application:
1. Upload a text file using the Text File upload widget in the sidebar. Note: the file must be a text file, and the maximum supported file size is 5MB.
2. Customise the image to be displayed using the filters, e.g. maximum number of words, minimum frequency and rotation. In this case, we set Minimum Frequency: 50, Maximum Words: 100, Rotation: 0.35.
3. By default, "Document Stemming" and "Repeatable" are checked. To disable either option, simply uncheck its box.
- "Document stemming" reduces words to their root form, e.g. 'elections' becomes 'election'. The frequencies of 'elections' and 'election' are added together for the word cloud generation.
- "Repeatable" allows changes made to the word cloud via the filters to be added/removed from the initial plot generated by R.
4. Download the word cloud image and its frequency table by clicking on the "Download Image" and "Download Frequency Table" buttons respectively. The downloaded image is in PNG format and the table is in CSV format.
Based on the Word Cloud generated, #uselection appears more frequently in the text file than #hillaryemails, suggesting more people are interested in the US elections than Hillary Clinton's emails.
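To make the effect of the Minimum Frequency and Maximum Words filters concrete, the following base R sketch applies them to a small frequency vector. The counts are invented for illustration and the helper `filter_terms` is hypothetical; it is not part of the application.

```r
# Hypothetical term frequencies (not the real results).
freq <- c(uselection = 420, trump = 310, clinton = 290,
          hillaryemails = 75, vote = 40)

# Keep terms at or above min_freq, then at most max_words of the
# most frequent ones.
filter_terms <- function(freq, min_freq, max_words) {
  kept <- sort(freq[freq >= min_freq], decreasing = TRUE)
  head(kept, max_words)
}

shown <- filter_terms(freq, min_freq = 50, max_words = 100)

# A frequency table in the shape that could be saved as CSV.
freq_table <- data.frame(word = names(shown), freq = unname(shown))
# write.csv(freq_table, "frequency_table.csv", row.names = FALSE)
```

With these settings the term with a count of 40 is dropped, which mirrors what a user would see when setting Minimum Frequency to 50.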
Network Graphs
The input file required to generate the network graphs should be placed in the same folder as the R application file (app.R) and should be named "Twitter_Hash_UserMention.csv". This input file is expected to have 3 columns: "FROM", "TO" and "Weight". The "FROM" column is expected to contain the #tag and the "TO" column the @user-mention. The "Weight" column should contain a numeric value representing how often the #tag and @user-mention occur together for the respective row. The R code aggregates these frequency values for every unique combination of #tag and @user-mention.
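The aggregation step could be sketched as follows. The column names come from the description above; the sample rows and the use of `aggregate` are assumptions, since the application's own code is not reproduced here.

```r
# In the application the data would come from the file next to app.R, e.g.
# edges_raw <- read.csv("Twitter_Hash_UserMention.csv", stringsAsFactors = FALSE)
# Hypothetical rows are used here so that the sketch is self-contained.
edges_raw <- data.frame(
  FROM   = c("#climate", "#climate", "#uselection", "#climate"),
  TO     = c("@userA", "@userA", "@userB", "@userB"),
  Weight = c(2, 3, 4, 1),
  stringsAsFactors = FALSE
)

# Sum Weight over every unique (#tag, @user-mention) combination.
edges <- aggregate(Weight ~ FROM + TO, data = edges_raw, FUN = sum)
```

Each row of `edges` then represents one link in the network graph, with `Weight` as its strength.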
The association of #Tags and @User-mentions in tweets can be visualized using three perspectives in this application. 
The first perspective visualizes the entire network and offers the opportunity to explore the networked association as a whole. #Tags and @User-mentions are differentiated by the colour of the nodes. The network graph can be zoomed in and out to identify and explore a specific network pattern in more detail. As an example, we see #climate being associated with multiple @user-mentions, which are in turn associated with further #tags. Adjusting the node repulsion and link distance settings further helps to reveal the patterns of interest.
The second perspective provides the option to filter by specific #tags and narrow down the network pattern to explore the association in more depth. The width of a link represents the strength of the association between the #tag and the @user-mention: nodes connected by a wider link are associated together more frequently than other combinations of nodes. Including multiple #tags in the filter can help reveal the common @user-mentions associated with the selected #tags. Adjusting the colour opacity controls the opacity of the node labels. Users can also try out different colour palettes for the nodes.
 
  
The third perspective provides the option to filter by specific @user-mentions and narrow down the network pattern to explore the association in more depth. Adjusting the node repulsion and link distance helps to structure the network representation and supports further exploration of the association. The width of a link represents the strength of the association between the #tag and the @user-mention: nodes connected by a wider link are associated together more frequently than other combinations of nodes. Including multiple #tags in the filter can help reveal the common @user-mentions associated with the selected #tags. Adjusting the colour opacity controls the opacity of the node labels. Users can also try out different colour palettes for the nodes.
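The filtering behind the second and third perspectives can be pictured with the base R sketch below. The edge list and the helper names (`filter_by_tags`, `filter_by_mentions`, `common_mentions`) are hypothetical and only illustrate the idea; they are not the application's functions.

```r
# Hypothetical aggregated edge list.
edges <- data.frame(
  FROM   = c("#climate", "#climate", "#uselection", "#uselection", "#debate"),
  TO     = c("@userA", "@userB", "@userA", "@userC", "@userB"),
  Weight = c(5, 2, 3, 1, 4),
  stringsAsFactors = FALSE
)

# Second perspective: keep only links that start at the selected #tags.
filter_by_tags <- function(edges, tags) edges[edges$FROM %in% tags, ]

# Third perspective: keep only links that end at the selected @user-mentions.
filter_by_mentions <- function(edges, mentions) edges[edges$TO %in% mentions, ]

# @user-mentions that are linked to every one of the selected #tags
# (assumes `tags` contains no duplicates).
common_mentions <- function(edges, tags) {
  sub <- unique(filter_by_tags(edges, tags)[, c("FROM", "TO")])
  counts <- table(sub$TO)
  names(counts)[counts == length(tags)]
}

selected <- c("#climate", "#uselection")
shared <- common_mentions(edges, selected)
```

The filtered edge list would then be passed to the graph-drawing step, so only the selected part of the network is rendered.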
 
  
Discussion
During the poster presentation, we had some insightful discussions with the audience. They were curious to understand how they could use the application and what insights they could draw from it.
In one of these discussions, we explained how the network graph visualization can be used to analyse the association of a company (@CompanyName) with different #tags. By understanding how existing and potential customers associate the company with an idea or an event (#tag), we can derive insights into how the company is perceived by its customers. Stronger association with a #tag that represents a negative idea or sentiment flags areas of concern that need attention. Similarly, analysing the associations of a competitor (@CompetitorName) with certain events or ideas (#tags) shows where competitors are doing well from the customer's perspective. These insights can be used to formulate marketing strategies.
Future Work
- Use a single font colour in the Word Cloud to minimize distractions
- Improve the Word Cloud's Minimum Frequency filter so that users can enter a frequency range, rather than using a slider with a hard-coded limit (currently 500)
- Let the intensity of the font colour in the Word Cloud complement the font size, which represents the frequency of the word
- Enable the R code to perform the entire data cleaning and prepare the final input data format for the network graph
- Display the Network Graph once a user clicks on a #tag or @mention in the Word Cloud
- Display the tweet messages when a node is selected
- Include demographic details when visualizing the network graph
- Integrate the application with the Twitter API to collect feeds and visually explore the user connection networks
- Include sentiment analysis of tweets to further enhance the exploration capability of the application
References
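Twitter Viz tools: http://twittertoolsbook.com/10-awesome-twitter-analytics-visualization-tools/
Stack Overflow - tm custom removePunctuation except hashtag: http://stackoverflow.com/questions/27951377/tm-custom-removepunctuation-except-hashtag
TrigonaMinima - Word Cloud: https://gist.github.com/TrigonaMinima/bd0fb5a568b8227487ee
The pros and cons of word clouds as visualizations: https://www.visioncritical.com/pros-and-cons-word-clouds-visualizations/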
https://christophergandrud.github.io/networkD3/#force 
http://slides.com/tskam/isss608-lesson08/fullscreen#/5/2 
http://shiny.rstudio.com/articles/layout-guide.html 
http://shiny.rstudio.com/articles/dynamic-ui.html 
http://shiny.rstudio.com/reference/shiny/latest/selectInput.html 
http://shiny.rstudio.com/reference/shiny/latest/updateSelectInput.html 
https://cran.r-project.org/web/packages/RColorBrewer/RColorBrewer.pdf 
https://cran.r-project.org/web/packages/networkD3/networkD3.pdf 
https://github.com/christophergandrud/d3ShinyExample/blob/master/ui.R 
https://github.com/christophergandrud/networkD3/blob/master/man/networkD3-shiny.Rd 
http://stackoverflow.com/questions/23107094/setting-working-directory-through-a-function 
http://stackoverflow.com/questions/24240434/r-shiny-error-object-of-type-closure-is-not-subsettable 
http://stackoverflow.com/questions/31366066/how-to-plot-a-bipartite-graph-in-r 
http://stackoverflow.com/questions/21465411/r-shiny-passing-reactive-to-selectinput-choices 
http://curleylab.psych.columbia.edu/netviz/netviz2.html#/12 
http://www.statmethods.net/input/datatypes.html 
https://www.datacamp.com/community/tutorials/15-easy-solutions-data-frame-problems-r#gs.hA52b9g 
http://www.homogenisation.org/admin/docs/Lists&DataFrames.pdf 
https://lembra.wordpress.com/2010/03/12/adding-new-column-to-a-data-frame-in-r/ 
http://www.cookbook-r.com/Manipulating_data/Adding_and_removing_columns_from_a_data_frame/ 
https://stat.ethz.ch/R-manual/R-devel/library/base/html/Extract.data.frame.html 
https://www.r-bloggers.com/basic-text-string-functions-in-r/ 
https://www.r-bloggers.com/select-operations-on-r-data-frames/ 
http://search.r-project.org/library/networkD3/html/networkD3-shiny.html 
http://www.htmlwidgets.org/showcase_networkD3.html  
http://blog.revolutionanalytics.com/2015/07/creating-network-graphs-using-javascript-directly-from-r.html 




