ISSS608 2016 17T1 Group4 Report
Motivation of the application
Social networks today hold a huge amount of data. More communication and expression of views happen within these networks than ever before, which makes them an important element of society. Twitter is one such prominent directed network, where information is posted as tweets: short but influential messages. When key events occur, it is probably the first place to see the buzz, the trends, the status and the direction. For stakeholders, this information is key.
Tweets are voluminous and text-rich. Mining this information, defining the business use case and identifying the insights required are all challenging. The 2016 US elections held the world's attention during this period, so this project analyses tweets for the top US election #tags. On the day of the 2016 US presidential election, Twitter proved to be the largest source of breaking news, with 40 million tweets.
Objective
Using Visual Analytics techniques and applications, we want to answer these questions:
- What were the key ideas in the tweets, namely the #tags that were mentioned most?
- To which users (@mentions in the Twitter network) did these #tags relate? For the given tweets, compare and analyse the association between the #tags and the @mentions.
Review and critique of past works
There are a number of free Twitter analytics and visualization tools available online. For social network analysis in particular, they provide options such as analyzing one's Twitter network, visualizing followers on a map, and reporting statistics on user mentions and communication between users. However, an interactive custom application for understanding the association between #tags and @mentions is uncommon. It is more important to see the connections between #tags, which represent all elements within the subject of interest. During a key political event, for instance, influences on social networks can have a high impact and change the direction of the discussion. It is therefore of great value to identify the prominent users or influencers towards whom key issues are directed. An interactive visual application with elements such as network graphs can best help reveal this.
Data
We downloaded 19K tweets for popular US elections 2016 hashtags from www.followthehashtag.com. Apart from the actual tweet content, the data had other attributes such as the associated #tag, @mentions, frequency of retweets, media mentions and location.
After some study, the key attributes for visual analysis were identified:
- #tags
- @mentions
- tweet content
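The downloaded data already carries #tag and @mention attributes. As an illustration only, the following Python sketch shows how the same two attributes could be recovered from raw tweet content if those columns were missing. The regular expressions, the function names and the sample tweet in the usage note are assumptions made for this example; they are not part of the application.

```python
import re
from collections import Counter

# Assumed patterns: '#' or '@' followed by letters, digits or underscores.
HASHTAG_RE = re.compile(r"#(\w+)")
MENTION_RE = re.compile(r"@(\w+)")

def extract_entities(tweet):
    """Return (hashtags, mentions) found in one tweet, lowercased."""
    text = tweet.lower()
    return HASHTAG_RE.findall(text), MENTION_RE.findall(text)

def count_entities(tweets):
    """Count how often each hashtag and each mention appears across tweets."""
    tag_counts, mention_counts = Counter(), Counter()
    for tweet in tweets:
        tags, mentions = extract_entities(tweet)
        tag_counts.update(tags)
        mention_counts.update(mentions)
    return tag_counts, mention_counts
```

For a made-up tweet such as "Go vote! #USElection @SomeUser", `extract_entities` would return the hashtag `uselection` and the mention `someuser`.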
Design framework
Word Cloud
The Word Cloud is famously described as “the mullets of the Internet”. However, it can be useful for revealing repeated themes or words. Furthermore, it is engaging and the results can be understood quickly. For our application, we have incorporated filters so that users can explore the Word Cloud as it is being generated, allowing them to decide on the visualization that best suits their requirements.
- Minimum Frequency: words with a frequency lower than the selected number are not displayed.
- Maximum Words: maximum number of words to be contained within the Word Cloud. While we have set the upper limit to be 1,000, it is not recommended to have more than 100 words as the Word Cloud may get too cluttered.
- Rotation: This is for users to customise the layout of the Word Cloud. From a readability perspective, it is preferred to have all words in horizontal format (i.e. rotation = 0).
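The Minimum Frequency and Maximum Words filters can be thought of as a two-step selection over a word frequency table. The Python sketch below illustrates that idea under stated assumptions: the function name `select_words`, its defaults and the tie-breaking rule (alphabetical order among equal counts) are invented for this example and are not taken from the application.

```python
def select_words(frequencies, min_frequency=1, max_words=100):
    """Apply the two filters to a {word: count} mapping.

    Words below min_frequency are dropped first; the remaining words are
    ranked by descending count (ties broken alphabetically, an assumption)
    and at most max_words of them are returned as (word, count) pairs.
    """
    kept = [(w, c) for w, c in frequencies.items() if c >= min_frequency]
    kept.sort(key=lambda item: (-item[1], item[0]))
    return kept[:max_words]
```

Keeping `max_words` small mirrors the recommendation above: the cut happens after ranking, so the least frequent words are the first to be left out of a cluttered cloud.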
Network Graph
Network graph visualizations are used to represent real-world networks such as air or land transport networks, the layout of optic fiber connections, social networks and many more. Network visualization is a very useful technique for representing large-scale social networks. A good network visualization should enable easy interpretation and understanding of the connectivity and relationships between the network elements. The key points to address when designing network graphs are:
- What are the nodes to be represented in the graph, and what do they signify?
- Which properties, for instance degree or betweenness, should be assigned to the nodes by varying color or node size?
- What should an edge connecting any two nodes in the graph represent?
- The algorithm for the graph layout.
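One way to make these design points concrete is shown in the Python sketch below. It assumes, purely for illustration, that nodes are #tags and @mentions, that an edge links a #tag to an @mention whenever both occur in the same tweet, and that the edge weight is the number of such tweets. The source does not state that the application defines edges this way, and the function names are hypothetical. The layout algorithm is left out.

```python
from collections import Counter
from itertools import product

def build_edges(tweets):
    """Build weighted #tag-to-@mention edges.

    tweets: iterable of (hashtags, mentions) pairs, one pair per tweet.
    Returns a Counter mapping (tag_node, mention_node) to the number of
    tweets in which both appear (assumed edge definition).
    """
    edges = Counter()
    for tags, mentions in tweets:
        # set() avoids counting a pair twice within one tweet.
        for tag, mention in product(set(tags), set(mentions)):
            edges[("#" + tag, "@" + mention)] += 1
    return edges

def node_degrees(edges):
    """Degree of each node: the number of distinct neighbours it has."""
    degrees = Counter()
    for tag_node, mention_node in edges:
        degrees[tag_node] += 1
        degrees[mention_node] += 1
    return degrees
```

The resulting degree values could then drive node size or color, as described in the second design point, while the edge weights could drive edge thickness.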
Demonstration
We use the US Election tweets data as an example for our demonstration.
Word Cloud
How to use the application:
1. Upload a text file using the Text File upload widget in the sidebar. Note: the file must be a text file and no larger than 5MB.
2. Customise the image to be displayed using the filters, e.g. maximum number of words, minimum frequency and rotation. In this case, we set Minimum Frequency: 50, Maximum Words: 100, Rotation: 0.35.
3. The default setting is to have "Document Stemming" and "Repeatable" checked. To remove, simply uncheck the box.
- "Document Stemming" reduces words to their root form, e.g. 'elections' becomes 'election'. The frequencies of 'elections' and 'election' are added together for the word cloud generation.
- "Repeatable" allows changes made to the word cloud via the filters to be added/removed from the initial plot generated by R.
4. Download the word cloud image and its frequency table by clicking on the "Download Image" and "Download Frequency Table" buttons respectively. The downloaded image is in PNG format and the table is in CSV format.
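The effect of the "Document Stemming" option in step 3 can be illustrated with a small Python sketch. The `naive_stem` function below is a deliberately crude stand-in that only strips a trailing 's'; it is an assumption made for this example and is not the stemmer used by the application, which would handle many more word forms.

```python
from collections import Counter

def naive_stem(word):
    """Crude stand-in for a real stemmer: strip one trailing 's'.

    Words of three letters or fewer, and words ending in 'ss',
    are left unchanged.
    """
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

def stemmed_counts(words):
    """Count words after lowercasing and stemming, merging variants."""
    return Counter(naive_stem(w.lower()) for w in words)
```

With this sketch, occurrences of 'elections' and 'election' are counted under the single entry 'election', which is the merging behaviour described in step 3.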
Insights:
The size of a word reflects its frequency in the file. For example, in the output, #USElection occurs more often than #globalwarming.
Discussion
What has the audience learned from your work? What new insights or practices has your system enabled? A full blown user study is not expected, but informal observations of use that help evaluate your system are encouraged.
Future Work
- Use a single font colour in the Word Cloud to minimise distractions.
- Let the intensity of the font colour in the Word Cloud complement the font size, which represents the frequency of the word.
- Enable display of the Network Graph once a user clicks on a #tag or @mention in the Word Cloud.
- Improve the Word Cloud's Minimum Frequency filter to allow the user to enter a frequency range, rather than using a slider that has hard-coded limits (currently at 500).
- Display the tweet messages when a node is selected.


