Social Media & Public Opinion - Final


Abstract

Sentiment analysis on social media provides organisations an opportunity to monitor their reputation and brands by extracting and analysing online comments posted by Internet users about them or the market. There are various existing methods to carry out sentiment analysis on social media data, with varying results. In this paper, we show how to use Twitter as a corpus for sentiment analysis and investigate the effectiveness of sentiment text analysis on tweets. We adopt a supervised approach, by labelling the training data and using it to train a model for predicting the responses on the unlabelled testing data.


Introduction

In the past decade, we have witnessed the rapid proliferation of social media worldwide. Since Twitter launched in 2006, the social networking and microblogging service has grown rapidly to become the second largest social network after Facebook. As of December 2014, Twitter had 284 million monthly active users, who sent out 500 million tweets per day.[1] It has become a real-time information network, generated by people around the world, that lets users share their thoughts about various topics in short updates, or tweets, of 140 characters or fewer. According to a report by We Are Social[2], Twitter is growing fastest in Asia Pacific, and Singaporeans are among the most active social media consumers in the world: Singapore has the world’s second highest social penetration rate at 59%, more than double the global average of 26%. Singaporeans are also more connected to the Internet than the rest of the world on average, with an Internet penetration rate of 73% against a global average of 35%. There are an estimated 200,000 Twitter users in Singapore.[3] This represents a great source of data that we can analyse and derive valuable insights from.


However, harnessing big data is challenging as the data lacks structure and context. Computers cannot deal with implicit information as well as humans do. This project aims at qualifying and quantifying the trends in human emotions expressed by Twitter users over a period of time via sentiment analysis, which is the use of natural language processing, text analysis and computational linguistics to identify and extract subjective information in source materials. In other words, it determines whether a tweet is positive, negative or neutral. This proved vital during the 2014 Israel – Gaza Conflict, with many taking to Twitter for real-time news and updates on the crisis. Many found Twitter to be a powerful means of expressing their activism against Israel’s brutal campaign in the region.[4] A deep sentiment analysis of social network data, such as Twitter data, could lead to very interesting insights into global public opinion. In this conflict, it could help to engage more people and to balance the world’s public opinion, both during the fighting and after the ceasefire.[5]


Change in Project Scope

Having consulted with our professor, we have decided to shift our focus away from developing a dashboard and delve deeper into the subject of text analysis of social media data, specifically Twitter data. Social media has changed the way consumers provide feedback on the products they consume. Much social media data can be mined, analysed and turned into value propositions that change the ways companies brand themselves.

Although anyone and everyone can easily attain such data, there are certain challenges faced that can hamper the effectiveness of such analysis.

  1. Can conventional text analysis methods be done on social media data?
  2. How effective are these methods?
  3. What are some of the unique features of social media that we need to take note of when doing text analysis on them?
Through this project, we are going to explore what some of these challenges are and ways in which we can overcome them.


Related Work

With the proliferation of blogs and social networks, opinion mining and sentiment analysis have become a field of interest for many researchers. Analysing tweets is considered more challenging than analysing conventional text such as review documents due to the nature of tweets: short length, frequent use of informal and irregular words, and the rapid evolution of language on Twitter.

A real-time text-based hedonometer was built to measure happiness of over 63 million Twitter users over 33 months, as recorded in the paper Temporal Patterns of Happiness and Information in a Global Social Network: Hedonometrics and Twitter (Dodds et al., 2011)[6]. It shows how a highly robust and tunable metric can be constructed with the word list chosen solely by frequency of usage.

In the paper Twitter as a Corpus for Sentiment Analysis and Opinion Mining (Pak & Paroubek, 2010)[7], the authors show how to use Twitter as a corpus for sentiment analysis and opinion mining, perform linguistic analysis of the collected corpus and build a sentiment classifier that is able to determine positive, negative and neutral sentiments for a document. The authors build a sentiment classifier using the multinomial Naïve Bayes classifier that uses N-gram and part-of-speech tags as features as it yielded the best results as compared to Support Vector Machines (SVMs) and Conditional Random Fields (CRFs) classifiers. We will be using the Naive Bayes classifier too.

In the paper Twitter Sentiment Analysis: The Good the Bad and the OMG! (Kouloumpis et al., 2011)[8], the authors investigate the usefulness of linguistic features for detecting the sentiment of tweets. The results show that part-of-speech features may not be useful for sentiment analysis in the microblogging domain, while the microblogging features (i.e., the presence of intensifiers and positive / negative / neutral emoticons and abbreviations) were clearly the most useful.

The authors mentioned in the paper Exploiting Emoticons in Sentiment Analysis (Hogenboom et al., 2013)[9] created an emoticon sentiment lexicon in order to improve a state-of-the-art lexicon-based sentiment classification method. It demonstrated that people typically use emoticons in natural language text in order to express, stress, or disambiguate their sentiment in particular text segments, thus rendering them potentially better local proxies for people’s intended overall sentiment than textual cues. We will be analysing emoticons too to improve the accuracy of our model.

In the paper Tokenization and Filtering Process in RapidMiner (Verma et al., 2014)[10], the authors show how text mining is implemented in RapidMiner through tokenisation, stopword elimination, stemming and filtering. We will be using RapidMiner too in our methodology.


Methodology

In this project, we are dealing with unstructured text from Twitter. This area of study is also known as Natural Language Processing (NLP). Major components of NLP include:
  • Information extraction
  • Information retrieval
  • Document Summarisation
  • Sentence Parsing
One of our key objectives is to classify the tweets into 3 different categories, namely Positive (P), Negative (N) and Neutral (X) based on the sentiments of the tweet.

Approach: Supervised Learning

We will be gathering a collection of tweets (documents) in each of the pre-defined categories and training a classifier on this data.

Assessment of Appropriate Software for Text Analytics

In order to determine the feasibility of text analytics for tweets, our team needed to assess which software would be best at producing an accurate sentiment classification for tweets.

Our criteria in selecting the software are the following:

  • Easily accessible and configurable
  • Capabilities in text analytics
  • Available for Mac OS X and Windows
  • Able to perform well with limited processing capabilities
  • Ample support from online resources

Based on our initial research, we have determined three potential tools to conduct our experiments, IBM SPSS Modeler, SAS Enterprise Miner and RapidMiner. These tools are readily available from SMU’s software repository.

We also considered the use of R, but using R would have required us to learn the language and research all possible modules related to text mining. Given our limited time frame, we decided to forego R and focus on software that has a graphical user interface (GUI) for ease of use and requires less scripting.

Easily Accessible & Configurable

SAS Enterprise Miner and RapidMiner allow users to connect to any ODBC or JDBC database and to read a variety of file formats such as CSV and Excel. IBM SPSS Modeler, however, requires the use of IBM databases. We believe that SAS Enterprise Miner and RapidMiner are both accessible and configurable to the extent that we require, whereas SPSS Modeler would require us to make further use of IBM products and revolve around their ecosystem.

Capabilities in Text Analytics

The following are the text analytics functions available in each software package:
Features SAS Enterprise Miner IBM SPSS Modeler RapidMiner
Classification
Entity Extraction
Natural Language Processing
Trend Analysis
Multi-Language Processing
Word Association
Comparison Analysis
Document Exploration
Based on the above table, SAS Enterprise Miner has the most available functionalities for text analytics. However, as we are focused on natural-language processing and classification for our experiments, we believe that all three meet the criteria in being able to conduct the analysis that we need.

Ample Support from Online Resources

SMPO-Scholarly Articles.png
Based on the above chart (Muenchen, 2015)[11], RapidMiner has the most scholarly articles available as a support resource. SAS Enterprise Miner has 300, while SPSS Modeler has the fewest, with fewer than 220 articles.

Result

Overall based on the criteria, we believe that RapidMiner will be the best software to conduct our experiments. We will be using RapidMiner to train a classifier and apply it on new test data to see the accuracy of the tagging process.


Download RapidMiner here

Screenshots Steps
Text processing module.JPG

Setting up RapidMiner for Text Analysis

To carry out text processing in RapidMiner, we need to download the required plugin from RapidMiner's plugin repository.

Click on Help > Managed Extensions and search for the text processing module.

Once the plugin is installed, it should appear in the "Operators" window as seen below.
Tweets Jso.JPG

Data Preparation

In RapidMiner, there are a few ways in which we can read a file or data from a database. In our case, we read tweets provided by the LARC team, which were supplied in JSON format. RapidMiner can read JSON strings, but it is unable to read nested arrays within a string. Due to this restriction, we need to extract the text from each JSON string before we can use RapidMiner for the text analysis. We did this by converting each JSON string into a JavaScript object, extracting only the Id and text of each tweet, and writing them to a comma-separated file (.csv) to be processed later in RapidMiner.
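The extraction step can be sketched as follows. This is a minimal illustration in Python rather than the JavaScript we actually used, and it assumes one JSON object per line with top-level fields named `id` and `text`; the function name, the column names and the sample records are hypothetical.

```python
import csv
import io
import json


def tweets_to_csv(json_lines, out_file):
    """Write the id and text of each tweet to out_file as CSV rows."""
    writer = csv.writer(out_file)
    writer.writerow(["Id", "Text"])  # header row, later read as column names
    count = 0
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        tweet = json.loads(line)
        # Keep only the two flat fields; nested arrays are left behind.
        writer.writerow([tweet["id"], tweet["text"]])
        count += 1
    return count


# Hypothetical sample input: one JSON object per line (assumed layout).
sample = [
    '{"id": 1, "text": "Great day, really!", "entities": {"hashtags": []}}',
    '{"id": 2, "text": "Stuck in traffic again", "entities": {"hashtags": [{"text": "sg"}]}}',
]
buffer = io.StringIO()
written = tweets_to_csv(sample, buffer)
csv_text = buffer.getvalue()
```

The `csv` module quotes any text that contains a comma, so tweets with commas still occupy a single column when the file is read back.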

Defining a Standard

Before we can create a model for classifying tweets based on their polarity, we first have to define a standard for the classifier to learn from. To obtain such a standard, we manually tagged a random sample of 1000 tweets with 3 categories: Positive (P), Negative (N) and Neutral (X). One of the challenges faced is understanding irony, as even humans sometimes have difficulty understanding someone who is being sarcastic. A University of Pittsburgh study reports that humans agree on the sentiment of a sentence only about 80% of the time.[12] With the tweets and their respective classifications, we were ready to create a model for machine learning of tweets' sentiments.

Creating a Model

Screenshots Steps
ReadCsv.JPG
Read CSV
We first used the "read CSV" operator to read the text from the prepared CSV file that was done earlier. This can be done via an "Import Configuration Wizard" or set manually.
ReadCsv configuration.JPG

Each column is separated by a ",".
Trim the lines to remove any white space before and after the tweet.
Check "first row as names" if the file has a header row.

Normtotext.JPG
Nominal to Text
To check the results at any point of the process, right-click on any operator and add a breakpoint. To process the document, we convert the data from nominal to text.
DataToDoc.JPG
Data to Documents

We convert the text data into documents. In our case, each tweet is converted into a document.

ProcessDocument.JPG
Process Documents
The "process document" operator is a multi-step process that breaks down each document into single words. The frequency of each word, as well as the number of documents in which it occurs, is calculated and used when formulating the model. To begin the process, double-click on the operator.
Tokenize.JPG

1. Tokenizing the tweet by word

Tokenization is the process of breaking a stream of text up into words or other meaningful elements called tokens to explore words in a sentence. Punctuation marks as well as other characters like brackets, hyphens, etc. are removed.

2. Converting words to lowercase

All words are transformed to lowercase as the same word would be counted differently if it was in uppercase vs. lowercase.

3. Eliminating stopwords

The most common words, such as prepositions, articles and pronouns, are eliminated, which helps to improve system performance and reduces the volume of text data.

4. Filtering tokens that are smaller than 3 letters in length

Filters tokens based on their length (i.e. the number of characters they contain). We set a minimum number of characters to be 3.

5. Stemming using Porter2’s stemmer

Stemming is a technique for the reduction of words into their stems, base or root. When words are stemmed, we are keeping the core of the characters which convey effectively the same meaning.

Porter Stemmer vs Snowball (Porter2)[13]

Porter: The most commonly used stemmer, and also one of the gentlest. It is one of the most computationally intensive of the algorithms (although not by a very significant margin). It is also the oldest stemming algorithm by a large margin.

Snowball (Porter2): Nearly universally regarded as an improvement over Porter, and for good reason. Porter himself admits that Snowball is better than his original algorithm. It has a slightly faster computation time than Porter, with a fairly large community around it.

We use the Porter2 stemmer.
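Steps 1 to 4 above can be sketched in a few lines of Python. This is an illustration only, not RapidMiner's implementation: it assumes that tokens are split on non-letter characters, the stopword list is a deliberately tiny hypothetical one, and stemming (step 5) is left out because it would need an external stemmer library.

```python
import re

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"the", "and", "for", "with", "this", "that", "are", "was", "you"}
MIN_LENGTH = 3  # minimum token length, as in step 4


def preprocess(tweet):
    """Tokenise, lowercase, drop stopwords and drop short tokens."""
    # Step 1: split on anything that is not a letter, which also removes
    # punctuation, brackets, hyphens and digits.
    tokens = [t for t in re.split(r"[^A-Za-z]+", tweet) if t]
    # Step 2: lowercase so that "AGAIN" and "again" count as the same word.
    tokens = [t.lower() for t in tokens]
    # Step 3: remove stopwords.
    tokens = [t for t in tokens if t not in STOPWORDS]
    # Step 4: keep only tokens with at least MIN_LENGTH characters.
    return [t for t in tokens if len(t) >= MIN_LENGTH]


tokens = preprocess("The MRT was delayed AGAIN... so annoyed with this!")
# tokens == ["mrt", "delayed", "again", "annoyed"]
```

Note that the length filter also removes short words that survive the stopword list, such as "so" in the example.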

SMPO-Generate TFIDF.PNG
Term Weighting

We used TF-IDF (term frequency × inverse document frequency) to weight the importance of each word. TF-IDF takes two things into account. First, if a term appears in a lot of documents, each individual occurrence is probably not so important.

Conversely, if a term is seldom used in most of the documents, when it appears, the term is likely to be important.
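The weighting idea can be made concrete with a short sketch. It uses one common textbook definition, relative term frequency multiplied by the natural logarithm of (number of documents / document frequency); this formula is an assumption for illustration, and RapidMiner's own TF-IDF normalisation may differ.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenised document."""
    n_docs = len(documents)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            # term frequency within the document * inverse document frequency
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical tokenised tweets.
docs = [
    ["train", "delayed", "again"],
    ["train", "late", "today"],
    ["love", "train", "rides"],
]
weights = tf_idf(docs)
```

In this example "train" occurs in every document, so its weight is 0 everywhere, while "delayed" occurs in only one document and receives a positive weight.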
Setrole.JPG
Set Role
Return to the main process.

We need to add the "Set Role" operator to indicate the label for each tweet. We use a column called "Classification" to hold that label.

Validation.JPG
Cross-Validation
The "X-validation" operator creates a model based on our manual classification which can later be used on another set of data. To begin, double click on the operator.
ValidationX.JPG
Naive Bayes Classifier
We carry out an X-validation using the Naive Bayes model classification, a simple probabilistic classifier based on applying Bayes' theorem (from Bayesian statistics) with strong (naive) independence assumptions. In simple terms, a Naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class (i.e. attribute) is unrelated to the presence (or absence) of any other feature.
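The combination of a Naive Bayes classifier and cross-validation can be sketched as below. This is a simplified illustration, not what RapidMiner runs internally: it assumes a multinomial model over word counts with add-one smoothing, and the fold assignment, function names and toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Collect the counts needed for a multinomial Naive Bayes model."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, word_counts, vocabulary


def predict(model, tokens):
    """Return the label with the highest log posterior for one document."""
    class_counts, word_counts, vocabulary = model
    n_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / n_docs)  # log prior
        for token in tokens:
            if token not in vocabulary:
                continue  # ignore words never seen in training
            # Add-one smoothing avoids zero probabilities.
            score += math.log((word_counts[label][token] + 1)
                              / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def cross_validate(docs, labels, k):
    """Average accuracy over k folds; fold i holds items i, i+k, i+2k, ..."""
    accuracies = []
    for fold in range(k):
        train_idx = [i for i in range(len(docs)) if i % k != fold]
        test_idx = [i for i in range(len(docs)) if i % k == fold]
        model = train_naive_bayes([docs[i] for i in train_idx],
                                  [labels[i] for i in train_idx])
        hits = sum(predict(model, docs[i]) == labels[i] for i in test_idx)
        accuracies.append(hits / len(test_idx))
    return sum(accuracies) / k


# Hypothetical toy data: tokenised tweets with manual P/N tags.
docs = [["love", "great"], ["hate", "awful"], ["great", "happy"],
        ["awful", "sad"], ["love", "happy"], ["hate", "sad"]]
labels = ["P", "N", "P", "N", "P", "N"]
model = train_naive_bayes(docs, labels)
prediction = predict(model, ["great", "love"])
accuracy = cross_validate(docs, labels, k=3)
```

Each fold is held out once while the model is trained on the remaining tweets, so every tagged tweet contributes to both training and testing.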
5000Data.JPG
Apply Model to New Data
To apply this model to a new set of data, we repeat the above steps of reading a CSV file, converting the input to text, setting the role and processing each document before applying the model to the new set of tweets.
Prediction.JPG
Results
From the performance output, we achieved 44.2% accuracy when the model was cross-validated on the original 1000 manually tagged tweets. To confirm this accuracy, we randomly extracted 100 tweets from the fresh set of 5000 tweets, tagged them manually and compared our tags with the values predicted by the model. The model achieved an accuracy of 46% on this sample, close to the 44.2% accuracy obtained with the X-validation operator.


Improving Accuracy

Pruning

One of the ways to improve the accuracy of the model is to remove words that do not appear frequently within the given set of documents. By removing these words, we can ensure that the resulting words that are classified are mentioned a significant number of times. However, the challenge is to determine what the number of occurrences required is before a word can be taken into account for classification. It is important to note that the higher the threshold, the smaller the result and word list would be. Practical problems exist when modelling text statistically, since we require a reasonably sized corpus in order to overcome sparseness problems, but at the same time we face the challenge of irrelevant words exerting their weights on an independent set of test data when applying the model.

We experimented with multiple values to determine the most appropriate amount of words to be pruned off, bearing in mind that we need a sizeable number of words with a high enough accuracy yield.

  • Percentage pruned refers to the removal of words that do not occur in at least the stated percentage of documents. For example, at 1% pruned on the set of 1000 documents, words that appear in fewer than 10 documents are removed from the word list.


PercentagePruned.JPG
Percentage Pruned   Percentage Accuracy   Deviation   Size of resulting word list
0%                  39.8%                 5.24%       3833
0.5%                44.2%                 4.87%       153
1%                  42.2%                 2.68%       47
2%                  45.1%                 1.66%       15
5%                  43.3%                 2.98%       1
From the results, we can infer that a large number of words (3680) appear in fewer than 5 documents, as the size of the resulting word list falls from 3833 to 153 when we set the percentage pruned at 0.5%.
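The pruning rule described above can be expressed as a document-frequency threshold. The sketch below is an illustration under the assumption that a word is kept when it occurs in at least the given percentage of documents; the function name and toy documents are hypothetical.

```python
import math
from collections import Counter


def prune_vocabulary(documents, percent):
    """Keep terms that occur in at least percent% of the documents."""
    min_docs = math.ceil(len(documents) * percent / 100)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each term once per document
    return {term for term, df in doc_freq.items() if df >= min_docs}


# Hypothetical toy corpus of 10 tokenised tweets.
docs = ([["train", "late"]] * 3) + [["train", "rain"]] + ([["sunny"]] * 6)
kept = prune_vocabulary(docs, 20)      # threshold: 2 of 10 documents
unpruned = prune_vocabulary(docs, 0)   # threshold: 0, nothing is removed
```

With 1000 documents, a setting of 0.5% gives a threshold of 5 documents and a setting of 1% gives a threshold of 10 documents, which matches the example in the bullet point above.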

Results



Types of Classifiers

Support Vector Machine

A support vector machine constructs a hyperplane or set of hyperplanes in a high- or infinite-dimensional space, which can be used for classification, regression, or other tasks. Intuitively, a good separation is achieved by the hyperplane that has the largest distance to the nearest training data points of any class (the so-called functional margin), since in general the larger the margin, the lower the generalization error of the classifier. Whereas the original problem may be stated in a finite-dimensional space, it often happens that the sets to discriminate are not linearly separable in that space. For this reason, it was proposed that the original finite-dimensional space be mapped into a much higher-dimensional space, presumably making the separation easier in that space.[14]

K-Nearest Neighbour

The k-Nearest Neighbor algorithm is based on learning by analogy, that is, by comparing a given test example with training examples that are similar to it. The training examples are described by the words that are contained within the document. Each example represents a point in an n-dimensional space, depending on the size of the word list. In this way, all of the training examples are stored in an n-dimensional pattern space. When given a new document with its features, a k-nearest neighbour algorithm searches the pattern space for the k training examples that are closest to the unknown example. These k training examples are the k "nearest neighbours" of the unknown example. "Closeness" is defined in terms of a distance metric, such as the Euclidean distance.[15]
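The search for the k closest training examples can be sketched as follows. This is a minimal illustration with Euclidean distance and majority voting; the word list, vectors and labels are hypothetical, and it is not RapidMiner's k-NN operator.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_vectors, train_labels, query, k):
    """Label the query by majority vote among its k nearest neighbours."""
    # Sort training examples by distance to the query and keep the k closest.
    nearest = sorted(
        zip(train_vectors, train_labels),
        key=lambda pair: euclidean(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical word-occurrence vectors over the word list
# ["love", "hate", "happy"], one vector per tagged tweet.
train_vectors = [[1, 0, 1], [1, 0, 0], [0, 1, 0], [0, 1, 1]]
train_labels = ["P", "P", "N", "N"]

label_a = knn_predict(train_vectors, train_labels, [1, 0, 1], k=3)
label_b = knn_predict(train_vectors, train_labels, [0, 1, 0], k=1)
```

Here the dimension n of the pattern space equals the size of the word list, so a larger word list means longer vectors and more expensive distance computations.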

Naive Bayes

A Naive Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem (from Bayesian statistics) with strong (naive) independence assumptions. A more descriptive term for the underlying probability model would be 'independent feature model'. In simple terms, a Naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class (i.e. attribute) is unrelated to the presence (or absence) of any other feature. For example, a tweet or document is described by the words that are contained within it, and these words are treated as independent of one another. Even if these features depend on each other or upon the existence of other features, a Naive Bayes classifier considers all of them to contribute independently to the probability that the tweet belongs to a given class.

The advantage of the Naive Bayes classifier is that it only requires a small amount of training data to estimate the means and variances of the variables necessary for classification. Because independent variables are assumed, only the variances of the variables for each label need to be determined and not the entire covariance matrix.

A kernel is a weighting function used in non-parametric estimation techniques. Kernels are used in kernel density estimation to estimate random variables' density functions, or in kernel regression to estimate the conditional expectation of a random variable.[16]


Comparing the Performance of the 3 Classifiers

We experimented with the different classifiers to see which performed best in terms of creating a model for classification. Each model was applied to a testing set of 100 tweets, each manually tagged with a category. The results can be seen below.


SMPO-Performance of Classifiers.png


The classifiers work effectively when 0% of the processed word list is pruned. The Support Vector Machine performs well when there is good separation of points on the data plane (since the functional margins of the new tokens are closer to the points obtained from the training data). However, words that have too low an occurrence should not be taken into account, as the weight they hold in determining a category is too small. Two other boundaries were tested, namely 0.5% and 1% of the word list pruned. At 2% pruned, the word list falls to a size of 15. The different classifiers perform at similar levels of accuracy, and we chose the Naïve Bayes (kernel) method, which reached 50% accuracy, as our model classifier.


Deriving Insights from Emoticons

An emotion icon, better known as an emoticon, is a metacommunicative pictorial representation of a facial expression that, in the absence of body language and prosody, serves to draw a receiver's attention to the tenor or temper of a sender's nominal verbal communication, changing and improving its interpretation. It expresses a person's feelings or mood, usually by means of punctuation marks (though it can include numbers and letters). As emoticons have become more popular, some devices have provided stylized pictures that do not use punctuation.

Experiment

The data that we have was in plain text. To be able to view the emoticons, we needed a "translator" to convert the emoticons used. This can be done on any browser with a plugin to convert these emoticons. We carried out the following steps:
  1. Print the entire list of tweets that we have.
  2. Identify the ones that have a converted emoticon tag (e.g. "😔").
  3. Get the list of emoticons from an emoticon library[17] and tag each emoticon as Positive (P), Negative (N) or Neutral (X).
  4. For each tweet that we have, we manually tag the tweets based on the sentiments of the tweets.
  5. Cross validate that with the sentiments of the emoticons present in the tweet.
  6. Calculate the percentage of matches between the 2 tagged values.
We carried out this experiment on 100 random tweets with emoticons and compared the two sets of tags. We achieved an 82% match, i.e. the emoticons alone indicated the manually tagged sentiment of the tweet in 82% of cases.
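The comparison in steps 3 to 6 can be sketched as below. The emoticon polarity table and the sample tweets are hypothetical, and taking the most frequent polarity when a tweet contains several emoticons is an assumption made for illustration.

```python
from collections import Counter

# Hypothetical polarity table; the real one came from the emoticon library.
EMOTICON_POLARITY = {"😊": "P", "😔": "N", "😐": "X"}


def emoticon_tag(tweet):
    """Most frequent polarity among the emoticons found in the tweet."""
    found = [EMOTICON_POLARITY[ch] for ch in tweet if ch in EMOTICON_POLARITY]
    if not found:
        return None  # tweet has no known emoticon
    return Counter(found).most_common(1)[0][0]


def match_rate(manual_tags, emoticon_tags):
    """Percentage of tweets whose manual and emoticon tags agree."""
    if len(manual_tags) != len(emoticon_tags):
        raise ValueError("both lists must cover the same tweets")
    matches = sum(m == e for m, e in zip(manual_tags, emoticon_tags))
    return 100 * matches / len(manual_tags)


# Hypothetical tweets with their manual tags; the third one is sarcastic.
tagged = [
    ("Finally Friday 😊", "P"),
    ("Missed my bus 😔", "N"),
    ("Great, more rain 😊", "N"),
    ("Lunch time 😐", "X"),
]
manual = [tag for _, tag in tagged]
from_emoticons = [emoticon_tag(text) for text, _ in tagged]
rate = match_rate(manual, from_emoticons)  # 3 of 4 tags agree
```

The sarcastic tweet shows the kind of case in which the emoticon and the manual tag disagree.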


Deriving Word Associations

Modelling word co-occurrence is important for many natural language applications, such as topic segmentation (Ferret, 2002), query expansion (Vechtomova et al., 2003), machine translation (Tanaka, 2002), language modeling (Dagan et al., 1999; Yuret, 1998), and term weighting (Hisamitsu and Niwa, 2002). We want to know whether certain words within a set of tweets occur together more often than expected by chance. We highlight the process of getting word associations below.
Screenshots Steps
WordAsoc1.JPG
Import Data & Process Document
  1. We will first import the data as explained earlier where we did the classification of the words in the text analysis process
  2. For the "Process Document" operator, we have to set the "vector creation" option to binary term occurrences for the "FP-Growth" operator later
WordAsoc2.JPG
Frequency Pattern Growth
  1. We will need to convert the word matrix from "process documents" to binomial form for the "FP-Growth" Operator. This will convert all the "0"s and "1"s to "false" and "true" respectively.
SingleWordAsoc.JPG
Results
  1. Out of the 20000 tweets, we were only able to draw 121 sets of word associations, of which 25 contain 2 words, 12 contain 3 words and 2 contain 1 word
  2. The highest support observed stands at 0.044, a far cry from the minimum support of 0.75 commonly used for associating words.
We conclude that deriving word associations is a huge challenge when it comes to Twitter data and may be of limited use when doing text analytics on it. With tweets holding at most 140 characters, it is no surprise that we are unable to derive high volumes of word associations from the data set. It is even harder to derive word associations when the data set is time-based rather than event-based: the topics discussed vary greatly, making it hard to formulate word associations. Furthermore, the tone and vocabulary used by Twitter users are casual, and the prevalence of short forms and abbreviations makes it even harder to draw word associations from tweets.
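The notion of support used above can be illustrated with a brute-force count. This sketch is not the FP-Growth algorithm, which reaches the same supports far more efficiently; it simply enumerates word sets of a fixed size over binary term occurrences, and the function names and toy tweets are hypothetical.

```python
from collections import Counter
from itertools import combinations


def itemset_support(documents, size):
    """Support of each word set: fraction of documents containing all its words."""
    counts = Counter()
    for doc in documents:
        # Binary occurrence: each word counts once per document.
        for itemset in combinations(sorted(set(doc)), size):
            counts[itemset] += 1
    return {itemset: c / len(documents) for itemset, c in counts.items()}


def frequent_itemsets(documents, size, min_support):
    """Word sets whose support reaches the minimum support threshold."""
    support = itemset_support(documents, size)
    return {s: v for s, v in support.items() if v >= min_support}


# Hypothetical tokenised tweets.
docs = [
    ["train", "delayed", "again"],
    ["train", "delayed"],
    ["train", "late"],
    ["sunny", "day"],
]
pairs = itemset_support(docs, 2)
frequent = frequent_itemsets(docs, 2, min_support=0.5)
```

Even in this tiny example only one pair reaches a support of 0.5, which hints at why short, topically diverse tweets yield so few associations at high thresholds.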


“Purified” English-only Training Data

One of the key concerns for Natural Language Processing would be to localise the training data to create a more accurate model for sentiment classification.

To test whether a training set that contains only English words works more effectively than one with multiple languages (Malay, Chinese and other, mainly Asian, languages), we compared the accuracy of the resulting classifications to see which produced the better result. We screened 1000 tweets that contain only English words and compared them with the original training data, which was picked at random. The Naïve Bayes classifier was used, as in the earlier test cases.

The model trained on data containing only English words was 13 percentage points more accurate, with an accuracy of 63%. See here for the set of training data.


Pitfalls of Using Conventional Text Analysis on Social Media Data

Multiple Languages

Because Singapore is a multilingual and multiracial community, text analysis of the Singapore Twittersphere is more challenging, as we have to take different languages into account. For each language, a dictionary is required to translate the text into English before any natural language processing can be done on it. With advanced tools like RapidMiner unable to accommodate Chinese, Malay or Korean words, much work has to be done to come up with a localisation tool to analyse the social media data here.

Misspelled Words and Abbreviations

With Twitter's limit of 140 characters, users are fond of using abbreviations and short forms in place of the words they want to convey. A huge challenge is to resolve misspelled words and to distinguish them from such abbreviations. This could be done with a more robust or aggressive stemmer that deciphers abbreviations, removes unnecessary repeated characters in a word and corrects short forms to their root words.
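A very simple form of such normalisation can be sketched as follows. It is an illustration only: the abbreviation table is hypothetical and tiny, and collapsing every run of three or more identical characters to a single character is an assumed rule that will mishandle some words.

```python
import re

# Hypothetical abbreviation table; a real one would be far larger.
ABBREVIATIONS = {"u": "you", "gr8": "great", "tmr": "tomorrow", "pls": "please"}


def normalise(tweet):
    """Collapse stretched characters and expand known abbreviations."""
    words = []
    for word in tweet.lower().split():
        # Collapse runs of three or more identical characters to one,
        # e.g. "sooooo" -> "so"; double letters such as "pp" are kept.
        word = re.sub(r"(.)\1{2,}", r"\1", word)
        words.append(ABBREVIATIONS.get(word, word))
    return " ".join(words)


cleaned = normalise("Sooooo happyyyy see u tmr pls")
# cleaned == "so happy see you tomorrow please"
```

Running this kind of step before tokenisation would let stretched and abbreviated forms be counted together with their standard spellings.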

Length of Status

A status is at most 140 characters long, which makes it difficult to find word associations with strong support and confidence levels. Given such a short length, there may be insufficient space to substantiate a point, and the tweet may lack evidence of its true sentiment.

Other Media Types

Other media types (URLs, image URLs and video URLs) are common attachments that Twitter users use to convey a message. In certain cases, the media makes up the entire tweet, which nullifies any textual analysis done on the tweet itself. Much more context could be derived if information about the link were embedded into the tweet itself. Unfortunately, such a feature is still not available, which hinders the analysis of tweets.


Future Work / Improving the Effectiveness of Sentiment Analysis of Social Media Data

Increasing Size of Training Data

A larger training set would generally be expected to yield a more accurate model. However, the time needed to process the data and apply the model may also increase.

Leveraging on Emoticons

Emoticons provide more insight into how the user is feeling with just a single character. In tweets, where the number of characters is a valuable resource, emoticons come into play quite frequently. By dissecting a tweet based on the emoticons in it and assigning a sentiment score to the emoticons used, we can get a more accurate depiction of the tweet's overall sentiment than by analysing the text alone.

Allowing the User to Tag Their Feelings to Their Status

One of the ways in which Facebook makes such analysis easier is by allowing the user to specify how he/she is feeling at the moment of posting a status. With this option, Facebook has effectively increased the probability of determining the right sentiment of the user at that point in time. This mitigates the possibility of sarcasm or other inferred sentiments within the post itself.


Fb sentiment tagging.png


Analyse Data on an Event/Topic Basis Rather Than on Time

The data that we used fell within a given time frame of 1 month. Drilling down these tweets to a particular topic (hashtag) or event would bring about more significant results. Brands that want to conduct sentiment analysis on social media data should make it specific to a particular campaign, event or initiative.


Acknowledgements

The team would like to thank Professors Kam Tin Seong and Seema Chokshi for providing valuable feedback for our project. The team would also like to thank Aek Palakorn Achananuparp and Arinto Murdopo from SMU Living Analytics Research Centre for providing us Singapore-based Twitter data to work with.


References

  1. About Twitter. (2014, December). Retrieved from https://about.twitter.com/company
  2. Kemp, S. (2015, January 21). Digital, Social & Mobile in 2015. Retrieved from http://wearesocial.sg/blog/2015/01/digital-social-mobile-2015/
  3. Yap, J. (2014, June 4). How many Twitter users are there in Singapore? Retrieved April 22, 2015, from https://vulcanpost.com/10812/many-twitter-users-singapore/
  4. Gaza takes Twitter by storm. (2014, August 20). Retrieved April 22, 2015, from http://www.vocfm.co.za/gaza-takes-twitter-by-storm/
  5. MasterMineDS. (2014, August 6). 2014 Israel – Gaza Conflict: Twitter Sentiment Analysis. Retrieved April 22, 2015, from http://www.wesaidgotravel.com/2014-israel-gaza-conflict-twitter-sentiment-analysis-mastermineds
  6. Dodds PS, Harris KD, Kloumann IM, Bliss CA, Danforth CM (2011) Temporal Patterns of Happiness and Information in a Global Social Network: Hedonometrics and Twitter. PLoS ONE 6(12): e26752. doi:10.1371/journal.pone.0026752
  7. Pak, A., & Paroubek, P. (2010). Twitter as a Corpus for Sentiment Analysis and Opinion Mining.
  8. Kouloumpis, E., Wilson, T., & Moore, J. (2011). Twitter Sentiment Analysis: The Good the Bad and the OMG! In International AAAI Conference on Weblogs and Social Media. Retrieved from https://www.aaai.org/ocs/index.php/ICWSM/ICWSM11/paper/view/2857/3251
  9. Hogenboom, A. and Bal, D. and Frasincar, F. and Bal, M. and de Jong, F.M.G. and Kaymak, U. (2013) Exploiting emoticons in sentiment analysis. In: Proceedings of the 28th Annual ACM Symposium on Applied Computing, SAC 2013, 18-22 Mar 2013, Lisbon, Portugal. pp. 703-710. ACM. ISBN 978-1-4503-1656-9
  10. Verma, T., & Renu, D. G. (2014). Tokenization and Filtering Process in RapidMiner. International Journal of Applied Information Systems (IJAIS)–ISSN, 2249-0868.
  11. Muenchen, B. (2015, March 26). Google Scholar Finds Far More SPSS Articles; Analytics Forecast Updated. Retrieved March 7, 2015, from http://r4stats.com/2015/03/26/google-scholar-spss/
  12. Wiebe, J., Wilson, T., & Cardie, C. (2005). Annotating Expressions of Opinions and Emotions in Language. Language Resources and Evaluation, 165-210. Retrieved from http://people.cs.pitt.edu/~wiebe/pubs/papers/lre05.pdf
  13. Tyranus, S. (2012, June 26). What are the major differences and benefits of Porter and Lancaster Stemming algorithms? Retrieved from http://stackoverflow.com/questions/10554052/what-are-the-major-differences-and-benefits-of-porter-and-lancaster-stemming-alg
  14. Support Vector Machine (RapidMiner Studio Core). (n.d.). Retrieved April 22, 2015, from http://docs.rapidminer.com/studio/operators/modeling/classification_and_regression/svm/support_vector_machine.html
  15. K-NN (RapidMiner Studio Core). (n.d.). Retrieved April 22, 2015, from http://docs.rapidminer.com/studio/operators/modeling/classification_and_regression/lazy_modeling/k_nn.html
  16. Naive Bayes (RapidMiner Studio Core). (n.d.). Retrieved April 22, 2015, from http://docs.rapidminer.com/studio/operators/modeling/classification_and_regression/bayesian_modeling/naive_bayes.html
  17. Emoticon - emotions library https://github.com/wooorm/emoji-emotion/blob/master/data/emoji-emotion.json