Social Media & Public Opinion - Project Findings

Limitations of Hedonometer

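What it cannot detect:

Negation handling

Abbreviations, smileys/emoticons and special symbols

Local languages & slang (Singlish)

Ambiguity

Sarcasm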

Our Approach

Machine Learning

Given the limitations of Hedonometer's happiness index score, we are attempting to use a set of sample tweets to learn a more robust lexicon (dictionary). Machine learning is a scientific discipline that studies and builds algorithms that can learn from data. Such algorithms build a model from example inputs and use it to make predictions or decisions, rather than following a strictly static program.

This dictionary will build on Hedonometer's research, using its dictionary as a starting point. To score a tweet, we take the words that appear both in the tweet and in the Hedonometer dictionary and sum their happiness scores. A tweet is considered positive if this total is greater than 5 (the center score of the happiness index) multiplied by the number of matching words, and negative if it is less. Equivalently, a tweet is positive when the average score of its matching words is above 5.

Based on a given set of sample tweets, we then track how many times each word appears in a "positive" tweet and how many times it appears in a "negative" tweet. The percentage of a word's appearances that occur in positive tweets measures how positive that word is relative to other words. Words that were not previously documented in the Hedonometer dictionary are also included and scored in the same way.
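The scoring and counting steps described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the project's implementation: the whitespace tokenizer, the function names `tweet_polarity` and `learn_positivity`, and the `TOY_LEXICON` values are all hypothetical (the toy scores are illustrative, not real Hedonometer entries). Tweets whose total exactly equals the threshold are treated as neither positive nor negative, which the text does not specify.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional

NEUTRAL = 5.0  # center score of the happiness index

# Hypothetical toy lexicon; values are illustrative, not real Hedonometer scores.
TOY_LEXICON: Dict[str, float] = {"happy": 8.0, "sad": 2.0, "day": 6.0}


def tokenize(tweet: str) -> List[str]:
    # Naive lowercase whitespace split (assumption); ignores punctuation and emoticons.
    return tweet.lower().split()


def tweet_polarity(tweet: str, lexicon: Dict[str, float]) -> Optional[str]:
    """'positive' or 'negative' by the 5 x n rule; None if no word matches or on a tie."""
    scores = [lexicon[w] for w in tokenize(tweet) if w in lexicon]
    if not scores:
        return None
    total = sum(scores)
    threshold = NEUTRAL * len(scores)  # 5 multiplied by the number of matching words
    if total > threshold:
        return "positive"
    if total < threshold:
        return "negative"
    return None  # exact tie: behaviour not specified in the text (assumption)


def learn_positivity(tweets: Iterable[str], lexicon: Dict[str, float]) -> Dict[str, float]:
    """Fraction of each word's appearances that fall in positive tweets."""
    pos_counts: Dict[str, int] = defaultdict(int)
    neg_counts: Dict[str, int] = defaultdict(int)
    for tweet in tweets:
        label = tweet_polarity(tweet, lexicon)
        if label is None:
            continue  # unlabelled tweets contribute no counts
        counts = pos_counts if label == "positive" else neg_counts
        for word in tokenize(tweet):  # every word, including ones missing from the lexicon
            counts[word] += 1
    vocabulary = set(pos_counts) | set(neg_counts)
    return {w: pos_counts[w] / (pos_counts[w] + neg_counts[w]) for w in vocabulary}


print(tweet_polarity("happy day", TOY_LEXICON))  # positive
```

With this toy lexicon, `learn_positivity(["happy day", "sad day", "happy lah"], TOY_LEXICON)` gives `happy` and `lah` a positivity of 1.0, `day` 0.5 and `sad` 0.0. The Singlish particle `lah` is not in the lexicon but still receives a score, which is how previously undocumented words would enter the new dictionary.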

Testing our new dictionary

To determine the accuracy of the dictionary, human test subjects will judge whether the dictionary is in fact effective at determining the polarity of a tweet. Each subject will be given 2 tweets to judge, each with a pre-defined score from the new dictionary. If the subject's perception of the 2 tweets agrees with the dictionary's, the comparison is recorded as a positive; otherwise, it is recorded as a negative. A random sample of 100 users will each make at least 10 comparisons, giving at least 1,000 comparisons in total. At the end of these tests, we will divide the number of positives by the total number of comparisons; this proportion is the accuracy of our dictionary.
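Scoring the human tests comes down to a simple proportion. The sketch below assumes one interpretation of a comparison: the subject picks which of the two tweets is happier, and the pair counts as a positive when the dictionary gives that tweet the higher score. The function names and the tuple format are hypothetical, and pairs are assumed to have distinct dictionary scores, since the text does not say how ties are handled.

```python
from typing import List, Tuple

# (dictionary score of tweet A, dictionary score of tweet B, human judged A happier?)
Comparison = Tuple[float, float, bool]


def is_positive(comparison: Comparison) -> bool:
    """True if the human's choice agrees with the dictionary's ranking of the pair."""
    score_a, score_b, human_prefers_a = comparison
    if score_a == score_b:
        # Assumption: tied pairs are excluded from the test set.
        raise ValueError("pair has tied dictionary scores")
    return (score_a > score_b) == human_prefers_a


def accuracy(comparisons: List[Comparison]) -> float:
    """Number of positives divided by the total number of comparisons."""
    if not comparisons:
        raise ValueError("no comparisons recorded")
    positives = sum(is_positive(c) for c in comparisons)
    return positives / len(comparisons)


# Hypothetical results from one subject's first four comparisons.
sample = [(6.2, 4.1, True), (3.0, 7.5, False), (5.5, 5.1, False), (4.0, 6.0, False)]
print(accuracy(sample))  # 0.75
```

For the full test, the list would hold every comparison from all 100 users, so at least 1,000 entries.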


Contents

Limitations of Hedonometer

Negation handling

Abbreviations, smileys/emoticons and special symbols

Local languages & slang (Singlish)

Ambiguity

Sarcasm

Our Approach

Machine Learning

Testing our new dictionary
