Difference between revisions of "Twitter Analytics: Findings"

 

Revision as of 15:59, 12 October 2014



Descriptive analysis

Characters per tweet

Fap101.png

The mean and median tweet length are both around 120 characters, while the maximum is 208 characters.

Based on research by Buddy Media and Track Social, engagement is highest at around 100 characters per tweet. Hence, companies should aim for this optimal length rather than simply using as many characters as possible.

http://blog.bufferapp.com/the-ideal-length-of-everything-online-according-to-science
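The summary statistics quoted above can be reproduced with a few lines of code. The following is a minimal Python sketch, not the tooling used for this report; the function name `char_length_summary` and the sample tweets are hypothetical.

```python
from statistics import mean, median

def char_length_summary(tweets):
    """Return the mean, median and maximum character count of a list of tweets."""
    lengths = [len(t) for t in tweets]
    return {"mean": mean(lengths), "median": median(lengths), "max": max(lengths)}

# Hypothetical sample; the real data set is not reproduced here.
sample = ["iPhone 6 is out", "New camera app!", "win"]
summary = char_length_summary(sample)
print(summary)
```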

Words per Tweet

Fap102.png

From the graph above, most tweets contain either 11 or 17 words. As this is the “usual” number of words in a tweet, companies may want to follow the same pattern.

Length of Words per tweet

Fap103.png

Within their tweets, users tend to use words that are five, seven or ten characters long.

Unique Words per tweet

Fap104.png

Users tend to use 11 or 17 unique words per tweet. This matches the distribution of words per tweet, which also peaks at 11 and 17 words. Hence, the majority of users did not repeat any words in their tweets, which explains the similarity between the two distributions. However, users also tend to have 22 unique words against 24 posted words. This may happen because the longer the tweet, the harder it is to avoid repeating a word, resulting in fewer unique words.
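The comparison between total and unique words can be sketched as follows. This is an illustrative Python sketch that assumes words are separated by whitespace and compared case-insensitively; the report does not state its tokenization rules, and `word_counts` is a hypothetical name.

```python
def word_counts(tweet):
    """Return (total words, unique words) for one tweet."""
    # Assumption: words are whitespace-separated and case is ignored.
    words = tweet.lower().split()
    return len(words), len(set(words))

total, unique = word_counts("the new iPhone is the best iPhone")
# "the" and "iphone" each appear twice, so unique is smaller than total.
print(total, unique)
```

A tweet with no repeated words gives equal values, which is the case described for the 11-word and 17-word tweets.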

Distribution of Hashtag per tweet

Fap105.png Fap106.png

It is interesting to note the popularity of hashtags on Twitter. From the figures above, we can observe that the majority of users used at least one hashtag, and some used as many as 8 hashtags in a post. Hence, it is important for companies to recognise the role of hashtags in grouping their posts with related content and increasing their reach.

Mentions per tweet

Fap107.png Fap108.png

Although users tend to use hashtags, retweets do not appear to be popular among these iPhone tweets. Apple may want to run a marketing campaign on Twitter that encourages people to retweet its posts, which would increase their popularity.

Links per tweet

Fap109.png Fap110.png

Users tend to include one link in their tweets. This may be due to Twitter's character limit: users redirect their readers to another website to share more about their posts.
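The hashtag, mention and link counts discussed in the last three subsections can be extracted with regular expressions. The sketch below is in Python and uses simplified patterns that are assumptions, not the rules used for this report; for example, a URL containing a `#` fragment would also be counted as a hashtag.

```python
import re

# Simplified patterns (assumptions): a marker followed by word characters,
# and any http/https URL up to the next whitespace.
HASHTAG = re.compile(r"#\w+")
MENTION = re.compile(r"@\w+")
LINK = re.compile(r"https?://\S+")

def count_entities(tweet):
    """Count hashtags, mentions and links in one tweet."""
    return {
        "hashtags": len(HASHTAG.findall(tweet)),
        "mentions": len(MENTION.findall(tweet)),
        "links": len(LINK.findall(tweet)),
    }

# Hypothetical tweet used only for illustration.
example = "RT @apple: the #iPhone6 #camera is great http://example.com/review"
counts = count_entities(example)
print(counts)
```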

No of words vs Characters

Fap111.png

A data frame is used to relate the two variables, which are then plotted using ggplot(). Based on the graph, we can see that there is a relationship between the number of words and the number of characters. However, the correlation coefficient is 0.73, which is well short of a perfect linear relationship. This may be because people use more words in their tweets to convey their message rather than lengthening individual words. To verify this claim, we investigate the number of words in a tweet against the length of its words.

No of words vs Length of Words

Fap112.png

Based on the graph, there is a moderate negative correlation (-0.624) between the number of words and the word length. Hence, the more words a tweet contains, the shorter its words tend to be.
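The two correlations above relate three per-tweet features: the number of words, the number of characters and the word length. The following Python sketch computes those features and a Pearson correlation coefficient by hand. It assumes that "length of words" means the mean word length per tweet, which the report does not state explicitly, and the helper names are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def tweet_features(tweet):
    """Return (number of words, number of characters, mean word length)."""
    words = tweet.split()
    mean_word_len = sum(len(w) for w in words) / len(words)
    return len(words), len(tweet), mean_word_len

# Hypothetical tweets used only for illustration.
tweets = ["new iPhone app", "win", "the camera on this phone is great"]
n_words, n_chars, word_len = zip(*(tweet_features(t) for t in tweets))
print(pearson(n_words, n_chars), pearson(n_words, word_len))
```

A coefficient close to 1 or -1 indicates a strong linear relationship, while values near 0 indicate a weak one.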

Lexical Diversity

Lexical diversity reflects the range of vocabulary used by users on Twitter. It tells us how much word variety there is in the iPhone tweets. Lexical diversity is measured as the number of unique tokens divided by the total number of tokens.

Fap113.png

Based on the result, there seems to be low lexical diversity in the iPhone tweets, which may suggest that the topics of interest are widely shared.
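The ratio defined above (unique tokens divided by total tokens) is straightforward to compute. Below is a minimal Python sketch; the function name and the sample tokens are hypothetical, and returning 0.0 for an empty input is a choice made here, not something the report specifies.

```python
def lexical_diversity(tokens):
    """Number of unique tokens divided by the total number of tokens."""
    if not tokens:
        return 0.0  # Assumption: treat an empty input as zero diversity.
    return len(set(tokens)) / len(tokens)

# Hypothetical token list: 4 unique tokens out of 7.
tokens = "iphone 6 iphone 6 plus iphone camera".split()
ratio = lexical_diversity(tokens)
print(round(ratio, 3))
```

Values near 1 mean that almost every token is distinct, while values near 0 mean the same few tokens are repeated many times.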

20 Most Used Words

Fap114.png

For this analysis, the words are sorted by frequency in decreasing order and the top 20 are chosen. Several of the most frequent words mention iPhone and iPhone6 news. However, as this is the raw data, the words have not been cleaned yet. To investigate the words further, topic creation is explored in the next section.
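Sorting words by frequency and keeping the top 20 can be sketched as follows. This is an illustrative Python sketch on raw, uncleaned text; the whitespace tokenization and lowercasing are assumptions, and `top_words` is a hypothetical name.

```python
from collections import Counter

def top_words(tweets, n=20):
    """Return the n most frequent words as (word, count) pairs, most frequent first."""
    counts = Counter(word for tweet in tweets for word in tweet.lower().split())
    return counts.most_common(n)

# Hypothetical tweets used only for illustration.
ranking = top_words(["iPhone iPhone6 iPhone", "iphone news"], n=2)
print(ranking)
```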


Topic Creation

Frequent Terms

As the data has already been properly cleaned, frequent terms can be analyzed further. First, lowfreq is set to 50, which keeps only the words that appear at least 50 times. The result is shown below:

Fap100.png

If we raise the threshold to a frequency of at least 500, we can concentrate on a few words that may lead to distinct topics, such as “app”, “win” and “camera”.

Fap115.png
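The effect of raising the frequency threshold can be illustrated with a small filter. The Python sketch below keeps only terms that occur at least `lowfreq` times. The parameter name mirrors the setting mentioned above, but the function itself is a hypothetical stand-in, not the routine used for this report.

```python
from collections import Counter

def frequent_terms(tokens, lowfreq):
    """Return, in alphabetical order, the terms that occur at least lowfreq times."""
    counts = Counter(tokens)
    return sorted(term for term, count in counts.items() if count >= lowfreq)

# Hypothetical cleaned tokens used only for illustration.
tokens = ["app"] * 3 + ["win"] * 2 + ["camera"]
print(frequent_terms(tokens, 2))
print(frequent_terms(tokens, 3))
```

A higher threshold returns fewer terms, which is why moving from 50 to 500 narrows the list to a handful of candidate topic words.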

Topic Groupings

Topics are created using a clustering algorithm that assigns each word to the cluster it belongs to. Various numbers of clusters (k) are tried to find the optimum number. Thereafter, the top 10 words in each topic are listed for analysis.
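Listing the top 10 words per topic can be sketched once cluster labels exist. The Python sketch below assumes each tweet has already been assigned a topic label by whichever clustering algorithm is used; the report does not name the algorithm, so the labels here are hypothetical inputs and the sketch covers only the final listing step.

```python
from collections import Counter, defaultdict

def top_words_per_topic(tweets, labels, n=10):
    """Return the n most frequent words for each topic label."""
    counts = defaultdict(Counter)
    for tweet, label in zip(tweets, labels):
        counts[label].update(tweet.lower().split())
    return {label: [w for w, _ in c.most_common(n)] for label, c in counts.items()}

# Hypothetical tweets and topic labels used only for illustration.
tweets = ["camera camera review", "win app win", "camera lens"]
labels = [0, 1, 0]
topics = top_words_per_topic(tweets, labels)
print(topics)
```

Repeating this for several values of k and comparing the word lists is one way to judge which number of topics gives the most distinct groupings.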

5 Topics

Fap116.png

With 5 topics, distinct topics seem to emerge:

Fap117.png

4 Topics

Fap118a.png Fap119.png

Most of the topics remain, except the topic about Apple reviews. Since it is a fairly important independent topic, 5 topics are deemed more suitable and give more insight to users. We will use 5 topics for further analysis.

6 Topics

Fap120.png Fap121.png