Difference between revisions of "1718t1is428T1"
(6 intermediate revisions by 2 users not shown) | |||
Line 1: | Line 1: | ||
+ | [[File:Awesomeness-Logo.jpg|center|250px]] | ||
+ | <br> | ||
<!--MAIN HEADER --> | <!--MAIN HEADER --> | ||
{|style="background-color:#f48024;" width="100%" cellspacing="0" cellpadding="0" valign="top" border="0" | | {|style="background-color:#f48024;" width="100%" cellspacing="0" cellpadding="0" valign="top" border="0" | | ||
Line 41: | Line 43: | ||
# What are the little known technologies that are used for data analytics? | # What are the little known technologies that are used for data analytics? | ||
# What do influential members in Stackoverflow know? Are we able to create a competency list of these members? | # What do influential members in Stackoverflow know? Are we able to create a competency list of these members? | ||
+ | |||
+ | <div style="background: #f48024 ; margin-top: 40px; font-weight: bold; line-height: 0.3em;letter-spacing:-0.08em;font-size:20px"><font color=#f48024 face="Century Gothic">SELECTED DATASET</font></div> | ||
+ | |||
+ | {| class="wikitable" style="background-color:#FFFFFF;" width="100%" | ||
+ | ! Dataset/Source | ||
+ | ! Data Attributes | ||
+ | |- | ||
+ | | style="text-align: center;" | Badges | ||
+ | https://archive.org/download/stackexchange/stackoverflow.com-Badges.7z | ||
+ | | | ||
+ | * UserId, e.g.: "420" | ||
+ | * Name, e.g.: "Teacher" | ||
+ | * Date, e.g.: "2008-09-15T08:55:03.923" | ||
+ | |- | ||
+ | | style="text-align: center;" | Comments | ||
+ | https://archive.org/download/stackexchange/stackoverflow.com-Comments.7z | ||
+ | | | ||
+ | * Id | ||
+ | * PostId | ||
+ | * Score | ||
+ | * Text, e.g.: "@Stu Thompson: Seems possible to me - why not try it?" | ||
+ | * CreationDate, e.g.: "2008-09-06T08:07:10.730" | ||
+ | * UserId | ||
+ | |- | ||
+ | | style="text-align: center;" | Posts | ||
+ | https://archive.org/download/stackexchange/stackoverflow.com-Posts.7z | ||
+ | | | ||
+ | * Id | ||
+ | * PostTypeId | ||
+ | * 1: Question | ||
+ | * 2: Answer | ||
+ | * ParentID (only present if PostTypeId is 2) | ||
+ | * AcceptedAnswerId (only present if PostTypeId is 1) | ||
+ | * CreationDate | ||
+ | * Score | ||
+ | * ViewCount | ||
+ | * Body | ||
+ | * OwnerUserId | ||
+ | * LastEditorUserId | ||
+ | * LastEditorDisplayName="Jeff Atwood" | ||
+ | * LastEditDate="2009-03-05T22:28:34.823" | ||
+ | * LastActivityDate="2009-03-11T12:51:01.480" | ||
+ | * CommunityOwnedDate="2009-03-11T12:51:01.480" | ||
+ | * ClosedDate="2009-03-11T12:51:01.480" | ||
+ | * Title | ||
+ | * Tags | ||
+ | * AnswerCount | ||
+ | * CommentCount | ||
+ | * FavoriteCount | ||
+ | |- | ||
+ | | style="text-align: center;" | Postlinks | ||
+ | https://archive.org/download/stackexchange/stackoverflow.com-PostLinks.7z | ||
+ | | | ||
+ | * Id | ||
+ | * CreationDate | ||
+ | * PostId | ||
+ | * RelatedPostId | ||
+ | * PostLinkTypeId | ||
+ | * 1: Linked | ||
+ | * 3: Duplicate | ||
+ | |- | ||
+ | | style="text-align: center;" | Users | ||
+ | https://archive.org/download/stackexchange/stackoverflow.com-Users.7z | ||
+ | | | ||
+ | * Id | ||
+ | * Reputation | ||
+ | * CreationDate | ||
+ | * DisplayName | ||
+ | * EmailHash | ||
+ | * LastAccessDate | ||
+ | * WebsiteUrl | ||
+ | * Location | ||
+ | * Age | ||
+ | * AboutMe | ||
+ | * Views | ||
+ | * UpVotes | ||
+ | * DownVotes | ||
+ | |- | ||
+ | | style="text-align: center;" | Votes | ||
+ | https://archive.org/download/stackexchange/stackoverflow.com-Votes.7z | ||
+ | | | ||
+ | * Id | ||
+ | * PostId | ||
+ | * VoteTypeId | ||
+ | * ` 1`: AcceptedByOriginator | ||
+ | * ` 2`: UpMod | ||
+ | * ` 3`: DownMod | ||
+ | * ` 4`: Offensive | ||
+ | * ` 5`: Favorite - if VoteTypeId = 5 UserId will be populated | ||
+ | * ` 6`: Close | ||
+ | * ` 7`: Reopen | ||
+ | * ` 8`: BountyStart | ||
+ | * ` 9`: BountyClose | ||
+ | * `10`: Deletion | ||
+ | * `11`: Undeletion | ||
+ | * `12`: Spam | ||
+ | * `13`: InformModerator | ||
+ | * CreationDate | ||
+ | * UserId (only for VoteTypeId 5) | ||
+ | * BountyAmount (only for VoteTypeId 9) | ||
+ | |} | ||
<div style="background: #f48024 ; margin-top: 40px; font-weight: bold; line-height: 0.3em;letter-spacing:-0.08em;font-size:20px"><font color=#f48024 face="Century Gothic">KEY TECHNICAL CHALLENGES</font></div> | <div style="background: #f48024 ; margin-top: 40px; font-weight: bold; line-height: 0.3em;letter-spacing:-0.08em;font-size:20px"><font color=#f48024 face="Century Gothic">KEY TECHNICAL CHALLENGES</font></div> | ||
Line 50: | Line 151: | ||
[[File:Timeline.jpg|center|1200px]] | [[File:Timeline.jpg|center|1200px]] | ||
+ | <div style="background: #f48024 ; margin-top: 40px; font-weight: bold; line-height: 0.3em;letter-spacing:-0.08em;font-size:20px"><font color=#f48024 face="Century Gothic">SKETCHES</font></div> | ||
+ | |||
+ | Below are some of the proposed visualisations that we are exploring. | ||
+ | |||
+ | <h4>Calendar Chart comparing for category on category analysis</h4> | ||
+ | [[File:VAT11718 1.png|frameless|left|500px]] Within StackOverflow, topics that users asked could be grouped into the following (but not limited to) categories: Integrated Development Environments (IDE), Programming Languages, Operating Systems (OS) and Databases. With that in mind, a calendar map could be used to visualise the relationships between any two different subcategories. StackOverflow may utilise this information to recommend relevant topics to its users. | ||
+ | <br> | ||
+ | <br> | ||
+ | <br> | ||
+ | <br> | ||
+ | <br> | ||
+ | <br> | ||
+ | <br> | ||
+ | <br> | ||
<div style="background: #f48024 ; margin-top: 40px; font-weight: bold; line-height: 0.3em;letter-spacing:-0.08em;font-size:20px"><font color=#f48024 face="Century Gothic">REFERENCES</font></div> | <div style="background: #f48024 ; margin-top: 40px; font-weight: bold; line-height: 0.3em;letter-spacing:-0.08em;font-size:20px"><font color=#f48024 face="Century Gothic">REFERENCES</font></div> | ||
http://sotagtrends.com | http://sotagtrends.com | ||
Line 56: | Line 171: | ||
https://insights.stackoverflow.com/survey/2017 | https://insights.stackoverflow.com/survey/2017 | ||
This report created by SO describe insights about SO users using data collected through their annual developer survey. Visualizations used here are simple but effective charts – bar charts, scatter plots and line charts – that describe intriguing phenomena such as preferred programming languages among ethnicities, representation of women in technology roles and correlated technologies used by developers. | This report created by SO describe insights about SO users using data collected through their annual developer survey. Visualizations used here are simple but effective charts – bar charts, scatter plots and line charts – that describe intriguing phenomena such as preferred programming languages among ethnicities, representation of women in technology roles and correlated technologies used by developers. | ||
Latest revision as of 15:50, 6 November 2017
StackOverflow is the largest, most trusted online community for developers to learn, share their programming knowledge, and build their careers. This is achieved through questions and answers posted on the site that cover a wide range of topics related to computer programming. The success of such a platform depends on high-quality questions and answers contributed by an engaged community. To screen for quality, the site uses ‘badges’ to signify user reputation, votes on posts (upvotes or downvotes), and community administrators who close questions that are irrelevant or have been answered before.
To sustain its popularity and success, the site needs to continually draw new knowledge from its users in the fast-paced software field, and those users must stay engaged enough to keep contributing. StackOverflow can better understand the relationships between its users and the topics they engage in through data visualisation tools such as network graphs. A better understanding of these social interactions can guide StackOverflow in making the changes needed to keep its user base engaged. For instance, if StackOverflow notices a significant group of users whose posts are highly upvoted but who do not post frequently, it can consider how to encourage such users to use the platform more regularly.
Objectives
Some of the general problems (questions) that we wish to address are:
- What questions are more likely to be answered and which questions are answered promptly?
- What are the popular subcommunities and who are the biggest contributors?
- What is the distribution of experience levels within each community?
- Measured by badges
- Measured by posts
- Measured by upvoted posts (or comments)
- Measured by years of experience on StackOverflow
- Who are the influential members within each StackOverflow subcommunity?
As aspiring data analytics practitioners, we also wish to address some more specific problems (questions):
- What are the little-known technologies that are used for data analytics?
- What do influential members of StackOverflow know? Are we able to create a competency list of these members?
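As a rough illustration of the first general question (which questions are answered, and how promptly), the sketch below computes how many hours each question waited for its first answer. It assumes post records have already been loaded as dictionaries keyed by attributes of the Posts dataset (`Id`, `PostTypeId`, `CreationDate`). The parent attribute is assumed to be spelled `ParentId` and can be overridden through the hypothetical `parent_key` parameter; the sample records are made up.

```python
from datetime import datetime

# Timestamp layout seen in the dataset examples, e.g. "2008-09-15T08:55:03.923".
DATE_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"


def hours_to_first_answer(posts, parent_key="ParentId"):
    """Map each question Id to the hours until its earliest answer.

    Unanswered questions map to None. `parent_key` is an assumption about
    how the parent attribute is spelled in the loaded records.
    """
    asked = {}         # question Id -> creation time
    first_answer = {}  # question Id -> earliest answer time
    for post in posts:
        created = datetime.strptime(post["CreationDate"], DATE_FORMAT)
        if post["PostTypeId"] == "1":    # 1: Question
            asked[post["Id"]] = created
        elif post["PostTypeId"] == "2":  # 2: Answer
            parent = post[parent_key]
            if parent not in first_answer or created < first_answer[parent]:
                first_answer[parent] = created
    result = {}
    for question_id, asked_at in asked.items():
        answered_at = first_answer.get(question_id)
        if answered_at is None:
            result[question_id] = None
        else:
            result[question_id] = (answered_at - asked_at).total_seconds() / 3600
    return result


# Made-up sample: question 1 has two answers, question 2 has none.
sample_posts = [
    {"Id": "1", "PostTypeId": "1", "CreationDate": "2008-09-15T08:00:00.000"},
    {"Id": "2", "PostTypeId": "1", "CreationDate": "2008-09-15T08:10:00.000"},
    {"Id": "3", "PostTypeId": "2", "ParentId": "1",
     "CreationDate": "2008-09-15T09:30:00.000"},
    {"Id": "4", "PostTypeId": "2", "ParentId": "1",
     "CreationDate": "2008-09-15T08:30:00.000"},
]
waits = hours_to_first_answer(sample_posts)
print(waits)
```

Grouping the resulting waiting times by tag or by asker would be one way to compare how promptly different subcommunities respond.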
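Selected Dataset
Dataset/Source | Data Attributes |
---|---|
Badges https://archive.org/download/stackexchange/stackoverflow.com-Badges.7z | UserId (e.g. "420"); Name (e.g. "Teacher"); Date (e.g. "2008-09-15T08:55:03.923") |
Comments https://archive.org/download/stackexchange/stackoverflow.com-Comments.7z | Id; PostId; Score; Text (e.g. "@Stu Thompson: Seems possible to me - why not try it?"); CreationDate (e.g. "2008-09-06T08:07:10.730"); UserId |
Posts https://archive.org/download/stackexchange/stackoverflow.com-Posts.7z | Id; PostTypeId (1: Question, 2: Answer); ParentID (only present if PostTypeId is 2); AcceptedAnswerId (only present if PostTypeId is 1); CreationDate; Score; ViewCount; Body; OwnerUserId; LastEditorUserId; LastEditorDisplayName (e.g. "Jeff Atwood"); LastEditDate (e.g. "2009-03-05T22:28:34.823"); LastActivityDate (e.g. "2009-03-11T12:51:01.480"); CommunityOwnedDate (e.g. "2009-03-11T12:51:01.480"); ClosedDate (e.g. "2009-03-11T12:51:01.480"); Title; Tags; AnswerCount; CommentCount; FavoriteCount |
PostLinks https://archive.org/download/stackexchange/stackoverflow.com-PostLinks.7z | Id; CreationDate; PostId; RelatedPostId; PostLinkTypeId (1: Linked, 3: Duplicate) |
Users https://archive.org/download/stackexchange/stackoverflow.com-Users.7z | Id; Reputation; CreationDate; DisplayName; EmailHash; LastAccessDate; WebsiteUrl; Location; Age; AboutMe; Views; UpVotes; DownVotes |
Votes https://archive.org/download/stackexchange/stackoverflow.com-Votes.7z | Id; PostId; VoteTypeId (1: AcceptedByOriginator, 2: UpMod, 3: DownMod, 4: Offensive, 5: Favorite, 6: Close, 7: Reopen, 8: BountyStart, 9: BountyClose, 10: Deletion, 11: Undeletion, 12: Spam, 13: InformModerator); CreationDate; UserId (only for VoteTypeId 5); BountyAmount (only for VoteTypeId 9) |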
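The archives listed above are large, so it may help to read them one record at a time. The following is a minimal sketch, assuming each archive extracts to an XML file in which every record is a `<row>` element carrying the listed attributes; this layout is an assumption, and the two-row sample below is made up (its first row reuses the Badges example values).

```python
import io
import xml.etree.ElementTree as ET


def iter_rows(source):
    """Yield the attributes of each <row> element as a plain dict."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "row":
            yield dict(elem.attrib)
            # Drop the element's contents once it has been consumed.
            elem.clear()


# Made-up sample in the assumed layout, standing in for an extracted file.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<badges>
  <row UserId="420" Name="Teacher" Date="2008-09-15T08:55:03.923" />
  <row UserId="421" Name="Student" Date="2008-09-15T08:55:03.957" />
</badges>"""

rows = list(iter_rows(io.BytesIO(SAMPLE)))
print(rows[0]["Name"], len(rows))
```

Under the same assumption, passing a file opened in binary mode instead of the in-memory sample would stream an extracted file without loading it whole.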
Key Technical Challenges
- Storing an extremely large network of StackOverflow posts, comments and users in a graph database and creating indexes for fast retrieval.
- Using the StackOverflow API to retrieve information on demand and transform the data into insights in the visualization.
- Learning how to use the relevant tools for graph analysis, such as neo4j, vis.js, sigma.js and D3.js.
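To make the on-demand retrieval challenge more concrete, here is a hedged sketch of building a request for recent questions that carry a given set of tags. It assumes the public Stack Exchange API root `https://api.stackexchange.com/2.3` and its `/questions` route with the `site`, `tagged`, `sort`, `order` and `pagesize` parameters; the helper names `questions_url` and `fetch_json` are hypothetical, and the project may well end up using a different route or client.

```python
import gzip
import json
import urllib.parse
import urllib.request

# Assumed API root; check the current API documentation before relying on it.
API_ROOT = "https://api.stackexchange.com/2.3"


def questions_url(tags, page_size=10):
    """Build a /questions request URL for questions carrying the given tags."""
    params = {
        "site": "stackoverflow",
        "tagged": ";".join(tags),  # tags are passed as a semicolon-separated list
        "sort": "creation",
        "order": "desc",
        "pagesize": page_size,
    }
    return API_ROOT + "/questions?" + urllib.parse.urlencode(params)


def fetch_json(url):
    """Download and decode one API response (needs network access)."""
    with urllib.request.urlopen(url) as resp:
        raw = resp.read()
        # Responses may arrive gzip-compressed; decompress when flagged.
        if resp.headers.get("Content-Encoding") == "gzip":
            raw = gzip.decompress(raw)
    return json.loads(raw)


url = questions_url(["d3.js", "neo4j"])
print(url)
```

Only the URL is built here; calling `fetch_json(url)` would perform the actual request and is subject to the API's rate limits.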
Sketches
Below are some of the proposed visualisations that we are exploring.
Calendar chart for category-on-category analysis
Within StackOverflow, the topics that users ask about can be grouped into categories including (but not limited to) Integrated Development Environments (IDE), Programming Languages, Operating Systems (OS) and Databases. With that in mind, a calendar map could be used to visualise the relationship between any two subcategories. StackOverflow may utilise this information to recommend relevant topics to its users.
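One possible way to prepare data for such a calendar chart is to count, per day, the questions tagged with both subcategories of interest. The sketch below assumes that reading of "relationship" (tag co-occurrence), and assumes the `Tags` attribute is stored as a string such as `<python><mysql>`; both are assumptions, and the sample questions are made up.

```python
import re
from collections import Counter


def parse_tags(raw):
    """Split a tag string such as "<python><mysql>" into ["python", "mysql"]."""
    return re.findall(r"<([^<>]+)>", raw)


def daily_cooccurrence(questions, tag_a, tag_b):
    """Count, per calendar day, the questions carrying both tags."""
    counts = Counter()
    for question in questions:
        tags = set(parse_tags(question.get("Tags", "")))
        if tag_a in tags and tag_b in tags:
            day = question["CreationDate"][:10]  # keep the "YYYY-MM-DD" part
            counts[day] += 1
    return counts


# Made-up sample questions.
sample_questions = [
    {"CreationDate": "2017-10-01T09:00:00.000", "Tags": "<python><mysql>"},
    {"CreationDate": "2017-10-01T15:30:00.000", "Tags": "<mysql><python><pandas>"},
    {"CreationDate": "2017-10-02T11:00:00.000", "Tags": "<python>"},
]
counts = daily_cooccurrence(sample_questions, "python", "mysql")
print(dict(counts))
```

Each day and its count could then be mapped to one cell of the calendar, with colour intensity encoding the count.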
References
http://sotagtrends.com This web application plots the number of questions asked over time by tag on StackOverflow. Examples of tags include “node.js”, “d3.js” and “python”. The user can add multiple tags and can also toggle a relative comparison across the tags. Another interesting feature of this visualization is a moving reference line that slides along the X-axis, showing the exact number of questions asked for each tag.
https://insights.stackoverflow.com/survey/2017 This report, created by SO, describes insights about SO users using data collected through its annual developer survey. The visualizations used are simple but effective charts – bar charts, scatter plots and line charts – that describe intriguing phenomena such as preferred programming languages among ethnicities, representation of women in technology roles and correlated technologies used by developers.