Difference between revisions of "IS428 2017 18T1 Group01 Proposal"
(13 intermediate revisions by the same user not shown) | |||
Line 20: | Line 20: | ||
| style="font-family:Century Gothic; font-size:85%; solid #f48024 ; border:0px solid #f48024 background:#f48024; background:#f48024; text-align:center;" width="16.6%" | | | style="font-family:Century Gothic; font-size:85%; solid #f48024 ; border:0px solid #f48024 background:#f48024; background:#f48024; text-align:center;" width="16.6%" | | ||
− | [ | + | [https://github.com/atwj/IS428-AWSOMENESS-OVERFLOW <font color="#FFFFFF">APPLICATION</font>] |
| | | | ||
| style="font-family:Century Gothic; font-size:85%; solid #f48024 ; border:0px solid #f48024 background:#f48024; background:#f48024; text-align:center;" width="16.6%" | | | style="font-family:Century Gothic; font-size:85%; solid #f48024 ; border:0px solid #f48024 background:#f48024; background:#f48024; text-align:center;" width="16.6%" | | ||
− | [[IS428 2017 18T1 Group01 Report | <font color="#FFFFFF"> | + | [[IS428 2017 18T1 Group01 Report | <font color="#FFFFFF">RESEARCH PAPER</font>]] |
| | | | ||
|} | |} | ||
+ | [[IS428_2017_18T1_Group01_Proposal-version2 | <font color="#000">Version 1</font>]] | [[IS428_2017_18T1_Group01_Proposal | <font color="#000">Version 2</font>]] | ||
+ | <br> | ||
[[File:Stackoverflow.png|200px]] | [[File:Stackoverflow.png|200px]] | ||
<div style="margin-top:-50px"> | <div style="margin-top:-50px"> | ||
Line 33: | Line 35: | ||
</div> | </div> | ||
<div style="background: #f48024 ; margin-top: 40px; font-weight: bold; line-height: 0.3em;letter-spacing:-0.08em;font-size:20px"><font color=#f48024 face="Century Gothic">MOTIVATION</font></div> | <div style="background: #f48024 ; margin-top: 40px; font-weight: bold; line-height: 0.3em;letter-spacing:-0.08em;font-size:20px"><font color=#f48024 face="Century Gothic">MOTIVATION</font></div> | ||
− | + | Our research is motivated by the lack of information on community interaction within Stack Overflow. As Information Systems students who view the site frequently and rely heavily on it for assistance, we realised that we are surprisingly unaware of how interaction occurs on the site to keep the community active. | |
+ | <br /> | ||
+ | <br /> | ||
+ | Most visualizations available show the growth and popularity of technologies, but not the underlying users driving these trends. We find it important to study user-to-user interaction as the community plays a very important role in providing useful content to the website. | ||
+ | <br /> | ||
+ | <br /> | ||
+ | Apart from the users, the multiple technology tags (e.g. javascript, jquery, react-native) placed on each question allow us to study how closely related different technologies are to one another. This is useful for understanding ways to integrate separate technologies with one another, and for gauging how popular it is to use these technologies together. | ||
<br /> | <br /> | ||
<div style="background: #f48024 ; margin-top: 40px; font-weight: bold; line-height: 0.3em;letter-spacing:-0.08em;font-size:20px"><font color=#f48024 face="Century Gothic">OBJECTIVES</font></div> | <div style="background: #f48024 ; margin-top: 40px; font-weight: bold; line-height: 0.3em;letter-spacing:-0.08em;font-size:20px"><font color=#f48024 face="Century Gothic">OBJECTIVES</font></div> | ||
− | |||
Some of the general questions that we wish to address are: | Some of the general questions that we wish to address are: | ||
− | # | + | # How are the different technologies on Stack Overflow used together? |
− | # What are the | + | # What are the different clusters of technologies that are frequently used together? | |
− | # | + | # Who is likely to answer my question? |
− | # | + | # Which users are most influential in each community? | |
− | + | # Do top contributors answer actively and consistently over time? | |
− | + | # How does each technology grow over time? | |
− | |||
− | |||
− | |||
− | |||
− | # | ||
− | # | ||
<div style="background: #f48024 ; margin-top: 40px; font-weight: bold; line-height: 0.3em;letter-spacing:-0.08em;font-size:20px"><font color=#f48024 face="Century Gothic">SELECTED DATASET</font></div> | <div style="background: #f48024 ; margin-top: 40px; font-weight: bold; line-height: 0.3em;letter-spacing:-0.08em;font-size:20px"><font color=#f48024 face="Century Gothic">SELECTED DATASET</font></div> | ||
Line 150: | Line 152: | ||
* BountyAmount (only for VoteTypeId 9) | * BountyAmount (only for VoteTypeId 9) | ||
|} | |} | ||
− | + | <div style="background: #f48024 ; margin-top: 40px; font-weight: bold; line-height: 0.3em;letter-spacing:-0.08em;font-size:20px"><font color=#f48024 face="Century Gothic">DATA MODEL</font></div> | |
+ | [[File:Stackoverflow Data Model (2).png|center|600px]] | ||
+ | <p> | ||
+ | The data model describes the user and application interactions on the Stack Overflow site. Building the data model allows us to decide which elements should be nodes and which should be relationships. | ||
+ | </p> | ||
+ | <div style="background: #f48024 ; margin-top: 40px; font-weight: bold; line-height: 0.3em;letter-spacing:-0.08em;font-size:20px"><font color=#f48024 face="Century Gothic">BACKGROUND SURVEY OF RELATED WORKS</font></div> | ||
+ | {| class="wikitable" style="background-color:#FFFFFF;" width="100%" | ||
+ | ! style="width:50%" | Related Works | ||
+ | ! What We Can Learn | ||
+ | |- | ||
+ | | style="text-align: center;" | <b> Stack Overflow Trends </b> | ||
+ | [[File:Javascript-large-1-1024x621.png |center|400px]] | ||
+ | Source: https://insights.stackoverflow.com/trends?utm_source=so-owned&utm_medium=blog&utm_campaign=trends&utm_content=blog-link | ||
+ | | | ||
+ | * See how technologies have trended over time based on use of their tags since 2008 | ||
+ | * The graph is plotted as a time series alongside other technologies, which lets viewers quickly grasp which technology is gaining traction and support on the site | ||
+ | * There is a lack of visibility into changes in user interactions and contributions over time, as technology trends change | ||
+ | |- | ||
+ | | style="text-align: center;" | <b>Stack Overflow Tag Network</b> | ||
+ | [[File:Image_2017-11-23_19-33-59.png |center|400px]] | ||
+ | Source: https://www.kaggle.com/juliasilge/network-graph/code | ||
+ | | | ||
+ | * The graph is cluttered and static, so it is hard to zoom in to understand each cluster better | ||
+ | * Overlaps between different clusters are also hard to identify | ||
+ | * The node colors are not clearly shown | ||
+ | * The graph does not support comparison between clusters | ||
+ | |- | ||
+ | |} | ||
+ | <div style="background: #f48024 ; margin-top: 40px; font-weight: bold; line-height: 0.3em;letter-spacing:-0.08em;font-size:20px"><font color=#f48024 face="Century Gothic">SKETCHES STORYBOARD</font></div> | ||
+ | {| class="wikitable" style="background-color:#FFFFFF;" width="100%" | ||
+ | ! style="width:50%" | Sketches | ||
+ | ! How Analyst Can Conduct Analysis | ||
+ | |- | ||
+ | | style="text-align: center;" | <b> Stack Overflow Trends (Sketches) </b> | ||
+ | [[File:Cluster-Tags.jpg |center|400px]] | ||
+ | <b> Stack Overflow Trends (Proposed Layout)</b> | ||
+ | [[File:ClusterTags.jpg |center|400px]] | ||
+ | Source: http://bl.ocks.org/larskotthoff/4e5dbf8be2c83631a05b | ||
+ | | | ||
+ | * Create a shaded area to outline the different clusters | ||
+ | * The shading makes it easier to differentiate the clusters | ||
+ | * A filter lets the analyst choose which technologies to compare | ||
+ | * This graph could answer questions like "How are different technologies related to one another?" and "How are different communities of technologies frequently used together?" | ||
+ | |- | ||
+ | | style="text-align: center;" | <b>Network Analysis (Sketches)</b> | ||
+ | [[File:NetworkTags.jpg|center|400px]] | ||
+ | <b>Network Analysis (Proposed Layout)</b> | ||
+ | [[File:NetworkCluster.jpg|center|400px]] | ||
+ | | | ||
+ | * A bar graph at the bottom of the visualization shows the interaction between users over time, from the very first interaction to the current date (September 2017) | ||
+ | * A zoom function lets the analyst zoom in on the nodes and edges | ||
+ | * The analyst can filter by language community | ||
+ | * The analyst can see which user answers the most questions and which user is the most connected in the community | ||
+ | * Degree centrality of each user is measured from the in-degree and out-degree edges, which represent interaction | ||
+ | * Betweenness centrality of each user is measured, indicating the amount of information flow in the network that passes through the user | ||
+ | |- | ||
+ | |} | ||
+ | <div style="background: #f48024 ; margin-top: 40px; font-weight: bold; line-height: 0.3em;letter-spacing:-0.08em;font-size:20px"><font color=#f48024 face="Century Gothic">ARCHITECTURE DIAGRAM</font></div> | ||
+ | <br> | ||
+ | [[File:Architecture diagram.jpg|center|900px]] | ||
+ | <br> | ||
<div style="background: #f48024 ; margin-top: 40px; font-weight: bold; line-height: 0.3em;letter-spacing:-0.08em;font-size:20px"><font color=#f48024 face="Century Gothic">KEY TECHNICAL CHALLENGES</font></div> | <div style="background: #f48024 ; margin-top: 40px; font-weight: bold; line-height: 0.3em;letter-spacing:-0.08em;font-size:20px"><font color=#f48024 face="Century Gothic">KEY TECHNICAL CHALLENGES</font></div> | ||
* Storing an extremely large network of StackOverflow posts, comments and users in a graph database and creating indexes for fast retrieval. | * Storing an extremely large network of StackOverflow posts, comments and users in a graph database and creating indexes for fast retrieval. | ||
* Using StackOverflow API to retrieve information on-demand and transform the data into insight on the visualization. | * Using StackOverflow API to retrieve information on-demand and transform the data into insight on the visualization. | ||
− | * Learning how to use the relevant tools for graph analysis such as neo4j | + | * Learning how to use the relevant tools for graph analysis such as neo4j, sigma.js and D3.js |
<div style="background: #f48024 ; margin-bottom:10px; margin-top: 40px; font-weight: bold; line-height: 0.3em;letter-spacing:-0.08em;font-size:20px"><font color=#f48024 face="Century Gothic">PROJECT TIMELINE</font></div> | <div style="background: #f48024 ; margin-bottom:10px; margin-top: 40px; font-weight: bold; line-height: 0.3em;letter-spacing:-0.08em;font-size:20px"><font color=#f48024 face="Century Gothic">PROJECT TIMELINE</font></div> | ||
− | [[File:Timeline. | + | [[File:AWS-Timeline.png|center|1000px]] |
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
<div style="background: #f48024 ; margin-top: 40px; font-weight: bold; line-height: 0.3em;letter-spacing:-0.08em;font-size:20px"><font color=#f48024 face="Century Gothic">REFERENCES</font></div> | <div style="background: #f48024 ; margin-top: 40px; font-weight: bold; line-height: 0.3em;letter-spacing:-0.08em;font-size:20px"><font color=#f48024 face="Century Gothic">REFERENCES</font></div> | ||
http://sotagtrends.com | http://sotagtrends.com | ||
Line 179: | Line 227: | ||
https://insights.stackoverflow.com/survey/2017 | https://insights.stackoverflow.com/survey/2017 | ||
This report, created by SO, describes insights about SO users using data collected through their annual developer survey. Visualizations used here are simple but effective charts – bar charts, scatter plots and line charts – that describe intriguing phenomena such as preferred programming languages among ethnicities, representation of women in technology roles and correlated technologies used by developers. | This report, created by SO, describes insights about SO users using data collected through their annual developer survey. Visualizations used here are simple but effective charts – bar charts, scatter plots and line charts – that describe intriguing phenomena such as preferred programming languages among ethnicities, representation of women in technology roles and correlated technologies used by developers. | ||
+ | |||
+ | <div style="background: #f48024 ; margin-top: 40px; font-weight: bold; line-height: 0.3em;letter-spacing:-0.08em;font-size:20px"><font color=#f48024 face="Century Gothic">COMMENTS</font></div> | ||
+ | Feel free to leave comments, suggestions and feedback to help us improve our project! :D |
Latest revision as of 16:26, 26 November 2017
Stack Overflow is the largest, most trusted online community for developers to learn, share their programming knowledge, and build their careers. This is achieved through questions and answers posted on the site that concern a wide range of topics related to computer programming. The success of such a platform depends on high-quality questions and answers contributed by an engaged community. Thus, measures to screen for quality include using ‘badges’ to signify user reputation, votes on posts (upvotes or downvotes), and community administrators who close questions that have been answered before or are irrelevant.
Our research is motivated by the lack of information on community interaction within Stack Overflow. As Information Systems students who view the site frequently and rely heavily on it for assistance, we realised that we are surprisingly unaware of how interaction occurs on the site to keep the community active.
Most visualizations available show the growth and popularity of technologies, but not the underlying users driving these trends. We find it important to study user-to-user interaction as the community plays a very important role in providing useful content to the website.
Apart from the users, the multiple technology tags (e.g. javascript, jquery, react-native) placed on each question allow us to study how closely related different technologies are to one another. This is useful for understanding ways to integrate separate technologies with one another, and for gauging how popular it is to use these technologies together.
Some of the general questions that we wish to address are:
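As a rough illustration of how tag co-occurrence could be measured, the sketch below counts how often each pair of tags appears on the same question. It assumes the `Tags` attribute of a post is a string of the form `<javascript><jquery>`; that format, the helper names `parse_tags` and `cooccurrence`, and the sample data are all assumptions made for illustration, not taken from the project.

```python
import re
from collections import Counter
from itertools import combinations


def parse_tags(raw):
    """Split a tag string such as '<javascript><jquery>' into sorted, unique tags.

    The angle-bracket format is an assumption about the dump's Tags attribute.
    """
    return sorted(set(re.findall(r"<([^<>]+)>", raw)))


def cooccurrence(tag_strings):
    """Count how many questions carry each unordered pair of tags."""
    pairs = Counter()
    for raw in tag_strings:
        # combinations() over the sorted tags yields each pair once, in a stable order
        for first, second in combinations(parse_tags(raw), 2):
            pairs[(first, second)] += 1
    return pairs


# Hypothetical sample of Tags values, one per question
sample = [
    "<javascript><jquery>",
    "<javascript><jquery><html>",
    "<javascript><react-native>",
    "<python>",
]

counts = cooccurrence(sample)
for pair, n in counts.most_common():
    print(pair, n)
```

Pairs with high counts would become heavily weighted edges in a tag network, which is one possible basis for the clusters of technologies discussed in this proposal.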
- How are the different technologies on Stack Overflow used together?
- What are the different clusters of technologies that are frequently used together?
- Who is likely to answer my question?
- Which users are most influential in each community?
- Do top contributors answer actively and consistently over time?
- How does each technology grow over time?
Dataset/Source | Data Attributes |
---|---|
Badges
https://archive.org/download/stackexchange/stackoverflow.com-Badges.7z |
|
Comments
https://archive.org/download/stackexchange/stackoverflow.com-Comments.7z |
|
Posts
https://archive.org/download/stackexchange/stackoverflow.com-Posts.7z |
|
Postlinks
https://archive.org/download/stackexchange/stackoverflow.com-PostLinks.7z |
|
Users
https://archive.org/download/stackexchange/stackoverflow.com-Users.7z |
|
Votes
https://archive.org/download/stackexchange/stackoverflow.com-Votes.7z |
|
The data model describes the user and application interactions on the Stack Overflow site. Building the data model allows us to decide which elements should be nodes and which should be relationships.
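Related Works | What We Can Learn |
---|---|
Stack Overflow Trends
Source: https://insights.stackoverflow.com/trends?utm_source=so-owned&utm_medium=blog&utm_campaign=trends&utm_content=blog-link |
- See how technologies have trended over time based on use of their tags since 2008
- The graph is plotted as a time series alongside other technologies, which lets viewers quickly grasp which technology is gaining traction and support on the site
- There is a lack of visibility into changes in user interactions and contributions over time, as technology trends change |
Stack Overflow Tag Network
Source: https://www.kaggle.com/juliasilge/network-graph/code |
- The graph is cluttered and static, so it is hard to zoom in to understand each cluster better
- Overlaps between different clusters are also hard to identify
- The node colors are not clearly shown
- The graph does not support comparison between clusters |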
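Sketches | How Analyst Can Conduct Analysis |
---|---|
Stack Overflow Trends (Sketches)
Stack Overflow Trends (Proposed Layout) Source: http://bl.ocks.org/larskotthoff/4e5dbf8be2c83631a05b |
- Create a shaded area to outline the different clusters
- The shading makes it easier to differentiate the clusters
- A filter lets the analyst choose which technologies to compare
- This graph could answer questions like "How are different technologies related to one another?" and "How are different communities of technologies frequently used together?" |
Network Analysis (Sketches)
Network Analysis (Proposed Layout) |
- A bar graph at the bottom of the visualization shows the interaction between users over time, from the very first interaction to the current date (September 2017)
- A zoom function lets the analyst zoom in on the nodes and edges
- The analyst can filter by language community
- The analyst can see which user answers the most questions and which user is the most connected in the community
- Degree centrality of each user is measured from the in-degree and out-degree edges, which represent interaction
- Betweenness centrality of each user is measured, indicating the amount of information flow in the network that passes through the user |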
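The network analysis sketch relies on degree centrality and betweenness centrality for each user. The following is a minimal sketch of both measures on a tiny directed graph, using only the standard library. It assumes an edge points from the answerer to the asker; that direction, the user names and the function names are hypothetical. The betweenness function follows Brandes' algorithm for unweighted graphs and returns unnormalised scores; a graph library would likely be used instead on the full dataset.

```python
from collections import defaultdict, deque


def degree_counts(edges):
    """Return (in_degree, out_degree) dictionaries for a directed edge list."""
    in_deg, out_deg = defaultdict(int), defaultdict(int)
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    return dict(in_deg), dict(out_deg)


def betweenness(edges):
    """Unnormalised betweenness centrality (Brandes' algorithm, unweighted, directed)."""
    adj = defaultdict(list)
    nodes = set()
    for src, dst in edges:
        adj[src].append(dst)
        nodes.update((src, dst))

    score = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma)
        stack = []
        preds = {v: [] for v in nodes}
        sigma = dict.fromkeys(nodes, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj.get(v, ()):
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)

        # Accumulate dependencies in order of decreasing distance from s
        delta = dict.fromkeys(nodes, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score


# Hypothetical interactions: (answerer, asker)
edges = [("alice", "bob"), ("bob", "carol"), ("dave", "bob")]

in_deg, out_deg = degree_counts(edges)
scores = betweenness(edges)
print(in_deg, out_deg, scores)
```

In this toy graph every shortest path between other users runs through `bob`, so `bob` receives the highest betweenness score, which matches the idea of a user through whom information flows.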
- Storing an extremely large network of StackOverflow posts, comments and users in a graph database and creating indexes for fast retrieval.
- Using StackOverflow API to retrieve information on-demand and transform the data into insight on the visualization.
- Learning how to use the relevant tools for graph analysis such as neo4j, sigma.js and D3.js
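For the on-demand retrieval challenge listed above, a request could be built along the following lines. This is a minimal sketch that assumes the public Stack Exchange API at `api.stackexchange.com` (version 2.3) and its `/questions` route with the `site`, `tagged`, `order`, `sort` and `pagesize` parameters; the helper name `questions_url` is hypothetical, and the code only constructs the URL without sending the request.

```python
from urllib.parse import urlencode

# Assumed API root; the version number is an assumption, not taken from the proposal
API_ROOT = "https://api.stackexchange.com/2.3"


def questions_url(tags, page_size=10):
    """Build a URL for recent questions that carry all of the given tags."""
    params = {
        "site": "stackoverflow",
        "tagged": ";".join(tags),  # the API expects tags separated by semicolons
        "order": "desc",
        "sort": "activity",
        "pagesize": page_size,
    }
    return f"{API_ROOT}/questions?{urlencode(params)}"


url = questions_url(["javascript", "jquery"])
print(url)
```

The response to such a request would still need to be fetched, decoded and mapped onto nodes and edges before it could feed the visualization.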
http://sotagtrends.com This web application plots the number of questions asked over time by tag on Stack Overflow. Examples of tags include “node.js”, “d3.js” and “python”. The user can add multiple tags and toggle a relative comparison across them. Another interesting feature is a moving reference line that slides along the X-axis, showing the exact number of questions asked for each tag.
https://insights.stackoverflow.com/survey/2017 This report, created by SO, describes insights about SO users using data collected through their annual developer survey. Visualizations used here are simple but effective charts – bar charts, scatter plots and line charts – that describe intriguing phenomena such as preferred programming languages among ethnicities, representation of women in technology roles and correlated technologies used by developers.
Feel free to leave comments, suggestions and feedback to help us improve our project! :D