PROBLEM & MOTIVATION
Stack Overflow is arguably the biggest online community for developers worldwide. Each year, Stack Overflow fields a survey with questions ranging from developers’ favorite technologies to their job preferences, which helps it better understand its active users. In 2019, nearly 90,000 developers participated in this 20-minute survey. Although the official Stack Overflow website provides a visualization for every survey question, we found a lack of comprehensive and interactive visualizations. After much deliberation, our team narrowed the scope down to the three aspects that we think are the most interesting and useful to general users: Technology, Salary, and Job.
Our team aims to build user-friendly dashboards that highlight the most interesting aspects of the active developers in Stack Overflow. This will allow any interested users to get a quick overview of the essential results of this annual survey without the need to tediously scroll through a long list of static visualizations. Additionally, through various useful interactive filters, users can customize their explorations and gain more meaningful insights that suit their needs.
To expedite the building of these interactive visualizations, our team has decided to use R, which gives us access to a wide variety of libraries and tools for data preprocessing and for building user-friendly dashboards.
OBJECTIVE
For our project, we will focus on 3 main objectives. They are as follows:
Gain overall insights into developer demographics (the Stack Overflow community).
Gain insights into job prospects for Stack Overflow developers and their work culture.
Understand the most popular/relevant programming languages, databases, frameworks, and platforms used by Stack Overflow developers.
SELECTED DATASET
We chose the Stack Overflow Developer Survey 2019 dataset (at https://www.kaggle.com/mchirico/stack-overflow-developer-survey-results-2019), as Stack Overflow is currently the largest online developer community. The dataset is freely accessible, and analyzing it provides a glimpse of the overall developer community.
The dataset contains 88,883 survey responses, with each row corresponding to one respondent and each of the 85 columns corresponding to a survey question. Below is a quick summary of the data provided and their attributes, grouped by the 3 main objectives mentioned above.
| Objective | Data Attributes | Data Provided |
| Background | Likert | Extent of considering oneself a Stack Overflow member |
| Background | Numerical, Discrete | |
| Background | Categorical | Gender; Ethnicity; Profession; Education; Frequency and purpose of using Stack Overflow |
| Background | Binary | Coding for hobby; Have dependents |
| Job prospects | Likert | Job satisfaction; Job competence |
| Job prospects | Numerical, Continuous | |
| Job prospects | Numerical, Discrete | Hours worked a week; Hours spent on code review |
| Job prospects | Categorical | Work structure, work challenges, working remotely; Code review |
| Skills | Categorical | Programming languages, databases, platforms, and web frameworks; Developer tools used; Operating system used |
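Several of the Skills attributes above are multi-select questions, which the survey CSV stores as one semicolon-delimited string per respondent. The sketch below illustrates how such a column could be split and counted before charting. The project itself targets R; this Python version only illustrates the logic. The sample rows are made up, the helper name `count_multiselect` is hypothetical, and treating `NA` as "no answer" is an assumption about how the export encodes missing values.

```python
import csv
import io
from collections import Counter


def count_multiselect(rows, column):
    """Count how often each option appears in a semicolon-delimited column."""
    counts = Counter()
    for row in rows:
        value = (row.get(column) or "").strip()
        if not value or value == "NA":
            continue  # assumption: "NA" or empty means the question was skipped
        counts.update(part.strip() for part in value.split(";") if part.strip())
    return counts


# Tiny made-up sample in the same shape as the survey CSV (not real responses).
SAMPLE_CSV = """Respondent,Country,LanguageWorkedWith
1,Singapore,Python;R;SQL
2,Germany,JavaScript;Python
3,Singapore,NA
4,India,Java;SQL
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
language_counts = count_multiselect(rows, "LanguageWorkedWith")

# Print the options from most to least frequently selected.
for language, n in language_counts.most_common():
    print(language, n)
```

The same helper could be pointed at any other multi-select column; filtering `rows` by country or gender first would give the per-filter counts that the dashboards need.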
BACKGROUND SURVEY
Reference of Other Visualization
|
Learning Points
|
https://www.daxx.com/blog/development-trends/number-software-developers-world
|
This choropleth map shows the distribution of professional software developers in Europe by country.
- Pros:
- It is very effective in showing the distribution of professional software developers at a glance. Darker shades represent a higher concentration of developers, and lighter shades a lower one.
- Cons:
- The map lacks labels. For countries with similar shading, it becomes difficult for readers to tell how those countries rank.
|
https://www.wearedevelopers.com/business/developer-survey/#summary-download
|
This dashboard shows the demand for the top programming languages, frameworks, and experience levels.
- Pros:
- This dashboard is fairly comprehensive. Labels are clear and concise, and it is easy for readers to get an overview.
- Cons:
- The bar colors used for the top 10 programming languages are inconsistent: the top 5 languages use distinct colors, while the bottom 5 use a gradient of grey. A gradient of a single color should be used to convey quantity or intensity, not to differentiate between categories.
|
https://insights.stackoverflow.com/survey/2019
|
This visualization shows the relationship between developer gender and developer role. The X-axis shows the ratio of men to women among developers.
- Pros:
- The visualization clearly shows how the gender ratio varies by developer role.
- Cons:
- It is unclear what the 10x on the X-axis represents; clearer labels or an axis title could have been used.
|
https://hired.com/blog/candidates/data-reveals-hottest-coding-languages/
|
This proportional symbol map shows the hottest programming languages across different countries.
- Pros:
- It is a good visualization for coders to see which languages are more relevant in different countries.
- Cons:
- Readers with less geographical knowledge may find it difficult to identify the exact countries.
|
PROPOSED STORYBOARD
Storyboard
|
Insights / Comments
|
Title: OVERVIEW OF DEVELOPER DEMOGRAPHICS
|
- The first story aims to provide viewers with overall demographic information about Stack Overflow developers.
- The top graph shows an interactive geographical map that allows users to choose which country to focus on.
- The bottom graph is an age-gender pyramid that changes according to the selected country or language. By default, it shows the demographics of the entire survey.
- The right chart is a bar chart showing the number of developers for each language. It also changes according to the selected country.
- Clicking on a dot or country on the map applies a filter that updates the other 2 charts based on the selected country.
- Hovering over a country on the map shows a tooltip with the number of developers. The bar chart behaves the same way.
- Tools/R Libraries used: ggplot2
- Data fields: Country, Gender,
|
Title: ANALYSIS OF DEVELOPER'S TECHNOLOGY
|
- This network chart allows users to see not only which technologies Stack Overflow developers know, but also which other technologies are related to them.
- Through the clusters and edges, users can see which technologies are commonly used together and may be complementary.
- The filters on the right allow users to explore the different technology types: Languages (default), Platforms, Databases, Web Frameworks, MiscTech.
- Tools/R Libraries used: ggplot2
- Data fields: LanguageWorkedWith
|
Title: ANALYSIS OF DEVELOPER'S LOVED/NOT LOVED TECH
|
- This percentage bar chart will allow users to see which technologies are loved or not loved by the Stack Overflow respondents.
- The filters on the right allow users to explore the different technology types. It will also allow users to filter according to the most dreaded or loved technologies.
- Tools/R libraries used: Plotly
- Data fields: LanguageWorkedWith, LanguageDesireNextYear
|
Title: ANALYSIS OF DEVELOPER'S DESIRED TECH
|
- A similar bar chart will show which technologies Stack Overflow developers would like to learn next, along with the number of interested developers.
- The filters on the right allow users to explore the different technology types. It will also allow users to filter according to the most dreaded or loved technologies.
- Tools/R Libraries used: ggplot2
- Data fields: LanguageWorkedWith, LanguageDesireNextYear
|
Title: ANALYSIS OF DEVELOPER'S SALARY
|
- The last story aims to show viewers which developers have higher job satisfaction based on the selected category
- This is done using a divergent stacked bar chart, which is good for comparing between different categories for Likert data. The count of records in each category is displayed on the right of the chart
- There are many options for the categories, such as those mentioned in previous stories, as well as interesting ones such as whether the developer can work from home
- Users can select the sort order and the category from the menu on the left
- Users can also set the reference line for the divergent stacked bar chart to control how the chart is visualized.
- Tools/R Libraries used: ggplot2
- Data fields:
|
Title: ANALYSIS OF DEVELOPER'S JOB FACTORS
|
- This percentage-of-total bar chart will show which job factors Stack Overflow developers prioritize.
- The filters on the right allow users to filter according to Gender and Country for a more customizable exploration to gain more specific insights.
- Tools/R Libraries used: Plotly
- Data fields: JobFactors
|
Title: ANALYSIS OF DEVELOPER'S JOB SATISFACTION
|
- This story aims to show viewers which developers have higher job satisfaction, based on the selected category.
- This is done using a divergent stacked bar chart, which is well suited to comparing Likert data across categories. The count of records in each category is displayed on the right of the chart.
- There are many options for the categories, such as those mentioned in previous stories, as well as interesting ones such as whether the developer can work from home
- Users can select the sort order and the category from the menu on the left
- Users can also set the reference line for the divergent stacked bar chart to control how the chart is visualized.
- Tools/R Libraries used: ggplot2
- Data fields: JobSat, Gender, Country
|
Title: ANALYSIS OF DEVELOPER'S WORK HOURS
|
- This bar chart will show the mean weekly working hours by developer occupation.
- The filters on the right allow users to filter according to Gender and Country for a more customizable exploration to gain more specific insights.
- Tools/R Libraries used: Plotly
- Data fields: WorkWeekHrs, Gender, Country
|
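The divergent stacked bar chart in the job satisfaction story places each Likert level as a segment to the left or right of a reference line. The sketch below shows one way the segment positions for a single bar could be computed from raw counts. It is a Python illustration of the arithmetic only, not the project's R implementation. The function name `diverging_segments`, the `reference` parameter, the level labels, and the sample counts are all assumptions made for this example.

```python
# Assumed JobSat answer labels, ordered from most negative to most positive.
JOBSAT_LEVELS = [
    "Very dissatisfied",
    "Slightly dissatisfied",
    "Neither satisfied nor dissatisfied",
    "Slightly satisfied",
    "Very satisfied",
]


def diverging_segments(counts, levels=JOBSAT_LEVELS, reference=None):
    """Return (level, start, end) spans, in percent, for one bar.

    Segments are laid out left to right in Likert order and shifted so that
    the reference point sits at x = 0. By default the reference is the middle
    of the neutral level (or the midpoint boundary for an even number of levels).
    """
    total = sum(counts.get(level, 0) for level in levels)
    if total == 0:
        return []
    shares = [100.0 * counts.get(level, 0) / total for level in levels]
    if reference is None:
        middle = len(levels) // 2
        reference = sum(shares[:middle])
        if len(levels) % 2 == 1:
            reference += shares[middle] / 2  # split the neutral level evenly
    segments = []
    left = -reference
    for level, share in zip(levels, shares):
        segments.append((level, left, left + share))
        left += share
    return segments


# Made-up counts for one category (for example, one country); not survey results.
sample_counts = {
    "Very dissatisfied": 10,
    "Slightly dissatisfied": 20,
    "Neither satisfied nor dissatisfied": 20,
    "Slightly satisfied": 30,
    "Very satisfied": 20,
}

segments = diverging_segments(sample_counts)
for level, start, end in segments:
    print(f"{level:<36} {start:7.1f} {end:7.1f}")
```

Passing a different `reference` value (for example `reference=0` to anchor every bar at its left edge) corresponds to the user-adjustable reference line described in the storyboard.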
TECHNICAL CHALLENGES
Challenges
|
Mitigation Plan
|
- Unfamiliarity with R, R Shiny and Tableau
|
- Ask seniors or friends who have taken R-related courses to share their slides with us for reference
- Watch video tutorials from YouTube
- Peer Learning
|
- Unfamiliarity with data cleaning and transformation using R
|
- Read online articles and forums for guidance
- Watch video tutorials on how to fully utilise packages such as tidyr and dplyr
- Trial and error
|
PROJECT TIMELINE