Group03 proposal Version 2

From Visual Analytics for Business Intelligence

Revision as of 01:44, 10 April 2020


Devbuzz.jpg




PROBLEM & MOTIVATION


Stack Overflow is arguably the biggest online community for developers around the world. Each year, Stack Overflow fields a survey with questions ranging from developers’ favorite technologies to their job preferences, which allows Stack Overflow to better understand its active users. In 2019, nearly 90,000 developers took part in this 20-minute survey. Although the official Stack Overflow website provides a series of visualizations for every survey question, we found that it lacks comprehensive and interactive visualizations. After much deliberation, our team narrowed its focus to the three aspects that we think are the most interesting and useful to general users: Technology, Salary, and Job.

Our team aims to build user-friendly dashboards that highlight the most interesting aspects of the active developers on Stack Overflow. This will allow any interested user to get a quick overview of the essential results of this annual survey without tediously scrolling through a long list of static visualizations. Additionally, through various interactive filters, users can customize their explorations and gain more meaningful insights that suit their needs.

To expedite the building of these interactive visualizations, our team has decided to use R, which gives us access to a wide variety of libraries and tools for data preprocessing and for building user-friendly dashboards.

OBJECTIVE


Our project focuses on three main objectives:

  • Gain overall insights into developer demographics (the Stack Overflow community).
  • Identify the most popular and relevant technologies (e.g. languages and platforms) used by Stack Overflow developers.
  • Gain insights into working hours, job satisfaction and job factors for Stack Overflow developers.


SELECTED DATASET

We chose the Stack Overflow Developer Survey 2019 dataset (at https://www.kaggle.com/mchirico/stack-overflow-developer-survey-results-2019), as Stack Overflow is currently the largest online developer community. The dataset is freely accessible, and analyzing it provides a glimpse into the overall developer community.

The dataset contains 88,883 survey responses, with each row corresponding to one respondent and each of the 85 columns corresponding to a survey question. Below is a quick summary of the data provided and their attributes, categorized by our three main objectives mentioned above.

Background
  • Likert: Extent to which respondents consider themselves members of the Stack Overflow community
  • Numerical, discrete: Age
  • Categorical: Gender, Country, Ethnicity, Profession, Education, Frequency and purpose of using Stack Overflow
  • Binary: Coding as a hobby, Having dependents
Job prospects
  • Likert: Job satisfaction, Job competence
  • Numerical, continuous: Salary
  • Numerical, discrete: Hours worked per week, Hours spent on code review
  • Categorical: Developer type; Work structure, work challenges and working remotely; Code review
Skills
  • Categorical: Programming languages, databases, platforms and web frameworks; Developer tools used; Operating system used

BACKGROUND SURVEY


As mentioned previously, the official Stack Overflow website provides a series of visualizations for every survey question, but we found that it lacks comprehensive and interactive visualizations. We have picked out a few visualizations that we can learn from and improve upon, which are shown and explained in the table below.

The link to the visualization from the official Stack Overflow website is https://insights.stackoverflow.com/survey/2019.

Reference of Other Visualization Learning Points
RespondentGeographic.png

This proportional symbol map shows the distribution of the number of survey respondents by country.

  • Pros:
    • This visualization is effective in showing where most respondents are from on the map.
    • Good use of color and opacity so that points that overlap each other can still be seen.
  • Cons:
    • A choropleth map would have made it easier to compare countries.
    • More charts could be plotted alongside the map to give an overview of the survey respondents' backgrounds.

Technologies connected.png

This visualization shows which technologies are highly correlated with each other, giving a good overview of how the different technologies cluster together in the ecosystem.

  • Pros:
    • This chart is comprehensive and provides useful information to users. For example, a user who has just started learning Python and would like to explore frameworks can find out which frameworks are commonly used with it.
  • Cons:
    • The chart is not interactive. Users might only be interested in how certain technology types are correlated with each other rather than in all of them.
Salary against experience.png

This scatter plot shows the median salary against the average years of coding experience for the different developer types.

  • Pros:
    • The visualization gives a good comparison across developer types. Users can easily see which developer types with similar average years of coding experience earn higher salaries.
  • Cons:
    • This visualization is also static and has no tooltips, so very small dots, such as the one for SRE, are hard to read. It is also difficult to tell how many respondents work as SREs.
Loved and dreaded.png

These bar charts rank the programming languages that survey respondents love, dread and want to learn.

  • Pros:
    • This is a good visualization for showing which programming languages are popular among developers who use Stack Overflow. It would be useful for users who are interested in picking up a new programming language.
  • Cons:
    • The three bar charts are displayed under different tabs, which makes them difficult to compare. It would be better to place the charts on the same dashboard so that users can easily compare which technologies are loved and dreaded.



PROPOSED STORYBOARD


Storyboard Insights / Comments

Title: OVERVIEW OF DEVELOPER DEMOGRAPHICS

Demographic Devbuzz.jpg
  • The first story aims to provide viewers with the overall demographic information about Stack Overflow developers
  • The top graph shows an interactive geographical map that allows users to choose which country to focus on.
  • The bottom graph is an age-gender pyramid that changes according to the selected country or language. By default, it shows the demographics of the entire survey.
  • The chart on the right is a bar chart showing the number of developers for each language. It also changes according to the selected country.
    • Clicking on a dot or country on the map applies a filter that updates the other two charts based on the selected country.
    • Hovering over a country on the map shows a tooltip with the number of developers. The bar charts have similar tooltips.
  • The filter on the right would allow users to filter the charts by programming language.
  • Tools/R Libraries used: ggplot2
  • Data fields: Country, Gender,
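The age-gender pyramid reduces to counting respondents per age band and gender, then mirroring one gender to the left of zero. The sketch below shows that preparation step in Python for illustration (the dashboards themselves are planned in R). The toy records, the 10-year band width and the "Man"/"Woman" labels are assumptions, not survey results.

```python
from collections import Counter

# Made-up (Age, Gender) records for illustration only.
respondents = [
    (23, "Man"), (27, "Woman"), (31, "Man"), (24, "Man"),
    (38, "Woman"), (45, "Man"), (29, "Woman"), (None, "Man"),
]


def age_band(age, width=10):
    """Map an age to a band label such as '20-29'."""
    lower = (int(age) // width) * width
    return f"{lower}-{lower + width - 1}"


def pyramid_counts(rows, left="Man", right="Woman", width=10):
    """Return {band: (left count, right count)} for a population pyramid.

    Left-side counts are negated so a horizontal bar chart draws them
    to the left of zero, which produces the pyramid shape.
    """
    counts = Counter()
    for age, gender in rows:
        if age is None or gender not in (left, right):
            continue  # skip missing ages and other gender answers
        counts[(age_band(age, width), gender)] += 1
    # Sort bands numerically by their lower bound, not as strings.
    bands = sorted({band for band, _ in counts}, key=lambda b: int(b.split("-")[0]))
    return {band: (-counts[(band, left)], counts[(band, right)]) for band in bands}


print(pyramid_counts(respondents))
# {'20-29': (-2, 2), '30-39': (-1, 1), '40-49': (-1, 0)}
```

Filtering the records by Country (or by language) before calling `pyramid_counts` would reproduce the country-selection behavior described above.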

Title: ANALYSIS OF DEVELOPER'S TECHNOLOGY

Tech connect Devbuzz.jpg
  • This network chart not only shows users which technologies Stack Overflow developers know, but also which other technologies are related to them.
  • Through the clusters and edges, users can see which common technologies may be complementary to each other.
  • The filters on the right allow users to explore the different technology types: Languages (default), Platforms, Databases, Web Frameworks and MiscTech.
  • Tools/R Libraries used: ggplot2
  • Data fields: LanguagesWorkedWith
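The edges of such a network can be derived by counting how many respondents report using each pair of technologies together. Below is a minimal Python sketch for illustration. It assumes each multi-select answer is stored as one semicolon-separated string and a skipped question as `None`. The sample answers are made up.

```python
from collections import Counter
from itertools import combinations

# Made-up answers; assumes one semicolon-separated string per respondent.
answers = [
    "Python;SQL;JavaScript",
    "JavaScript;HTML/CSS;SQL",
    "Python;SQL",
    "Java;Kotlin",
    None,  # respondent skipped the question
]


def co_occurrence(rows):
    """Count, for each pair of technologies, how many respondents use both."""
    edges = Counter()
    for row in rows:
        if not row:
            continue
        techs = sorted(set(row.split(";")))  # de-duplicate and fix pair order
        for a, b in combinations(techs, 2):
            edges[(a, b)] += 1
    return edges


edges = co_occurrence(answers)
print(edges[("Python", "SQL")])  # 2
```

The edge weights could then drive edge thickness in the chart. Dropping pairs below some minimum count (a hypothetical tuning parameter) before drawing would keep the network readable.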

Title: ANALYSIS OF DEVELOPER'S LOVED/NOT LOVED TECH

Tech love dreaded Devbuzz.jpg
  • This percentage bar chart shows users which technologies Stack Overflow respondents love and which they do not.
  • The filters on the right allow users to explore the different technology types. It will also allow users to filter according to the most dreaded or loved technologies.
  • Tools/R libraries used: Plotly
  • Data fields: LanguagesWorkedWith, LanguageDesireNextYear
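One way to compute the loved/dreaded percentages follows the definitions Stack Overflow publishes for its own charts, as we understand them. Among respondents who currently use a technology, "loved" is the share who also want to keep using it next year, and "dreaded" is the remainder. This Python sketch is for illustration only and uses made-up answers. It assumes semicolon-separated answers and counts only respondents who answered both questions, which is a design choice rather than a given.

```python
from collections import Counter

# Made-up (worked with, desire next year) answer pairs; None = skipped.
pairs = [
    ("Python;Java", "Python;Rust"),
    ("Python", "Python"),
    ("Java;Python", "Go"),
    ("Rust", "Rust"),
    ("Java", None),
]


def loved_dreaded(rows):
    """Return {tech: (loved %, dreaded %)} among current users of each tech."""
    users, keep = Counter(), Counter()
    for worked, desired in rows:
        if not worked or not desired:
            continue  # only count respondents who answered both questions
        wanted = set(desired.split(";"))
        for tech in set(worked.split(";")):
            users[tech] += 1
            if tech in wanted:
                keep[tech] += 1
    result = {}
    for tech, n in users.items():
        loved = 100 * keep[tech] / n
        result[tech] = (round(loved, 1), round(100 - loved, 1))
    return result


print(loved_dreaded(pairs)["Python"])  # (66.7, 33.3)
```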

Title: ANALYSIS OF DEVELOPER'S DESIRED TECH

Tech desire DevBuzz.jpg
  • A similar bar chart will show which technologies Stack Overflow developers would like to learn next, along with the number of interested developers for each.
  • The filters on the right allow users to explore the different technology types. It will also allow users to filter according to the most dreaded or loved technologies.
  • Tools/R Libraries used: ggplot2
  • Data fields: LanguagesWorkedWith, LanguageDesireNextYear
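A sketch of the count behind this chart, written in Python for illustration with made-up answers. Whether developers who already use a technology should count as wanting to "learn" it is a design choice. Here they are excluded by default, which is close to Stack Overflow's notion of "wanted" as we understand it. The hypothetical `exclude_current_users` flag switches that off.

```python
from collections import Counter

# Made-up (worked with, desire next year) answer pairs; None = skipped.
pairs = [
    ("Python;Java", "Python;Rust"),
    ("Python", "Go;Rust"),
    ("Java", "Kotlin"),
    ("Rust", None),
]


def desired_counts(rows, exclude_current_users=True):
    """Count respondents who want to work with each technology next year."""
    counts = Counter()
    for worked, desired in rows:
        if not desired:
            continue
        current = set(worked.split(";")) if worked else set()
        for tech in set(desired.split(";")):
            if exclude_current_users and tech in current:
                continue  # already uses it, so not a "learn next" choice
            counts[tech] += 1
    return counts


print(desired_counts(pairs).most_common(1))  # [('Rust', 2)]
```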

Title: ANALYSIS OF DEVELOPER'S SALARY

Salary scatter DevBuzz.jpg
  • This scatter plot shows the median salary against the average years of coding experience for each developer type. The linear regression line aims to help users see which developer jobs pay more and how many years of coding experience developers in each job type have on average.
  • The filters on the right allow users to filter according to Country and Undergraduate Major for a more customizable exploration to gain more specific insights.
  • Tools/R Libraries used: ggplot2, plotly
  • Data fields: DevType, Salary, YearsCode, YearsCodePro, UndergradMajor, Country
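The scatter points and regression line can be prepared in three steps. First, group respondents by developer type. Next, take the median salary and mean years of coding for each group. Finally, fit an ordinary least-squares line through the group points. This Python sketch is for illustration and uses made-up rows. It assumes DevType holds semicolon-separated roles and that YearsCode uses text values for its extremes ("Less than 1 year", "More than 50 years"), which should be checked against the actual file. Mapping those values to 0.5 and 51 is an arbitrary choice.

```python
from collections import defaultdict
from statistics import mean, median

# Made-up (DevType, Salary, YearsCode) rows for illustration only.
rows = [
    ("Developer, back-end", 60000, "5"),
    ("Developer, back-end;DevOps specialist", 80000, "10"),
    ("DevOps specialist", 90000, "12"),
    ("Data scientist or machine learning specialist", 70000, "4"),
    ("Developer, front-end", 40000, "Less than 1 year"),
    ("Developer, front-end", 50000, "3"),
]


def parse_years(text):
    """Convert an assumed YearsCode answer to a number."""
    if text == "Less than 1 year":
        return 0.5
    if text == "More than 50 years":
        return 51.0
    return float(text)


def points_by_devtype(data):
    """Return {DevType: (mean years of coding, median salary)}."""
    groups = defaultdict(lambda: ([], []))
    for devtypes, salary, years in data:
        if salary is None or years is None:
            continue
        for dev in devtypes.split(";"):  # one respondent can feed several groups
            groups[dev][0].append(parse_years(years))
            groups[dev][1].append(salary)
    return {dev: (mean(xs), median(ys)) for dev, (xs, ys) in groups.items()}


def fit_line(points):
    """Ordinary least-squares (slope, intercept) through (x, y) points."""
    mx = mean(x for x, _ in points)
    my = mean(y for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, my - slope * mx


points = points_by_devtype(rows)
slope, intercept = fit_line(list(points.values()))
print(points["DevOps specialist"])  # (11.0, 85000.0)
```

In R, `tidyr::separate_rows(df, DevType, sep = ";")` would give the same one-row-per-role expansion before grouping.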

Title: ANALYSIS OF DEVELOPER'S JOB FACTORS

Job factor bar Devbuzz.jpg
  • This percentage-of-total bar chart will show which job factors Stack Overflow developers prioritize.
  • The filters on the right allow users to filter according to Gender and Country for a more customizable exploration to gain more specific insights.
  • Tools/R Libraries used: Plotly
  • Data fields: JobFactors
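Because respondents can pick more than one job factor, a "percentage of total" chart needs an explicit denominator. This sketch uses the number of respondents who answered, so the bars can sum to more than 100%. Dividing by the total number of selections would be the alternative. The sketch is written in Python for illustration with made-up rows and assumes semicolon-separated answers. The optional `gender` argument mirrors the Gender filter.

```python
from collections import Counter

# Made-up (JobFactors, Gender) rows; factor labels are illustrative.
rows = [
    ("Remote work options;Office environment or company culture", "Man"),
    ("Remote work options;Flex time or a flexible schedule", "Woman"),
    ("Flex time or a flexible schedule", "Man"),
    (None, "Man"),  # skipped the question
]


def job_factor_shares(data, gender=None):
    """Percentage of answering respondents who picked each job factor.

    Respondents may pick several factors, so the percentages can add up
    to more than 100.
    """
    counts, answered = Counter(), 0
    for factors, g in data:
        if not factors or (gender is not None and g != gender):
            continue
        answered += 1
        counts.update(set(factors.split(";")))
    return {factor: round(100 * n / answered, 1) for factor, n in counts.most_common()}


print(job_factor_shares(rows, gender="Man"))
```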

Title: ANALYSIS OF DEVELOPER'S JOB SATISFACTION

Job satisfaction bar Devbuzz.jpg
  • This divergent bar chart aims to show viewers which developers have higher job satisfaction based on the developer's job type.
  • This is done using a divergent stacked bar chart, which is well suited to comparing Likert data across categories. The count of records in each category is displayed on the right of the chart.
    • Users can set the reference line of the divergent stacked bar chart to control where its midpoint lies.
  • Tools/R Libraries used: ggplot2, likert
  • Data fields: DevType, JobSat, Gender, Country
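Setting the reference line comes down to choosing how much of each bar sits to the left of zero. The Python sketch below computes the start and end of each Likert segment for one bar. It is for illustration only and assumes the five JobSat levels listed in the code. `center` is a hypothetical parameter that gives the reference position on the level scale: 2.0 splits the neutral level in half, and 1.5 puts zero between the dissatisfied levels and the neutral one.

```python
# Assumed JobSat levels, ordered from most negative to most positive.
LEVELS = [
    "Very dissatisfied",
    "Slightly dissatisfied",
    "Neither satisfied nor dissatisfied",
    "Slightly satisfied",
    "Very satisfied",
]


def diverging_segments(counts, center=2.0):
    """Return [(level, start %, end %)] for one diverging stacked bar.

    Level i is treated as occupying [i - 0.5, i + 0.5] on the level scale,
    and the zero line sits at position `center` on that scale.
    """
    total = sum(counts.get(level, 0) for level in LEVELS)
    if total == 0:
        return []
    shares = [100 * counts.get(level, 0) / total for level in LEVELS]
    # Portion of the bar that falls left of the zero line.
    left = sum(
        share * min(max(center - (i - 0.5), 0.0), 1.0)
        for i, share in enumerate(shares)
    )
    segments, start = [], -left
    for level, share in zip(LEVELS, shares):
        segments.append((level, start, start + share))
        start += share
    return segments


# Made-up counts for a single developer type.
counts = {
    "Very dissatisfied": 10, "Slightly dissatisfied": 10,
    "Neither satisfied nor dissatisfied": 20,
    "Slightly satisfied": 40, "Very satisfied": 20,
}
for level, start, end in diverging_segments(counts):
    print(f"{level:<36}{start:7.1f}{end:7.1f}")
```

Running this once per developer type gives one bar per row of the chart. The `total` computed inside the function is the record count shown on the right.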

Title: ANALYSIS OF DEVELOPER'S WORK HOURS

Job workinghours Devbuzz.jpg
  • This bar chart will show the mean weekly working hours for each developer occupation.
  • The filters on the right allow users to filter according to Gender and Country for a more customizable exploration to gain more specific insights.
  • Tools/R Libraries used: Plotly
  • Data fields: WorkWeekHrs, Gender, Country
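This sketch shows the aggregation behind the chart: mean WorkWeekHrs per occupation, with optional Gender and Country filters. It assumes occupation comes from the semicolon-separated DevType field, which is not listed in the data fields above. Self-reported weekly hours can contain implausible entries, so the sketch drops values above a hypothetical `max_hours` cap. It uses made-up rows and is written in Python for illustration.

```python
from collections import defaultdict
from statistics import mean

# Made-up (DevType, WorkWeekHrs, Gender, Country) rows.
rows = [
    ("Developer, back-end", 40, "Man", "Germany"),
    ("Developer, back-end;DevOps specialist", 50, "Woman", "Germany"),
    ("DevOps specialist", 45, "Man", "India"),
    ("Developer, front-end", 400, "Man", "India"),  # implausible entry
    ("Developer, front-end", None, "Woman", "India"),  # no answer
]


def mean_hours_by_devtype(data, gender=None, country=None, max_hours=100):
    """Return {DevType: mean weekly hours} after the optional filters."""
    hours = defaultdict(list)
    for devtypes, hrs, g, c in data:
        if hrs is None or hrs > max_hours:
            continue  # drop missing and implausible answers
        if gender is not None and g != gender:
            continue
        if country is not None and c != country:
            continue
        for dev in devtypes.split(";"):
            hours[dev].append(hrs)
    return {dev: mean(h) for dev, h in hours.items()}


print(mean_hours_by_devtype(rows))
# {'Developer, back-end': 45, 'DevOps specialist': 47.5}
```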


TECHNICAL CHALLENGES


Challenges and mitigation plans
  • Unfamiliarity with R, R Shiny and Tableau
    • Ask seniors or friends who have taken R-related courses to share their slides with us for reference
    • Watch video tutorials on YouTube
    • Peer learning
  • Unfamiliarity with data cleaning and transformation in R
    • Read online articles and forums for guidance
    • Watch video tutorials on how to make full use of packages such as tidyr and dplyr
    • Trial and error

PROJECT TIMELINE

Photo 2020-03-01 18-02-12.jpg

COMMENTS


Feel free to leave us some comments on where we can improve!

No. Name Date Comments
1. Insert your name here Insert date here Insert comment here
2. Insert your name here Insert date here Insert comment here
3. Insert your name here Insert date here Insert comment here