Group07 proposal

From Visual Analytics for Business Intelligence


Project Motivation



Problem
HIV/AIDS, a chronic but manageable disease, is a global pandemic that has created unprecedented challenges for physicians and health infrastructures. There is no cure for HIV yet. However, treatment can control HIV and enable people to live long, healthy lives. Over the years, there has been substantial media coverage, along with many resources, to educate the public on the danger of HIV and how to avoid infection. These resources are available from organizations such as the Ministry of Health (MOH), the Centers for Disease Control and Prevention (CDC) and the World Health Organisation (WHO). However, these websites tend to pack critical descriptive information into dense visualizations and tables, which makes the information difficult for the reader to grasp.


Motivation
For this IS428 project, our group will focus on one deadly virus, the Human Immunodeficiency Virus (HIV), in the context of the United States (U.S.).

The reasons are as follows:

  • According to WHO, “Major UN study finds alarming lack of knowledge about HIV/AIDS among young people.”
  • According to Avert, “There is no cure for HIV, although antiretroviral treatment can control the virus.”
  • According to HIV.gov, “HIV has cost America too much for too long and remains a significant public health issue and more than 700,000 American lives have been lost to HIV since 1981.”

Due to the absence of a cure, gaining insight into HIV is a high priority, as the virus could escalate further. Furthermore, the general public has a preconceived notion that they are unlikely to be infected by HIV, which explains the little attention given to the virus. However, low risk does not equate to no risk. The situation remains severe in the U.S. and should be addressed. This visualization project will also serve as an educational tool to raise awareness of the risk of HIV and to complement the 2019 U.S. plan for ending the HIV epidemic, which targets a 90% reduction in new HIV infections by 2030, with the help of data and visualization tools.

Objectives


With the comprehensive datasets available from the Centers for Disease Control and Prevention, we will focus on the objectives below:

  • To explore geospatial distribution of HIV cases.
  • To gain insights on the trend of HIV cases and the fatality rate over the years.
  • To understand the treatment given to the HIV patients across different states over the years.
  • To gain insights on the demographics of the reported HIV patients.

Dataset


Below are the data sets that TEAM HIVA will be using for the visualization:

Each dataset is described below by its source, its data attributes and the rationale of usage.

Source: Centers for Disease Control and Prevention
Dataset: HIV diagnoses

Data Attributes:
  • Year: Year in which the patient was diagnosed with HIV
  • Geography: State the patient is from
  • Age Group: Patient’s age group
  • Race/Ethnicity: Patient’s race/ethnicity
  • Sex: Patient’s gender
  • Transmission Category: How the disease was contracted
  • Cases: Number of new HIV diagnoses

Rationale of Usage:
  • CDC is a credible source of data
  • Datasets are comprehensive
  • Provides time-series data of HIV diagnoses for each region
  • Dataset allows the analysis of the diagnosis rate of HIV patients

Source: Centers for Disease Control and Prevention
Dataset: HIV deaths

Data Attributes:
  • Year: Year in which the patient was diagnosed with HIV
  • Geography: State the patient is from
  • Age Group: Patient’s age group
  • Race/Ethnicity: Patient’s race/ethnicity
  • Sex: Patient’s gender
  • Transmission Category: How the disease was contracted
  • Cases: Number of HIV deaths

Rationale of Usage:
  • Dataset allows the analysis of the death rate of HIV patients

Source: Centers for Disease Control and Prevention
Dataset: Receipt of HIV Medical Care

Data Attributes:
  • Year: Year in which the patient was diagnosed with HIV
  • Geography: State the patient is from
  • Percent: Percentage of prevalent cases who received medical care

Rationale of Usage:
  • Dataset allows the analysis of the percentage of prevalent HIV cases who are receiving medical care
  • Receipt of care: Persons with ≥1 test (CD4 or viral load (VL))

Source: Centers for Disease Control and Prevention
Dataset: HIV Viral Suppression

Data Attributes:
  • Year: Year in which the patient was diagnosed with HIV
  • Geography: State the patient is from
  • Percent: Percentage of prevalent cases who have achieved viral suppression

Rationale of Usage:
  • Dataset allows the analysis of the percentage of prevalent HIV cases who have achieved viral suppression
  • Viral suppression: Persons with <200 copies/mL on their most recent viral load (VL) test
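
As a minimal sketch of how the two definitions above (receipt of care and viral suppression) could be applied to patient-level test results, the Python snippet below computes both percentages among prevalent cases. The record layout, the function name `care_and_suppression_percent` and the sample values are hypothetical and for illustration only; the CDC datasets listed here already provide the aggregated percentages.

```python
from typing import Optional, Sequence, Tuple

SUPPRESSION_THRESHOLD = 200  # copies/mL, taken from the definition above

# Hypothetical record: (number of CD4 or VL tests in the year,
#                       most recent VL result in copies/mL, or None if no VL test)
Record = Tuple[int, Optional[float]]


def care_and_suppression_percent(records: Sequence[Record]) -> Tuple[float, float]:
    """Return (% receiving care, % virally suppressed) among prevalent cases."""
    if not records:
        raise ValueError("no prevalent cases supplied")
    total = len(records)
    # Receipt of care: at least one CD4 or VL test.
    in_care = sum(1 for tests, _ in records if tests >= 1)
    # Viral suppression: most recent VL result below 200 copies/mL.
    suppressed = sum(
        1 for _, vl in records if vl is not None and vl < SUPPRESSION_THRESHOLD
    )
    return 100.0 * in_care / total, 100.0 * suppressed / total


# Made-up sample of four prevalent cases.
sample = [(2, 50.0), (1, 1500.0), (0, None), (3, 199.0)]
care_pct, suppressed_pct = care_and_suppression_percent(sample)
```

Both percentages share the same denominator (all prevalent cases), matching the "Percent" attribute described for the two datasets.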

Background Survey of Related Work


Background-dashboard.png

Learning Points:
  • Individual graphs are intuitive so that the reader can capture the highlights of the visualization.
  • Appropriate usage of colours. Green usually represents positive information like survival while red represents danger. Green and red are contrasting colours that sit on opposite sides of the colour wheel.
  • Presented background summary for the reader to better understand the context of the visualization. Summary was sorted for ease of visual intake.
  • The visualization adheres to the fundamental principles of design: Balance, Proximity, Alignment, Repetition, Contrast and Space.

Area of Improvement:
  • Very cluttered. The highlights section is already quite lengthy.

Solution:
  • Present the highlights section as key points and let the visualizations do the talking.


Background-1.png

Learning Points:
  • The visualization depicts the number of reported cases from 2012 to 2019, displayed by week. A histogram is good for capturing time series data.
  • A histogram is good for summarising a big dataset in visual format. It captures the distribution of the reported MERS-CoV cases at a glance.

Area of Improvement:
  • Inconsistent time period of observation. The graph shows 2012-2019 weekly data while the table is scoped down to 2019 data.
Solution:
  • Display the 2019 weekly visualization with the 2019 table. Alternatively, if the data is displayed in an interactive visualization, the chart can be filtered by year for clarity.

Area of Improvement:
  • The table in the graph is about the death rate, which is unrelated to the visualization. This is misleading because the visualization also consists mainly of green and red, which gives the impression that the red portion of the visualization represents the death rate. However, this is not the case.
Solution:
  • Remove the table from the visualization, or remove the colour coding from the table.

Area of Improvement:
  • Ticks on the x-axis for weekly data are inconsistent.
Solution:
  • Each year should have the same ticks.

Area of Improvement:
  • The graph shows MERS-CoV data for many different countries, but due to the scale of the visualization, a viewer can only make out the colours of Saudi Arabia and the UAE.
Solution:
  • Declutter the visualization by either having the x-axis cover only 2019 weekly data or by using 2012-2019 yearly data instead of weekly data. The less frequently occurring countries could also be combined into one field.

Area of Improvement:
  • This histogram is inappropriate for comparing multiple categories. It is unclear whether each bar in the histogram is a stacked bar or an overlapping bar.
Solution:
  • An improvement could be to visualize the time series as a line graph instead. However, this suggestion comes with its own drawbacks, and a visualization test would have to be conducted.


Background-2.png

Learning Points:
  • This is an age-sex pyramid displaying the number of patients who survived or died in each age group, categorized by type of contraction.

Area of Improvement:
  • It is unclear what “n*1352” means.

Solution:
  • If 1352 refers to the number of cases used for this visualization, this should be stated clearly for the layman.


Background-3.png

Learning Points:
  • Use of prevalence rate. Since we have prevalence data, it can be shown as a proportion of the population.

Area of Improvement:
  • No mention of the year the visualization is for.
  • The tooltip shows exactly the same information as the label, which is the prevalence rate.

Solution:
  • The tooltip can show the number of prevalent cases and the total population of the country. This also eliminates the need for the bar graph on the right.


Background-4.png

Learning Points:
  • The comparison between the severity of the HIV situation and the medical budget is an interesting area from which we can draw inspiration.

Area of Improvement:
  • It is difficult to make comparisons between the visualizations as the map does not have labels for the countries.

Solution:
  • Possibly replace the bubble chart with another geographic representation, in which the colour shading represents national health spending per capita.


Background-5.png

Learning Points:
  • Use of multiple visualizations to support the same point: the distribution of HIV deaths among different countries.

Area of Improvement:
  • Lack of links to other visualizations of different data to show interesting correlations.
  • The line graph at the bottom is difficult to see because of a poor choice of colours.

Solution:
  • Limit repetitive visualizations. If supporting visualizations are necessary, include the information in hover tooltips.


Background-6.png

Learning Points:
  • Good use of animation to present the yearly change in HIV prevalence in Africa.

Area of Improvement:
  • Pressing the play button shows no change in the colours. This is because prevalence counts everyone living with HIV. Each year, new diagnoses and HIV deaths change prevalence slightly, but not enough to change the colours on the map.

Solution:
  • Could possibly show a geographic representation of new diagnoses each year.

Technology Used


TeamHIV tech.png

Technical Challenges & Mitigation


Key Technical Challenge Proposed Solution
Unfamiliarity with R Shiny
  • Learning of visualisation tools
  • Code sharing and peer learning
  • Seek reference from the R Shiny community
  • Make use of the DataCamp resources given
Data cleaning and transformation
  • Work together to understand data
  • Work together for data exploration
  • Work together to clean and transform data
  • Pre-processing and prototyping using Tableau
Unfamiliarity with implementing interactive visualizations
  • Develop a Storyboard/Design Flow
  • Explore techniques online
  • Experiment with different techniques to find the best data visualization/ representation
Limited knowledge of HIV
  • Research and learn about the current HIV situation

Brainstorming


The following charts were proposed during our brainstorming session to fulfil the objectives of the project. The ideas were gathered after the background research on the related work from credible sources.

Chart Description
Brainstorming-1.png

From our background research, we learnt that a line chart is more appropriate for showing the trend of HIV cases over the years. It allows us to see the change over time.

This will fulfil our objective: To gain insights on the HIV fatality rate over the years.

Brainstorming-2.png

We will have another line chart to visualise the number of HIV cases over the years.

It will be paired with one more filter, Race. This will allow users to filter the number of HIV cases over time for each race, which will fulfil two objectives: to gain insights on the trend of HIV cases over the years, and to gain insights on the demographics of the reported HIV patients.
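
A minimal sketch of the aggregation behind such a filtered line chart is shown below in Python. It assumes rows shaped like the Year, Race/Ethnicity and Cases attributes of the HIV diagnoses dataset; the function name `cases_by_year`, the race labels and all numbers are hypothetical placeholders, not real CDC figures.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical row layout: (year, race, cases).
Row = Tuple[int, str, int]


def cases_by_year(rows: Iterable[Row], races: Set[str]) -> Dict[str, List[Tuple[int, int]]]:
    """Return one (year, total cases) series per selected race, sorted by year."""
    totals: Dict[str, Dict[int, int]] = defaultdict(lambda: defaultdict(int))
    for year, race, cases in rows:
        if race in races:  # apply the Race filter
            totals[race][year] += cases
    return {race: sorted(by_year.items()) for race, by_year in totals.items()}


# Made-up rows; several rows may share a year and race
# (for example, one row per age group).
rows = [
    (2016, "A", 10), (2016, "A", 5), (2017, "A", 12),
    (2016, "B", 7), (2017, "B", 9), (2017, "C", 4),
]
series = cases_by_year(rows, {"A", "B"})
```

Each entry of `series` corresponds to one line on the chart, so changing the set of selected races changes which lines are drawn.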

Brainstorming-3.png

From our background research on related work by WHO, we learnt that the age-sex pyramid is well suited to displaying the number of patients who survived or died in each age group, categorized by type of contraction, for the MERS-CoV virus.

Therefore, we will be adopting this idea to fulfil our objective on: To gain insights on the demographics of the reported HIV patients.

This visualization will provide insights on the distribution of patients by age group and gender.
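
One possible way to prepare the data for an age-sex pyramid is sketched below in Python: cases are summed per age group and sex, and male counts are negated so that their bars extend to the left of the axis. The function name `pyramid_bars`, the age group labels and the counts are hypothetical; only the Age Group, Sex and Cases attributes come from the dataset description.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical row layout: (age group, sex, cases).
Row = Tuple[str, str, int]


def pyramid_bars(rows: Iterable[Row], age_order: List[str]) -> List[Tuple[str, int, int]]:
    """Return (age group, male bar, female bar) in the given age order.

    Male totals are negated so they plot to the left of zero.
    """
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for age_group, sex, cases in rows:
        counts[age_group][sex] += cases
    return [(age, -counts[age]["Male"], counts[age]["Female"]) for age in age_order]


# Made-up counts for illustration.
rows = [
    ("13-24", "Male", 30), ("13-24", "Female", 10),
    ("25-34", "Male", 50), ("25-34", "Female", 20),
    ("13-24", "Male", 5),
]
bars = pyramid_bars(rows, ["13-24", "25-34"])
```

Passing the age groups in an explicit order keeps the pyramid sorted from youngest to oldest regardless of the order of the input rows.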

Brainstorming-4.png

A geo map of U.S. will allow us to fulfil the objective: Explore geospatial distribution of HIV cases.

This visualization will provide insights on the distribution of HIV across different states of the U.S.

During the brainstorming session, we discussed applying the animation feature we learnt in Dataviz Makeover 6 to interactively show changes in HIV's geographical distribution over the years.

Each state will be highlighted with a color tone representative of the number/proportion of infected cases.
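
To illustrate how a colour tone could be derived from the proportion of infected cases, the Python sketch below converts case counts into rates per 100,000 population and assigns each state to an equal-interval colour class. Equal-interval binning is only one option (quantile classes are a common alternative), and the function names, state names, case counts and populations are all hypothetical.

```python
from typing import Dict


def rate_per_100k(cases: int, population: int) -> float:
    """Cases per 100,000 population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases * 100_000 / population


def colour_classes(rates: Dict[str, float], n_classes: int = 4) -> Dict[str, int]:
    """Assign each state a class from 0 (lightest) to n_classes - 1 (darkest)."""
    lo, hi = min(rates.values()), max(rates.values())
    width = (hi - lo) / n_classes
    if width == 0:  # all states have the same rate
        return {state: 0 for state in rates}
    return {
        # The maximum rate falls on the upper edge, so clamp it into the top class.
        state: min(int((rate - lo) / width), n_classes - 1)
        for state, rate in rates.items()
    }


# Made-up case counts and populations.
rates = {
    "State X": rate_per_100k(50, 1_000_000),
    "State Y": rate_per_100k(300, 2_000_000),
    "State Z": rate_per_100k(250, 1_000_000),
}
classes = colour_classes(rates)
```

Using a rate rather than a raw count prevents populous states from appearing darker simply because they have more residents.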

Brainstorming-5.png

A grouped bar chart will be used to portray the frequency distribution of treatment given to patients per age group. The reader can filter the information by year and state. This visualization addresses our project's problem by summarizing a large dataset in visual form and showing trends more clearly than a table.

This will fulfil our objective: to understand the treatments given to HIV patients across different states over the years, and to gain insights on the demographic trends of untreated patients.


Storyboard


Proposed Layout Description
Storyboard-1.png
Home

The Home page aims to create awareness of HIV by providing a general description of the problems and issues surrounding the virus today. Other information, such as the project motivation and objectives, will be displayed on this page as well to aid users in understanding the project context.

Users can begin their exploration of the project by clicking on “Map Breakdown”, “Time Series” or “Treatments”. Each of these pages serves a different purpose.

Storyboard-2.png
Map Breakdown

The Map Breakdown module will comprise the following:

1. Filter by Year and States

2. Choropleth map

  • A detailed U.S. map depicting the HIV infection rate, filtered by year(s) and state(s). Here, the infection rate refers to the number of HIV cases.
  • Users will be able to infer the severity of infection in each state from the colour tone (a lighter tone indicates a less infected state and a darker tone a more infected state).

3. Line chart

  • Upon clicking on one or more states, the death rate will be displayed as a line chart across the selected year(s).

4. Summary

  • This section will consist of a brief statement of the main points from the visualization.

5. Animation bar

  • The animation bar will be tied to the choropleth map. Upon clicking the play button, users will be able to see the colour tone of infected cases on the map change across the year(s).
Storyboard-3.png
Time Series

The Time Series module will comprise the following:

1. Filter by Race and Year

2. Line chart

  • Display the number of HIV cases throughout the year(s) by race(s).

3. Age-sex pyramid

  • Display the number of HIV cases within each age group by gender.

4. Summary

  • This section will consist of a brief statement of the main points from the visualisation.
Storyboard-4.png
Treatment

The Treatment module will comprise the following:

1. Filter by State and Year

2. Grouped bar chart

  • This chart will allow the users to understand the types of treatments given to the patients by year(s) and state(s).
  • Patients either received treatment, received no treatment or achieved viral suppression.

3. Summary

  • This section will consist of a brief statement of the main points from the visualization.

Project Timeline


Timeline-2.png
