ISSS608 2016-17T3 Group15 Proposal
==References== | ==References== | ||
− | * [ | + | * [http://hcil2.cs.umd.edu/newvarepository/VAST%20Challenge%202010/challenges/MC2%20-%20Characterization%20of%20Pandemic%20Spread/|VAST Challenge 2010 – Characterisation of Pandemic Spread] |
− | * [ | + | * [https://www.cdc.gov/vhf/ebola/outbreaks/2014-west-africa/index.html|2014-2016 Ebola Outbreak in West Africa] |
+ | * [https://www.cdc.gov/zika/index.html|CDC Zika Virus] | ||
+ | * [http://r4ds.had.co.nz/|R for Data Science] | ||
+ | * [http://tidyverse.org/|Tidyverse R Packages] | ||
+ | * [http://shiny.rstudio.com/tutorial/|R Shiny Tutorial] |
Revision as of 03:00, 20 June 2017
ISSS608 Visual Analytics and Applications
Group 15 Project
Visualisation with R:
Characterisation of Pandemic Spread
Project Title: Characterisation of Pandemic Spread
Prepared by Group 15
Team Members:
- Chua Gim Hong
- Huang Liwei
- Ngo Siew Hui
Project Description
This project is based on VAST Challenge 2010 – Characterisation of Pandemic Spread, which involves analysing hospitalisation records from a major pandemic that spread across the world in 2009. Using R, the project aims to develop a visualisation tool for analysing the illness across the affected regions, so as to help characterise the spread of the disease.
Background
In 2009, a major disease outbreak spanned several cities across the world. Such diseases tend to spread quickly and are difficult to combat. Health officials are therefore seeking visualisation tools to analyse hospitalisation records across the affected regions and help characterise how the disease spreads.
Data
Note: The datasets used for this challenge are synthetic, blending computer-generated and human-generated data. No external data is needed for the analysis, as all the information necessary to form working hypotheses is provided in the datasets.
The datasets contain hospital admittance and death records for eleven cities involved in the pandemic, namely:
- Aleppo
- Colombia
- Iran
- Karachi
- Lebanon
- Nairobi
- Saudi Arabia
- Thailand
- Turkey
- Venezuela
- Yemen
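As a rough illustration of how these records might be prepared for visualisation, the sketch below rolls admittance and death records up into daily counts per city. The tool itself is to be built in R, and Python is used here purely for illustration. The record layout (`city` and `date` fields), the function name `daily_counts` and the sample values are all hypothetical and are not taken from the challenge datasets.

```python
from collections import Counter
from datetime import date


def daily_counts(records):
    """Count records per (city, day) pair.

    Each record is assumed to be a dict with hypothetical keys "city"
    (str) and "date" (datetime.date); the actual challenge files may
    use different field names and formats.
    """
    counts = Counter()
    for rec in records:
        counts[(rec["city"], rec["date"])] += 1
    return dict(counts)


# Hypothetical sample records for illustration only, not challenge data.
admissions = [
    {"city": "Nairobi", "date": date(2009, 5, 1)},
    {"city": "Nairobi", "date": date(2009, 5, 1)},
    {"city": "Karachi", "date": date(2009, 5, 2)},
]
deaths = [
    {"city": "Nairobi", "date": date(2009, 5, 3)},
]

admit_by_day = daily_counts(admissions)  # daily admittance counts per city
death_by_day = daily_counts(deaths)      # daily death counts per city
print(admit_by_day[("Nairobi", date(2009, 5, 1))])  # 2
print(death_by_day[("Nairobi", date(2009, 5, 3))])  # 1
```

In R, a similar roll-up would typically use dplyr, for example `dplyr::count(records, city, date)`. The column names in that call are again hypothetical.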
Objectives
The project aims to develop a visualisation tool in R to support the following analyses:
- Characterise the spread of the disease, taking into consideration its symptoms, mortality rates, and the temporal patterns of its onset, peak and recovery.
- Compare the outbreak across cities, including the timing of the outbreaks, the numbers of people infected and the recovery ability of the individual cities.
- Identify anomalies, if any, in the hospitalisation records.
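A minimal sketch of the per-city measures these objectives call for is shown below, written in Python for illustration only. It assumes daily admission counts per city are already available.

- Onset is taken as the first day on which admissions reach an assumed threshold.
- Peak is the day with the most admissions.
- Mortality rate is approximated as deaths divided by admissions, which is strictly a case-fatality ratio among hospitalised patients.

For anomalies, it swaps in a deliberately simple global z-score rule. This is not the project's chosen method. Because an epidemic curve naturally varies, a rule that compares each day against a rolling baseline would likely suit real outbreak data better. All function names, thresholds and sample values are hypothetical.

```python
from datetime import date, timedelta
from statistics import mean, pstdev


def summarise_city(daily_admissions, total_deaths, onset_threshold=1):
    """Summarise one city's outbreak from a non-empty {date: admissions} dict.

    onset_threshold is an assumed cut-off: onset is the first day whose
    admissions reach it. Mortality is approximated as deaths / admissions.
    """
    days = sorted(daily_admissions)
    onset = next((d for d in days if daily_admissions[d] >= onset_threshold), None)
    peak_day = max(days, key=lambda d: daily_admissions[d])  # earliest day with the highest count
    total_admissions = sum(daily_admissions.values())
    mortality_rate = total_deaths / total_admissions if total_admissions else 0.0
    return {
        "onset": onset,
        "peak_day": peak_day,
        "peak_admissions": daily_admissions[peak_day],
        "mortality_rate": mortality_rate,
    }


def flag_anomalies(daily_admissions, z_cutoff=2.0):
    """Return the days whose counts lie more than z_cutoff population
    standard deviations from the mean (a simple global z-score rule)."""
    values = list(daily_admissions.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # a constant series has no outliers under this rule
    return sorted(d for d, v in daily_admissions.items() if abs(v - mu) / sigma > z_cutoff)


def series(start, counts):
    """Build a {date: count} dict from consecutive daily counts."""
    return {start + timedelta(days=i): n for i, n in enumerate(counts)}


# Hypothetical data for illustration only.
outbreak = series(date(2009, 6, 1), [0, 2, 4, 8, 10, 8, 5, 3])
summary = summarise_city(outbreak, total_deaths=4)
print(summary["peak_day"], summary["mortality_rate"])  # 2009-06-05 0.1

baseline = series(date(2009, 4, 1), [2, 3, 2, 3, 2, 3, 2, 15, 2, 3])
flagged = flag_anomalies(baseline)
print(flagged)  # [datetime.date(2009, 4, 8)]
```

Running the same summary for each city would give comparable onset and peak dates, which could then be plotted side by side to compare the timing of the outbreaks.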
Approach
- [placeholder]
- [placeholder]
Motivations
Through this project, we hope that the visualisation tool can help health officials analyse hospitalisation records during the next disease outbreak. By applying visual analytics, the tool aims to shorten their analysis time so that they can respond quickly to the spread of a pandemic.
This is particularly relevant given the widespread disease outbreaks of recent years, which have produced alarming numbers of cases in the affected regions. For example, the Ebola outbreak caused a major loss of life (about 11,000 deaths), and there is growing evidence that Zika infection carries a high risk of birth defects and other neurological disorders. Both outbreaks have also caused significant socioeconomic disruption in the affected regions.
- Ebola Outbreak (2014 - 2016)
- Zika Outbreak (2015 - 2016)
Challenges
Creating visualisations using R:
- Steep learning curve in picking up R programming skills
Lack of domain knowledge of hospitalisation records:
- Longer time required for data exploration and cleaning
Lack of domain knowledge of epidemic and pandemic outbreaks:
- More background research is required to design insightful visualisations
Milestones & Expected Outcome
- [placeholder]
- [placeholder]