Dangy Proposal
Project Objective |
Our project aims to study how dengue fever spreads and to propose countermeasures to contain it. Previous research on dengue fever has generally studied the factors that contribute to the breeding of Aedes mosquitoes or to outbreaks of the disease. Most of these studies relied on spatial analysis, using methods such as Geographically Weighted Regression (GWR) and the Moran's I and Geary's C statistics. Some also combined spatial analysis with temporal analysis to identify outbreak patterns. However, these studies conducted their temporal analysis over long time frames, which gives an overview of the distribution of dengue cases in a region but does not show how the disease spread. Although managing the disease's origin is important, it is equally important to learn how to contain the disease once it starts spreading.
In the previous section, we briefly mentioned Taiwan. Data on dengue fever in Taiwan is readily available for analysis, and Taiwan's mix of settlement types and terrains makes it a good case study for our research. To scope our research, we will focus on the major dengue outbreak that occurred in 2015.
Data preparation |
The data extracted directly from the various sources is mostly in CSV and GeoJSON format. A key challenge in data manipulation was translating the Chinese characters and ensuring that the translations were accurate.
Translation of Chinese Characters
Our first step was to translate the JSON files directly with Google Translate. However, we found that this altered the original GeoJSON structure, leaving parentheses missing. We therefore tried an existing Python library, googletrans. Unfortunately, we ran into limitations such as a character limit of 15,000.
We settled on a safer approach: writing our own Python script. We used the selenium module to automate feeding the raw content into Google's translation engine and writing the output into proper JSON or CSV data structures.
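One way to work within a per-request character limit is to send the text in batches. The following is a minimal sketch, assuming the limit applies to the joined text of each request. The function name chunk_lines is ours for illustration and is not part of googletrans.

```python
def chunk_lines(lines, limit=15000):
    """Group lines into batches whose newline-joined length stays within `limit`.

    A single line longer than `limit` is still placed in a batch of its own,
    so such lines would need to be split separately.
    """
    batches, current, size = [], [], 0
    for line in lines:
        # One extra character for the newline that joins this line to the batch.
        extra = len(line) + (1 if current else 0)
        if current and size + extra > limit:
            batches.append(current)
            current, size = [], 0
            extra = len(line)
        current.append(line)
        size += extra
    if current:
        batches.append(current)
    return batches


# Small limit used here only to make the batching visible.
demo_batches = chunk_lines(["a" * 10, "b" * 10, "c" * 10], limit=21)
```

Each batch can then be joined with newlines, translated in one request, and split back into lines.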
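The structural damage described above can be avoided by translating only the string values and leaving keys, numbers, and nesting untouched. The sketch below is not our selenium script. It assumes a caller-supplied translate function (here a stand-in dictionary lookup), and the property name COUNTYNAME is hypothetical.

```python
import json


def contains_chinese(text):
    """Return True if the text has at least one CJK Unified Ideograph."""
    return any("\u4e00" <= ch <= "\u9fff" for ch in text)


def translate_values(node, translate):
    """Return a copy of a parsed JSON tree with Chinese string values translated.

    Keys, numbers, and the nesting of objects and arrays are left as they are,
    so the result is still valid GeoJSON.
    """
    if isinstance(node, dict):
        return {key: translate_values(value, translate) for key, value in node.items()}
    if isinstance(node, list):
        return [translate_values(item, translate) for item in node]
    if isinstance(node, str) and contains_chinese(node):
        return translate(node)
    return node


# Stand-in for the real translation step (assumption: any callable str -> str).
lookup = {"臺北市": "Taipei City"}
fake_translate = lambda text: lookup.get(text, text)

raw = json.loads(
    '{"type": "FeatureCollection", "features": [{"type": "Feature",'
    ' "properties": {"COUNTYNAME": "臺北市"},'
    ' "geometry": {"type": "Point", "coordinates": [121.5, 25.0]}}]}'
)
translated = translate_values(raw, fake_translate)
output_text = json.dumps(translated, ensure_ascii=False)
```

Because the file is parsed before translation and serialised afterwards, the translation engine never sees the brackets or keys.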
Accuracy of Translation
Google's translation engine does not offer a translation for every word in our JSON data files. After running the script, our team found a few words left untranslated, so manual translation was necessary.
The Taiwan geographical data we sourced uses county names that differ slightly from the Google translations we received. For example, Google Translate returns “Taipei City”, while the Taiwan geographical data contains only “Taipei”. Further data transformation was therefore required to standardise the county names. Our team created a dictionary of the words with translation discrepancies and replaced them using VLOOKUP in Excel.
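Words that need manual translation can be listed automatically by scanning the translated data for values that still contain Chinese characters. The following is a minimal sketch, assuming the data has already been parsed into Python objects. The field names in the sample record are hypothetical.

```python
import re

# Matches any character in the CJK Unified Ideographs block.
CJK = re.compile(r"[\u4e00-\u9fff]")


def find_untranslated(node, path=""):
    """Yield (path, text) for every string value that still contains Chinese characters."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from find_untranslated(value, f"{path}/{key}")
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_untranslated(item, f"{path}/{index}")
    elif isinstance(node, str) and CJK.search(node):
        yield path, node


record = {"properties": {"county": "Taipei City", "town": "中正區"}}
leftovers = list(find_untranslated(record))
```

The returned paths show where each leftover word sits in the file, which makes the manual pass easier to track.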
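The same dictionary lookup can also be done in Python rather than Excel. This is a minimal sketch of that alternative, not the method we used. The only mapping taken from our data is “Taipei City” to “Taipei”, and the column name county is hypothetical.

```python
# Translated name -> name used in the Taiwan geographical data.
NAME_FIXES = {"Taipei City": "Taipei"}


def standardise(name, fixes=NAME_FIXES):
    """Return the standard county name, or the trimmed input if no fix is listed."""
    cleaned = name.strip()
    return fixes.get(cleaned, cleaned)


def standardise_column(rows, column, fixes=NAME_FIXES):
    """Return new row dicts with the given column standardised; inputs are not modified."""
    return [{**row, column: standardise(row[column], fixes)} for row in rows]


rows = [{"county": "Taipei City"}, {"county": " Tainan "}]
fixed_rows = standardise_column(rows, "county")
```

Names missing from the dictionary pass through unchanged, so new discrepancies can be added to NAME_FIXES as they are found.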
Project Prototype |