IS428 AY2018-19T1 Chen Yuge

From Visual Analytics for Business Intelligence
Revision as of 18:35, 11 November 2018 by Yg.chen.2015

Problem & Motivation

As one of the most polluted countries in Europe, Bulgaria faces high levels of air pollution. The European Environment Agency's 2017 report on air quality in Europe ranks it eighth for premature deaths caused by PM2.5[1]. The goal of this report is to assess the state of air pollution in Sofia, the capital of Bulgaria.
In this report, I use three air quality indicators (Official Air Quality Concentration, PM2.5 and PM10) to analyze the air quality in Sofia. PM2.5, particulate matter 2.5 micrometers or less in diameter, stems from fuel combustion, heating, transportation, waste incineration, agriculture and other anthropogenic sources. Because it can penetrate deep into the lungs, studies link it to cancer, reporting a 36% increase in lung cancer per 10 μg/m³[2]. Worldwide, exposure to PM2.5 contributed to 4.1 million deaths from heart disease and stroke, lung cancer, chronic lung disease, and respiratory infections. PM10 is particulate matter 10 micrometers or less in diameter; its particles are slightly larger than those of PM2.5 but pose the same level of danger. Using these indicators, the goal is to visualize the air quality situation and the factors that affect it. The factors analyzed in this report are:

- Local energy sources
- Meteorology, such as temperature, pressure, rainfall, humidity and wind
- Human behavior, such as driving and room heating
- Behavior of Sofia city's neighbors



Data Analysis & Transformation

Data Analysis and Cleaning

Data Exploration

Air Tube data
A quick plot of the raw data revealed outliers (or misreadings) in the Air Tube dataset:
- Temperature: remove readings below -20 or above 50
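
The temperature rule above can be sketched in a few lines of Python. This is only an illustration of the filter, not the tool actually used for the cleaning: the field name `temperature`, the helper `drop_temperature_outliers` and the sample readings are hypothetical, and the unit is assumed to be degrees Celsius.

```python
# Bounds taken from the text; the unit (degrees Celsius) is an assumption.
TEMP_MIN = -20
TEMP_MAX = 50


def drop_temperature_outliers(rows, key="temperature"):
    """Keep only rows whose temperature lies within [TEMP_MIN, TEMP_MAX]."""
    return [row for row in rows if TEMP_MIN <= row[key] <= TEMP_MAX]


# Hypothetical sample readings, not taken from the Air Tube dataset.
sample = [
    {"temperature": 12.5},
    {"temperature": -45.0},  # below the lower bound, removed
    {"temperature": 50.0},   # exactly on the upper bound, kept
    {"temperature": 63.0},   # above the upper bound, removed
]
cleaned = drop_temperature_outliers(sample)
```

Readings that sit exactly on a bound are kept here, since the rule only removes values less than -20 or larger than 50.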

Cap.png