Difference between revisions of "IS428 AY2018-19T1 Le Van Tuan Long"

From Visual Analytics for Business Intelligence
Revision as of 01:00, 12 November 2018

To be a Visual Detective


Overview

Air pollution is an important risk factor for health in Europe and worldwide. A recent review of the global burden of disease showed that it is one of the top ten risk factors for health globally. Worldwide an estimated 7 million people died prematurely because of pollution; in the European Union (EU) 400,000 people suffer a premature death. The Organisation for Economic Cooperation and Development (OECD) predicts that in 2050 outdoor air pollution will be the top cause of environmentally related deaths worldwide. In addition, air pollution has also been classified as the leading environmental cause of cancer.

Air quality in Bulgaria is a big concern: measurements show that citizens all over the country breathe in air that is considered harmful to health. For example, concentrations of PM2.5 and PM10 are much higher than what the EU and the World Health Organization (WHO) have set to protect health.

Bulgaria had the highest PM2.5 concentrations of all EU-28 member states in urban areas over a three-year average. For PM10, Bulgaria also leads the most polluted countries, with a daily mean concentration of 77 μg/m3 (the EU limit value is 50 μg/m3).

According to the WHO, 60 percent of the urban population in Bulgaria is exposed to dangerous (unhealthy) levels of particulate matter (PM10).



Task 1: Spatio-temporal Analysis of Official Air Quality

Questions:


Characterize the past and most recent situation with respect to air quality measures in Sofia City. What does a typical day look like for Sofia city? Do you see any trends of possible interest in this investigation? What anomalies do you find in the official air quality dataset? How do these affect your analysis of potential problems to the environment?


Data Cleaning & Transformation:


To investigate the above problems, we need to clean the data and then visualise the dataset. The cleaning process involves:

  • Combining all the datasets across the years into one Excel file
  • Linking the dataset with the metadata to retrieve the longitude and latitude of the air quality station
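
The two cleaning steps above were done with Excel and Tableau. As a rough sketch only, the same idea can be written in Python. The field names (station, latitude, longitude, pm10) and the sample rows are hypothetical and do not come from the official dataset.

```python
def combine_and_link(yearly_tables, station_metadata):
    """Stack the per-year rows and attach station coordinates.

    yearly_tables: iterable of lists of dicts, one list per yearly file.
    station_metadata: list of dicts with the hypothetical keys
        'station', 'latitude' and 'longitude'.
    """
    # Build a lookup from station code to its coordinates.
    coords = {m["station"]: (m["latitude"], m["longitude"]) for m in station_metadata}
    combined = []
    for table in yearly_tables:
        for row in table:
            # Stations missing from the metadata keep None coordinates.
            lat, lon = coords.get(row["station"], (None, None))
            combined.append({**row, "latitude": lat, "longitude": lon})
    return combined


# Made-up rows for illustration only.
readings_2016 = [{"station": "A", "date": "2016-01-01", "pm10": 40.0}]
readings_2017 = [{"station": "B", "date": "2017-01-01", "pm10": 55.0}]
metadata = [{"station": "A", "latitude": 42.70, "longitude": 23.30}]

rows = combine_and_link([readings_2016, readings_2017], metadata)
```

Rows whose station is missing from the metadata keep empty coordinates, which makes unmatched stations easy to spot before plotting them on a map.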


Dashboard 1:


Tableau Public Link: [[1]]

This dashboard uses data from 2013 to 2017 which contains the daily readings of the air quality at different stations

This dashboard allows the users to:

  1. See the fluctuations in the air pollutants daily, and
  2. Drill-down and filter the data by Year and by Air Quality Station
Q1d1 v2 - long.png


Visualisation 1:
Image:

Q1d1v1 v2 - long.png


Description:

This visualisation shows Sofia city map and the location of the air quality station.

The colour and the size indicate the average pollutants level of that station. The user would be able to select the station to filter visualisation 2 and 3 below.


Insights:

The station with the worst air quality is station 73A, while station 52A has the lowest average pollutant readings


Visualisation 2:
Image:

Q1d1v2 v2 - long.png


Description:

This visualisation shows the monthly average air quality of Sofia city

The user would be able to filter by year and observe the trends on how air quality changes over the months in the selected year


Insights:

From the visualisation, we can observe that air quality is worst at the end and the start of the year, from December to January


Visualisation 3:
Image:

Q1d1v3 v2 - long.png


Description:

This visualisation shows the daily average air quality of Sofia city

The user would be able to filter by year and observe the trends on how air quality changes over the days in a selected year


Insights:

From this visualisation we can see that the daily average fluctuates greatly from day to day. However, the pollutant readings clearly shoot up at the end and the start of the year, from December to January


Dashboard 2:


Tableau Public Link: [[2]]

This dashboard uses data in 2018 which contains the hourly readings of air quality

This dashboard allows the users to:

  1. See the fluctuations in the air pollutants hourly and daily, and
  2. Compare the air pollutant readings across different air quality stations


Dashboard 2.2 - long.png


Visualisation 1:
Image:

Q1d2v1 - long.png


Description:

This visualisation shows the heat-map of the average hourly pollutant concentration in 2018 with the x-axis being the hour of the day and the y-axis being the day of the month

The user would be able to:

  • Compare the pollutant concentration across the days in a month
  • Compare the pollutant concentration across the hours in a day

This would allow us to answer the question of what the air quality is like on a typical day in Sofia city
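
The heat-map itself was built in Tableau. The sketch below only illustrates the aggregation behind it: averaging readings into one cell per (day of month, hour of day). The function name and the sample readings are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime


def hourly_heatmap(readings):
    """Average concentration per (day of month, hour of day) cell.

    readings: iterable of (timestamp, concentration) pairs.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for timestamp, value in readings:
        key = (timestamp.day, timestamp.hour)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Made-up readings: two months contribute to the same (day 14, 08:00) cell.
sample = [
    (datetime(2018, 1, 14, 8), 20.0),
    (datetime(2018, 2, 14, 8), 30.0),
    (datetime(2018, 1, 26, 8), 90.0),
]
grid = hourly_heatmap(sample)
```

Each key of grid corresponds to one cell of the heat-map, with the day of the month on the y-axis and the hour of the day on the x-axis.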


Insights:

From the visualisation, we have the following conclusions:

  1. On some days (e.g. day 14), the pollutant concentration is consistently low throughout the day relative to other days
  2. On some other days (e.g. days 26 and 27), the pollutant concentration is consistently high throughout the day
  3. On some days (e.g. days 16 and 18), the air quality fluctuates greatly throughout the day, with either the morning or the evening having worse air quality than the other



Visualisation 2:
Image:

Q1d2v2 - long.png


Description:

This visualisation shows the line graph of hourly average pollutant concentration for each of the stations in 2018

The user would be able to:

  • Compare the hourly average pollutant concentration across different stations

This would allow us to answer the question of what the air quality is like on a typical day in Sofia city, taking into account the area in which each station is situated


Insights:

From the visualisation, we have the following conclusions:

  1. Across all the stations, Druzhba is observed to have a significantly lower PM10 level, which indicates that the air quality around Druzhba station is better than at the other stations
  2. The other four stations show approximately the same levels throughout the 24 hours of the day



Task 2: Spatio-temporal Analysis of Citizen Science Air Quality Measurements

Questions:


Characterize the sensors’ coverage, performance and operation. Are they well distributed over the entire city? Are they all working properly at all times? Can you detect any unexpected behaviors of the sensors through analyzing the readings they capture?
Now turn your attention to the air pollution measurements themselves. Which part of the city shows relatively higher readings than others? Are these differences time dependent?


Data Cleaning & Transformation:


  • Obtaining the longitude and latitude from the geohash codes
  • Combining the different file formats into one file
  • Filtering to keep only data from Sofia City, and removing data from other cities using the Tableau Lasso function
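
The write-up does not say which tool was used to decode the geohashes. As an illustration only, the standard geohash decoding can be sketched in plain Python: each base-32 character contributes five bits, which alternately halve the longitude and latitude ranges. The function name decode_geohash is an assumption.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def decode_geohash(geohash):
    """Return the (latitude, longitude) of the centre of a geohash cell."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    is_lon = True  # bits alternate, starting with longitude
    for char in geohash.lower():
        value = _BASE32.index(char)
        for shift in range(4, -1, -1):  # five bits, most significant first
            bit = (value >> shift) & 1
            if is_lon:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            is_lon = not is_lon
    return (lat_lo + lat_hi) / 2, (lon_lo + lon_hi) / 2


# 'ezs42' is a commonly used textbook geohash, not a sensor from this dataset.
lat, lon = decode_geohash("ezs42")
```

Longer geohashes narrow the ranges further, so the decoded point is the centre of a smaller cell.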


Dashboard 1:


Tableau Public Link: [[3]]

Screenshot 2018-11-12 at 12.12.54 AM.png

This dashboard uses data from 2017 to 2018 to show the P1 and P2 measurements for the various parts of Sofia City in Bulgaria.

This dashboard allows the users to:

  1. Locate the sensors' positions and see the distribution of the sensors
  2. See the amount of data collected by each sensor
  3. See how the sensors are performing by looking at their average pressure, temperature and humidity readings


Visualisation 1:

Screenshot 2018-11-12 at 12.15.16 AM.png


Description:

From the visualisation above, we can see the distribution of sensors across Sofia city. Each circle is a sensor, and the size of the circle represents the number of readings recorded by that sensor.

This allows the user to:

  • Determine the location and distribution of sensors
  • Determine which sensors recorded the highest number of readings

This would allow us to answer the question of what is the coverage of the sensors in Sofia City and how well each is operating.


Insights:

From the visualisation, we have the following insights:

  1. Most sensors are located in the centre of Sofia City
  2. Zooming into the map, we can see some sensors with a low number of readings, which suggests that they might not be functioning properly. Other sensors close to them have recorded far more readings, so the sensors with few readings should be checked because they might have malfunctioned
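
The comparison above was made by eye on the map. A minimal sketch of the same check in Python counts the readings per sensor and flags sensors that fall far below the median count. The 10% cut-off and the sensor identifiers are arbitrary assumptions for illustration.

```python
from collections import Counter
from statistics import median


def flag_sparse_sensors(sensor_ids, fraction=0.1):
    """Count readings per sensor and flag sensors with unusually few.

    sensor_ids: one entry per reading (e.g. the geohash of the sensor).
    fraction: assumed cut-off relative to the median reading count.
    """
    counts = Counter(sensor_ids)
    threshold = fraction * median(counts.values())
    sparse = sorted(s for s, n in counts.items() if n < threshold)
    return counts, sparse


# Made-up identifiers: s3 has far fewer readings than its neighbours.
ids = ["s1"] * 100 + ["s2"] * 90 + ["s3"] * 3
counts, sparse = flag_sparse_sensors(ids)
```

A flagged sensor is only a candidate for inspection. It could also have been installed late or replaced, which the counts alone cannot distinguish.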


Visualisation 2:
Image:

Screenshot 2018-11-12 at 12.16.17 AM.png


Description:

This plots average pressure, temperature and humidity through time, 2017 to 2018, in Sofia City.

This allows users to:

  • Compare the average measurements for pressure, temperature and humidity throughout the years

This would allow us to spot the day(s) on which the sensors failed


Insights: From the visualisation, we can observe suspicious drops in the pressure values, which allow us to identify failed sensors:

  • On March 30th, the pressure sensors failed. Looking into the data:
Sadsdasd.png
  • The sensors failed again from March 31st 2AM to April 1st 6PM. Looking deeper into the data confirms this:
Image 2 sensor.png
Image 3 sensor.png
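
The outages above were found by inspecting the charts and the underlying rows. As a sketch under stated assumptions, gaps like these could also be located by sorting the timestamps of valid readings and reporting any step longer than the expected interval. The one-hour interval and the timestamps below are made up for illustration and are not taken from the dataset.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, max_step=timedelta(hours=1)):
    """Return (last_seen, next_seen) pairs where readings are missing.

    max_step is the assumed normal interval between consecutive readings.
    """
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > max_step]


# Illustrative timestamps with one long silence in the middle.
times = [
    datetime(2018, 3, 31, 0),
    datetime(2018, 3, 31, 1),
    datetime(2018, 4, 1, 19),
    datetime(2018, 4, 1, 20),
]
gaps = find_gaps(times)
```

Running the check separately for pressure, temperature and humidity would show whether only one measurement type dropped out during a given period.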


Dashboard 2:

Tableau Public Link: [[4]]

Screenshot 2018-11-12 at 12.27.30 AM.png

This dashboard uses data from 2017 to 2018 to show the P1 and P2 measurements for the various parts of Sofia City in Bulgaria.

This dashboard allows the users to:

  1. See the pollutant concentrations in the different parts of Sofia City
  2. See how the pollutant concentrations in the different parts of Sofia City change with respect to the day of the month


Top 2 visualisations in the above dashboard
Description:

The visualisation used here is a symbol map, with P1 on the left and P2 on the right. The colour and size indicate the average pollutant concentration reading of each station

The user would be able to:

  • Find out which areas of Sofia city have high P1 and P2 pollutant concentrations

This answers the question of which parts of Sofia city have high concentrations of P1.


Insights:

From the visualisation, we have the following insights:

  1. Concentrations of P1 and P2 are around 50
  2. The most polluted area is the centre of Sofia city
  3. P1 and P2 concentrations appear to correlate with one another


Bottom 2 visualisations in the above dashboard
Description:

These are heatmaps showing, for each sensor in Sofia city, how its P1 and P2 concentrations vary across the days of the month.

The user would be able to:

  • Determine relationship between the differences in P1 and P2 concentration in the parts of Sofia city and time.
  • Determine whether the concentration is dependent on time
  • Determine which day(s) of the month have the highest P1 concentration and which have the lowest

This would answer whether P1 and P2 concentrations in Sofia City are time dependent.


Insights: From the visualisation, we have the following insights:

  1. We can conclude that the P1 and P2 concentrations are actually dependent on time, having high concentrations across all stations on the 8th and 26th
  2. The trend is more obvious for P1 concentration



Task 3: Air Quality Measure Analysis

Questions:


Urban air pollution is a complex issue. There are many factors affecting the air quality of a city. Some of the possible causes are:

  1. Local energy sources. For example, according to Unmask My City, a global initiative by doctors, nurses, public health practitioners, and allied health professionals dedicated to improving air quality and reducing emissions in our cities, Bulgaria’s main sources of PM10, and fine particle pollution PM2.5 (particles 2.5 microns or smaller) are household burning of fossil fuels or biomass, and transport.
  2. Local meteorology such as temperature, pressure, rainfall, humidity, wind etc
  3. Local topography
  4. Complex interactions between local topography and meteorological characteristics.
  5. Transboundary pollution for example the haze that intruded into Singapore from our neighbours.

In this third task, you are required to reveal the relationships between the factors mentioned above and the air quality measure detected in Task 1 and Task 2. Limit your response to no more than 5 images and 600 words.

Data Cleaning & Transformation:


Refer to the cleaning and preparation of task 2. For task 3 we are using the same dataset.

In order to study the topography of the area, we downloaded a topographic map of the country. Credits to the topographic map found here: [[5]]

Dashboard 1:


Tableau Public Link: [[6]]

This dashboard aims to show the relationship between the pollutant levels and meteorological factors, namely pressure, humidity and temperature.

The dataset can be filtered for year 2017 and year 2018

  • YEAR 2017
Q3d1a - long.png


  • YEAR 2018
Q3d1b - long.png



Visualisation 1:
Image:

  • 2017:
Q3d1v1 2017 - long.png
  • 2018:
Q3d1v1 2018 - long.png


Description: I have created a dual plot of daily average pressure against daily average P1 and daily average P2

With this chart, the user can see the trends across the days for pressure and P1, and identify any correlation or pattern between the two


Visualisation 2:
Image:

  • 2017:
Q3d1v2 2017 - long.png
  • 2018:
Q3d1v2 2018 - long.png


Description:

I have created a dual plot of daily humidity against daily average P1 and daily average P2

With this chart, the user can see the trends across the days for humidity level and P1, and identify any correlation or pattern between the two


Visualisation 3:
Image:

  • 2017:
Q3d1v3 2017 - long.png
  • 2018:
Q3d1v3 2018 - long.png


Description:

I have created a dual plot of temperature against daily average P1 and daily average P2

With this chart, the user can see the trends across the days for temperature and P1, and identify any correlation or pattern between the two


Combined Insights for 3 visualisations:

From the visualisations above, we have the following conclusions:

In 2017:

  • Pressure and humidity both trend upward together with the P1 and P2 pollutant levels
  • Temperature trends in the opposite direction to the P1 and P2 pollutant levels


In 2018:

  • Pressure and humidity both trend downward together with the P1 and P2 pollutant levels
  • Temperature again trends in the opposite direction to the P1 and P2 pollutant levels

We can conclude that the pollutant level correlates positively with pressure and humidity and negatively with temperature
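
The conclusion above is based on reading the dual plots. One way to put a number on such a relationship is the Pearson correlation coefficient, sketched below in plain Python. The series are made up to show a perfectly negative relationship and are not values from the dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up daily averages: P1 falls as temperature rises.
temperature = [0.0, 5.0, 10.0, 15.0]
p1 = [80.0, 60.0, 40.0, 20.0]
r = pearson(temperature, p1)
```

A value near +1 or -1 indicates a strong linear relationship and a value near 0 indicates little linear relationship. Correlation on its own does not establish which factor drives the other.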




Dashboard 2:


Tableau Public Link: [[7]]

Topo.jpg


Description:

There is only one visualisation in this dashboard, in which we look at the relationship between the topography and the pollutant level readings from the stations


Insights:

From the visualisation, we can see that in the bottom-left corner of Sofia city there is higher elevation, indicated by the green pasture and the contours. Looking at the readings of the stations around that area, the pollutant levels appear to be much lower than in the rest of the city.

In the far upper-right corner of the city, the stations give much worse readings, as indicated by the dark red colour of the circles.

Hence, we can conclude that there might be a correlation between the elevation of the ground and the pollutant level. It might be that there is less pollution near the mountain



Conclusion

Overall, we can conclude that many factors affect the pollutant level. By visualising multiple datasets together, we can see more clearly which factors play a more significant role in affecting the pollutant level

