Difference between revisions of "IS428 AY2018-19T1 Zhuo Yunying"

From Visual Analytics for Business Intelligence
Jump to navigation Jump to search
 
(25 intermediate revisions by the same user not shown)
Line 2: Line 2:
 
<p>Air pollution is an important risk factor for health in Europe and worldwide. A recent review of the global burden of disease showed that it is one of the top ten risk factors for health globally. Worldwide an estimated 7 million people died prematurely because of pollution; in the European Union (EU) 400,000 people suffer a premature death. The Organisation for Economic Cooperation and Development (OECD) predicts that in 2050 outdoor air pollution will be the top cause of environmentally related deaths worldwide. In addition, air pollution has also been classified as the leading environmental cause of cancer.
 
<p>Air pollution is an important risk factor for health in Europe and worldwide. A recent review of the global burden of disease showed that it is one of the top ten risk factors for health globally. Worldwide an estimated 7 million people died prematurely because of pollution; in the European Union (EU) 400,000 people suffer a premature death. The Organisation for Economic Cooperation and Development (OECD) predicts that in 2050 outdoor air pollution will be the top cause of environmentally related deaths worldwide. In addition, air pollution has also been classified as the leading environmental cause of cancer.
  
Air quality in Bulgaria is a big concern: measurements show that citizens all over the country breathe in air that is considered harmful to health. For example, concentrations of PM2.5 and PM10 are much higher than what the EU and the World Health Organization (WHO) have set to protect health. Bulgaria had the highest PM2.5 concentrations of all EU-28 member states in urban areas over a three-year average. For PM10, Bulgaria is also leading on the top polluted countries with 77 μg/m3on the daily mean concentration (EU limit value is 50 μg/m3).According to the WHO, 60 percent of the urban population in Bulgaria is exposed to dangerous (unhealthy) levels of particulate matter (PM10). </p>
+
Air quality in Bulgaria is a big concern: measurements show that citizens all over the country breathe in air that is considered harmful to health. For example, concentrations of PM2.5 and PM10 are much higher than what the EU and the World Health Organization (WHO) have set to protect health. Bulgaria had the highest PM2.5 concentrations of all EU-28 member states in urban areas over a three-year average. For PM10, Bulgaria is also leading on the top polluted countries with 77 μg/m3on the daily mean concentration (EU limit value is 50 μg/m3). According to the WHO, 60 percent of the urban population in Bulgaria is exposed to dangerous (unhealthy) levels of particulate matter (PM10). </p>
  
 
<p>This assignment aims to study the following: </p>
 
<p>This assignment aims to study the following: </p>
Line 10: Line 10:
  
 
== Dataset Analysis & Transformation Process ==
 
== Dataset Analysis & Transformation Process ==
Before the analysis began, the dataset given is analysed to identify its respective format and attributes. There were 6 different zip files provided in the assignment and each has its own unique ways to process and make sense of the data to bring value to the analysis. This section will elaborate on the dataset analysis and transformation process for each dataset in order to prepare the data for import and analysis on an interactive visualization.
+
===Task 1: Spatio-temporal Analysis of Official Air Quality===
 +
===1. Combine all time-series data (e.g. BG_5_9572_2017_timeseries) into one single excel spreadsheet===
 +
*Create a new "Station" Column (indicating station code i.e. 9421) and "Year" Column based on the year for each of the time-series data
 +
*Use excel to combine the rest of time-series files based on all the common columns (e.g. Countrycode, Namespace, AirQualityNetwork and etc.)
 +
*Based on the analysis of the existing combined time-series data from 2013 to 2018, there is a drastic difference in the level of aggregation across the years. As shown in the table below, Year 2016 has a combination of hourly air quality readings and daily air quality readings while in 2017, there is a mixture of hour and var readings as for certain days, readings are not measured at one-hour interval continuously. As such, the analysis on air quality readings will be based on "Day" averaging time from year 2013 to 2016 as it is impossible to lower the aggregation level of "Day" to "Hour" readings in 2016. On the other hand, the analysis on air quality readings will be based on "Hour" averaging time for 2017 and 2018. For the data in 2017,average readings will be taken if the readings for any specific days are done on "Var" basis. Hourly analysis is also more accurate for both 2017 and 2018 as there are missing data on specific months. (In 2017, only Nov and Dec data are available while in 2018, only Jan to Sep's data are available)
 +
[[File:Task1DataTransformation a.png|300px|center]]
 +
*Due to the standardization in averaging time, the values under "Concentration", "DatetimeBegin" and "DatetimeEnd" have been adjusted accordingly while other column values remain unchanged.
 +
*There are quite a number of duplicated readings in the dataset. These duplicated readings are removed during the transformation process to avoid unequal weightage.
 +
*Due to the high variation in raw data and small dataset, excel is used for the transformation process.
 +
 +
===2. Merge metadata file with combined time-series data ===
 +
*According to the source of scrapped data (http://discomap.eea.europa.eu/map/fme/AirQualityExport.htm),"the join between time-series files and the metadata file should be made using the Countrycode (or Namespace) and SamplingPoint". Thus, metadata file and time-series data time are merged via Vlookup function in excel based on SamplingPoint (since Countrycode are all "BA" for both data file).
 +
*Upon further inspection, these two datasets have a number of common columns with same values. These columns include "Countrycode, Namespace, AirQualityNetwork, AirQualityStation, AirQualityStationEoICode, SamplingProcess, AirPollutantCode, AirPollutant" and thus repeated columns are removed.  
  
=== Fixed/Mobile Proximity Data & Employee List ===
+
===Task 2: Spatio-temporal Analysis of Citizen Science Air Quality Measurements===
<p>There are 2 main types of proximity detectors installed in the building to enhance the safety and security of its employees. These detectors include fixed proximity sensors installed in each zone of the building and a mobile proximity sensor that is attached to Rosie, a robotic mail delivery system. Data will be captured when employees, holding their own proximity card, pass a proximity sensor in the building.</p>
+
*Geohash represents station locations. However, Tableau is not able to interpret geohash as geographic data. Before Air Tube data is imported to Tableau for analysis, geohash needs to be decoded into geographical coordinates. Due to the sheer size in Air Tube datasets(data_bg_2017.xlsx and data_bg_2018.xlsx), R packages ("devtools", "tidyverse" and "geohas") as indicated by Prof Kam will be used to transform the geohashed raw data to a new csv file containing the coordinates of the locations.
 +
*Next, 2017 and 2018 datasets that have been transformed will be combined based on the R programming code below.
 +
[[File:Task2Transformation.png|500px|center]]
 +
*The combined datasets comprises of readings across the whole Bulgaria area which is beyond the scope of our concern (i.e. Sofia). To maintain the consistency in comparison, Inclusion filter function is used to filter in only longitudes and latitudes that are in Sofia city. Due to the difficulty in comparing with the real geographical area of Sofia, an existing map on Sofia City (from Open Street Map) was used to filter through the Tableau filter function. The folowing shows map of Sofia and the filtering function.
 +
[[File:Sofia Map.png|500px|center]]
 +
[[File:Tableau Filter.png|300px|center]]
  
<p>The formats of the captured data by both the fixed proximity sensor and mobile proximity sensor are similar. However, the fixed proximity sensor provides the zone in which employees pass through while the mobile proximity sensor provides the exact coordinates of the employee’s location. The following shows the main differences between the two different types of data captured:</p>
+
===Task 3: Find out the relationship of the air quality analysis with other factors===
[[File:Fixed Mobile Original Data Formats GwendolineTanWanXin.png|600px|center]]
+
*The Meteorological data scrapped is currently in crosstab format which is not suitable for analysis on Tableau. Hence, the dataset has to be transformed to columnar format. Hence, pivoting will be done to transform the data.  
<p>The following section illustrates the issues faced in the data analysis phase leading to a need to transform the data into a specified format.</p>
+
*As the data includes average, minimum and maximum readings for different types of measurements and are presented on separate columns. The format is not suitable to be processed on Tableau if there is a need to introduce filters in the dashboard. Hence the dataset is transformed to as shown below.  
<p>'''Issue:''' Because of the differences in the types of data captured, the data cannot be correlated with each other for display in the same chart. For example, one can plot employee’s exact location using the X/Y coordinates (from the mobile proximity sensor) but the zone coordinates (from the fixed proximity sensor) were not known and hence, could not be plotted. This will present an issue during the analysis since we are unable to clearly identify which area the employees are present on the map.</p>
+
[[File:Task3Transformation.png|600px|center]]
<p>'''Solution:''' To resolve this issue, the coordinates for each zone has to be identified. This is done using an online tool that allow analysts to get coordinates for custom polygons on a map. By plotting these custom polygons on a map, we will be able to get the centre point coordinates of the zone. This will then allow us to plot the employees’ location on each zone. The following details the process of getting individual zone coordinates.</p>
 
# Download the online tool from the link: https://github.com/bryantbhowell/tableau-map-pack/blob/master/draw_tableau_polygons_on_background_image.html
 
# Open the tool and choose the map to be plotted. In the zone maps given, there were white spaces and additional text that were not part of the map itself. As such, the map has to be cropped to get the actual map itself only. The following shows the cropping of maps and this process is repeated for each of the 3 levels. <p>[[File:Cropping of Maps GwendolineTanWanXin.png|500px|center]]</p>
 
# Open the map using the online tool and plot each of the custom polygons. During the process of plotting, the following assumptions were made:<br/>
 
## For some areas with multiple zones separated on the map, a small area of the zone is being selected. For example, zone 4 have 3 different segments in level 1. However, they all refers to the elevators and stairs. By plotting only one of the zones, we will still be able to know that the employee is taking the stairs/elevators. There isn’t a need to identify which stairs or elevators they took. As such, for zone 4, only one of the custom polygon is drawn.
 
## For areas that were too big, only a small segment of the zone was plotted. For example, zone 1 encompasses the entire corridor spaces. If all the corridor spaces are drawn, the centre point will not be situated in zone 1. Therefore, only one part of the zone is drawn. As long as employees are within the same zone, they are in the corridor areas.<br/>The outcome of plotting these polygons will result in the following display on the online tool. The same process is repeated for each of the 3 levels.<p>[[File:Plotting of Zones GwendolineTanWanXin.png|500px|center]]</p>
 
# After the polygons have been plotted, the results can be exported into a .csv file format for analysis in Tableau. Once the data has been imported into Tableau, the centre coordinates of each zone can then be retrieved. The following shows the centre points that were plotted on the chart and the ability to export the data. This will then allow us to retrieve the coordinates for each zone to be plotted onto the map.<p>[[File:Individual Zone Coordinates GwendolineTanWanXin.png|500px|center]]</p>
 
# However, in level 2 and 3, shapes of each zones were not consistent and this has resulted in errors in the centre point of the zone. The following shows the errors that has been encountered during the process. <p>[[File:Error In Centre Point GwendolineTanWanXin.png|600px|center]]</p>
 
# In order to resolve this error, a manual task has to be performed to get the coordinate point of each erroneous zone coordinate. This can be performed in Tableau using the annotate function. The following shows an example of how the annotate function can provide the X/Y coordinate values. The process is repeated for each of the erroneous centre points. <p>[[File:Annotate Area With Point GwendolineTanWanXin.png|600px|center]]</p>
 
# After the completion of this process, the issue of lacking individual zone coordinates will be resolved.
 
<br/>
 
<p>'''Issue:''' Currently, the data for the fixed proximity sensors and mobile proximity sensors are in 2 separate spreadsheets. This makes the analysis process difficult with the need to blend or join data in Tableau.</p>
 
<p>'''Solution:''' To simplify the process of analysis, both the data from the fixed and mobile proximity sensors are combined into one single file. With the above process completed, we are also able to get the coordinates for each of the zone to be plotted on the chart. However, we do not have the zone information for the mobile proximity sensor. In this analysis, it does not matter as we are only interested in plotting the zone coordinates onto the chart. The following shows the data format of the transformed data:</p>
 
<p>[[File:Combined Proximity Data Format GwendolineTanWanXin.png|500px|center]]</p>
 
<br/>
 
<p>'''Issue:''' During the process of correlating the zone data and the coordinates, the dataset has a zone that stated “Server Room”. However, there is no point coordinate for this zone on our generated polygons.</p>
 
<p>'''Solution:''' To resolve this issue, the annotate function is also used to help us get the centre point coordinate of the server room. This result is then, updated into the file for analysis.</p>
 
<br/>
 
<p>'''Issue:''' With the zone coordinates and the X/Y coordinates from the mobile proximity sensor, the data can be plotted onto a map to identify where employees are located. However, with only the proximity ID in the captured data, we are unable to correlate the employee with each proximity ID. The employee list provided in the dataset also does not show the mapping between their name and the proximity ID.</p>
 
<p>'''Solution:''' Upon close observation of the employee name and the proximity ID, a correlation can be identified. The proximity ID is derived from the first letter of the employee’s first name and the entire string of the last name. The last three numbers refer to the number of times each employee request for a new pass. By default, the count starts from 001. The following formula is then applied to identify a correlation between the employee and the proximity ID:</p>
 
<p>[[File:Employee and Proximity ID GwendolineTanWanXin.png|650px|center]]</p>
 
  
=== Building Data Elements (HVAC Sensor Readings & Status Information) ===
 
<p>The building data elements provide multiple readings collected from different sensors in the entire building. However, the data is not structured in a way that allows for flexibility in the choice of sensor readings to be analysed and to apply filters by different floors and zones. As a result, there is a need to reshape the data to provide for this flexibility.</p>
 
<p>There are 3 main categories of data in the building data dataset:</p>
 
* Overall Building Data (e.g. Drybulb Temperature, Pump Power etc.)
 
* Floor Data (e.g. Air Loop Inlet Temperature, Cooling Coil Power etc.)
 
* Floor/Zone Data (e.g. Reheat Coil Power, CO2 Concentration etc.)
 
<p>To better analyse the data using Tableau, the following is being done:</p>
 
# The entire building data dataset is grouped into different Excel Spreadsheet as follows:<p>[[File:Building&Floor Data Spreadsheet GwendolineTanWanXin.png|500px]]    [[File:Floor Zone Data File GwendolineTanWanXin.png|500px]]</p>
 
# Tableau provides an Excel plugin that allow one to reshape data into a format for easy processing especially if multiple attributes were present for selection. With the use of the tool, the following is then performed to each of the grouped datasheet:
 
## For the building data spreadsheet, a data reshape is performed to obtain the following result: <p>[[File:Building Data Reshape GwendolineTanWanXin.png|650px|center]]</p>
 
## For the floor data spreadsheet, a data reshape is performed. The output of the reshaped data is similar to the building data, with the exception of the difference in attributes between the overall building and each floor attributes.
 
## For the floor/zone data spreadsheet, a data reshape is also performed on its attributes. However, due to the volume of the data generated, additional processing needs to be done as follows:<p>[[File:Process Floow Zone Data GwendolineTanWanXin.png|650px|center]]</p>
 
<p>With the reshaped data, it can then be imported into Tableau for analysis.</p>
 
  
=== Hazium Readings ===
 
The sensor readings for hazium concentration level were saved as 4 different files in the given dataset. These readings are consolidated into one single file. The result is similar to the building dataset elements. As such, similar processing techniques were used to process the data - Excel Tableau’s plugin for data reshape. The following illustrates the steps taken to achieve the final transformed dataset.
 
<p>[[File:Hazium Data File GwendolineTanWanXin.png|650px|center]]</p>
 
  
== Dataset Import Structure & Process ==
+
== Analysis ==
With the dataset analysis and transformation phase completed, the following files will have to be imported into Tableau for analysis:
 
<p>[[File:Data Sources GwendolineTanWanXin.png|900px|center]]</p>
 
<p>Each of the data file is added as a data source in Tableau. The relationships defined between each data source is the timestamp, floor and zone. This will allow analysis to be conducted across all the data sources at any one point.</p>
 
<p>Additional processing is performed to the first data source - Proximity Data and Employee List.</p>
 
* Import the proximity data as a data source.
 
* Add new data connections to the proximity data source. The new data connection file will be the Employee List.
 
* Perform a left join between the proximity data and the employee list to correlate the employee ID together. This will allow us to identify the employee with the captured proximity card ID. The following shows the configuration of the join:
 
<p>[[File:Tableau Data Join Of Proximity Data GwendolineTanWanXin.png|300px|center]]</p>
 
<p>To process and display the dates in a readable format for the analyst, each of the data sources will have a new calculated field to derive the date and day of the week. This will convert the date from the format “31/5/2016” to a format of “31 May (Tuesday)”.</p>
 
<p>The formula used is as follows: </p>
 
''STR(DATEPART('day', [Date/Time])) + " " + DATENAME('month',[Date/Time]) + " (" + DATENAME('weekday', [Date/Time]) + ")"''
 
  
== Interactive Visualization ==
+
=== Task 1: Spatio-temporal Analysis of Official Air Quality ===
<p>Throughout all the different dashboards, useful guides/tips are provided to help users navigate through the different filters and actions so that their analysis can be performed smoothly. The following interactivity elements are also used throughout all the dashboards to maintain consistency:</p>
+
{| class="wikitable" style="background-color:#FFFFFF;" width="100%"
{| class="wikitable"
+
|'''<big>1.1 Characterize Recent and Past Situation of Air Quality in Sofia City</big>
|-
+
'''
! style="font-weight: bold;background: #536a87;color:#fbfcfd;width: 20%;" | Interactive Technique
 
! style="font-weight: bold;background: #536a87;color:#fbfcfd;width: 40%" | Rationale
 
! style="font-weight: bold;background: #536a87;color:#fbfcfd;" | Brief Implementation Steps
 
|-
 
| <center>'''Filter dates with the use of time range slider''' <br/>[[File:Filter by Date GwendolineTanWanXin.png|200px|center]]</center>
 
|| <center>To provide flexibility for analysts to choose the time period that they are interested to analyse. <br/>The use of checkboxes or dropdown list requires the analyst to check/uncheck each date manually which is time-consuming. As such, a time range slider is preferred.</center>
 
||
 
# The date/time field have to be duplicated with its data type set to “date”
 
# Add the new field to be filtered.
 
|-
 
| <center>'''Filter each floor/zone using a single selection drop down list''' <br>[[File:Drop down list GwendolineTanWanXin.png|200px|center]]</center>
 
|| <center>To allow analysts to concentrate on the data collected from each level with the use of a single selection.
 
<br/>Use of a drop down list also allow analysts to easily choose the building level that they are interested to analyse.</center>
 
||
 
# Configure the filter selection to be a single selection drop down list
 
|-
 
| <center>'''Change and zoom of floor plans based on each floor filter''' <br>[[File:Dynamic Floor Plan GwendolineTanWanXin.png|200px|center]]</center>
 
|| <center>To allow for easy reference of mapping each building data elements with each zone. <br/>
 
When a user filters from one floor to another, the floor plan also changes to provide for quick and easy reference. Due to the space constraint, each floor plan has to be zoomed for users to identify and see the zone areas clearly.</center>
 
||
 
# Create calculated fields for the x and y axis.
 
# Put the 2 calculated fields into the worksheet view.
 
# To hide the mark, set the colour as transparent.
 
# Navigate to Maps > Background Images. Add the floor plans into the background images and configure it according to the filter condition.
 
# Put the “floor” attribute as a filter.
 
|}
 
  
The following sections elaborates on other interactivity techniques are integrated into each of the individual dashboard.
 
  
=== Task 1: Spatio-temporal Analysis of Official Air Quality ===
+
[[File:Task 1 Characterize PastandRecentSituation2013FinalFinal.png|800px|center]]
<p> Characterize the sensors’ coverage, performance and operation. Are they well distributed over the entire city? </p>
+
[[File:Task 1 Characterize PastandRecentSituation2014FinalFinal.png|800px|center]]
[[[File:Task 2- Distribution and Coverage of Sensors in 2017 and 2018.png|thumb|Distribution and Coverage of Sensors in 2017 and 2018]|800px|center]]
+
[[File:Task 1 Characterize PastandRecentSituation2015FinalFinal.png|800px|center]]
<p>To allow for flexibility in navigation, the following interactive techniques have been employed:</p>
+
[[File:Task 1 Characterize PastandRecentSituation2016FinalFinal.png|800px|center]]
{| class="wikitable"
 
|-
 
! style="font-weight: bold;background: #536a87;color:#fbfcfd;width: 20%;" | Interactive Technique
 
! style="font-weight: bold;
 
[]
 
background: #536a87;color:#fbfcfd;width: 40%" | Rationale
 
! style="font-weight: bold;background: #536a87;color:#fbfcfd;" | Brief Implementation Steps
 
|-
 
| <center>'''Navigate across dashboards with buttons'''</center>
 
|| <center>To provide users the flexibility of moving from one dashboard to the other, with an easy and simple to use interface </center>
 
||
 
# Create images of the buttons and placed these images into the folder directory as follows: My Tableau Repository > Shapes.
 
# In Tableau, create a new calculated field.
 
# Drag the new calculated field into the “Shapes” mark. Configure the shapes that you want to use by selecting the Shape Palette loaded into the folder previously.
 
# In the dashboard, drag the worksheet with the button into an empty area. Configure the action, with the use of the filters action.
 
|-
 
| <center>'''Display tooltips when users hover over each button'''</center>
 
|| <center>To provide users with contextual information about the dashboard and the expected charts that they will look at after the button click</center>
 
||
 
# Configure the tooltips information on the marks shelf
 
|}
 
 
 
=== Employee Movement Dashboard ===
 
<p>The following shows the employee movement dashboard: </p>
 
[[File:Dashboard Employee Movement GwendolineTanWanXin.png|800px|center]]
 
  
<p>The following interactive techniques have been employed in this dashboard:</p>
+
<p>The figures above shows the PAST average daily concentration heat map for year 2013 to 2016 as daily readings are not computed for year 2017 and year 2018. The calendar heat map visualization allows users to appreciate the trend in average daily concentration from Jan to Dec at one glance. The calendar map also includes filter by Sampling location, air quality range as well as Year. With reference to EU Air Quality classification based on research, the air quality range includes 0-20:Good, 20-40:Moderate, 40-50:Normal, 50-75:Unhealthy, 75 and above: Very Unhealthy.The heat map’s colour intensity is specified within a range such that dark red represent unhealthy and very unhealthy air quality range and vice versa.</p>
{| class="wikitable"
+
<p> <b>Trends Analysis:</b>
|-
+
* From year 2013 to year 2016, the air quality exceeds EU limit (50 µg/m3) at the end of each year and at the beginning of each year. Specifically, the air quality reached unhealthy and very unhealthy range for majority of the days In January, February, November and December across all four years. In the middle of year (i.e. Quarter 2 and Quarter3), the air quality is much better and are generally below the EU limit value of 50 µg/m3.
! style="font-weight: bold;background: #536a87;color:#fbfcfd;width: 20%;" | Interactive Technique
+
* There isn’t any clear trend in the change in daily concentration of PM10 across the days in a month throughout 2013 to 2016.  
! style="font-weight: bold;background: #536a87;color:#fbfcfd;width: 40%" | Rationale
 
! style="font-weight: bold;background: #536a87;color:#fbfcfd;" | Brief Implementation Steps
 
|-
 
| <center>'''Traverse through an hourly time series by clicking on the arrows'''</center>
 
|| <center>To analyse and draw connections between employees and their movement in the building over a series of time</center>
 
||
 
# Drag the date/time field into the “Pages” shelf
 
# Change the field to “Hour” to enable the animation on an hourly basis
 
The following shows a screenshot of the configuration:
 
[[File:Pages Shelf GwendolineTanWanXin.png|150px|center]]
 
|-
 
| <center>'''Filter data source types based on fixed or mobile proximity type'''</center>
 
|| <center>To allow the analyst to understand the fixed and mobile proximity data</center>
 
||
 
# Add a checkbox filter into the dashboard
 
|-
 
| <center>'''Highlight department upon selection'''</center>
 
|| <center>To highlight departments that are of interest to the analyst</center>
 
||
 
# Add the shape legends for each department
 
# Select the new object and check “Highlight selected items”.
 
The following shows a screenshot of the configuration:
 
[[File:Enable Highlighting GwendolineTanWanXin.png|150px|center]]
 
 
|}
 
|}
To help analysts get started with the analysis of the data, a typical pattern of GAStech employees have also been identified and included in the dashboard.
+
{| class="wikitable" style="background-color:#FFFFFF;" width="100%"
+
|'''1.2 A Typical Day in Sofia City
=== HVAC Control System & Chemical Levels (Heating) ===
+
'''
<p>The following shows the HVAC Control System (Heating) dashboard: </p>
+
[[File:Task 1 Characterize PastandRecentSituation20172018Final.png|800px|center]]
[[File:Dashboard Heating GwendolineTanWanXin.png|850px|center]]
 
  
<p>The following interactive techniques have been employed in this dashboard:</p>
+
<p>This calendar heat map shows the hourly concentration of PM 10 throughout a typical day across different months from Nov 2017 to Sep 2018. The rightmost column “Average” shows the overall average readings of hourly concentration. Filter on sampling location and air quality range could be utilized to narrow down the scope of comparison. <p>
{| class="wikitable"
+
<p> <b>Trends Analysis:</b>
|-
+
* For the most RECENT average hourly concentration of PM 10 calendar map from Nov 2017 to Sep 2018, a typical day in Sofia city does not experience drastic change in air quality on average over the months. However, the average hourly concentration across these 11 months are generally higher in the range of 30+ µg/m3 before 9am and after 6pm which are after working hours. The average hourly concentration is lower in the range of 20+ µg/m3 from 9am to 6pm. The air quality is considered good throughout the day on average.  
! style="font-weight: bold;background: #536a87;color:#fbfcfd;width: 20%;" | Interactive Technique
+
*However, the air quality readings are exceptionally high throughout the day (24 hours) in both November 2017 and January 2018. The air quality has reached unhealthy and very unhealthy range. This signifies that there might be external factors that influence the air quality to be extremely poor in those months.
! style="font-weight: bold;background: #536a87;color:#fbfcfd;width: 40%" | Rationale
+
 
! style="font-weight: bold;background: #536a87;color:#fbfcfd;" | Brief Implementation Steps
 
|-
 
| <center>'''Highlight temperature type, upon selection'''</center>
 
|| <center>To enable analysts to visualize the differences between the water temperature and its respective set-points, all 3 lines have been placed on the same chart. However, situation arises when there is a need to look at one single attribute only. Therefore, the use of highlighting will help analyst to perform its tasks better.</center>
 
|| The implementation steps are similar to the use of highlighting in the “Employee Movement” dashboard.
 
|-
 
| <center>'''Highlight across all the different charts'''</center>
 
|| <center>To allow analysts to see how one data point is correlated to another data point in another chart, given the same date/time</center>
 
||
 
# Navigate to Dashboard > Actions > select “highlight”.
 
# Configure the source and target sheets for the highlighting of charts in the dashboard.
 
 
|}
 
|}
 +
{| class="wikitable" style="background-color:#FFFFFF;" width="100%"
 +
|'''1.3 Further Analysis on Sampling Points
 +
'''
 +
[[File:Task1 3.png|800px|center]]
  
=== HVAC Control System & Chemical Levels (Natural Ventilation) ===
+
<p>TThis dashboard shows the distribution of sampling points in Sofia city as well as the readings concentration, distance to Kerb, altitude and distance to building for these sampling points.<p>
<p>The following shows the HVAC Control System (Natural Ventilation) dashboard: </p>
+
<p> <b>Trends Analysis:</b>
[[File:Dashboard Natural Ventilation GwendolineTanWanXin.png|850px|center]]
+
*All the sampling points follow the same cyclical pattern for the time-series analysis of average daily concentration of PM10 pollutants. The cyclical pattern repeated across the years from 2013 to 2018.
 +
*Across the years, the mean concentration of PM10 pollutants is the highest for Orlov Most followed by Nadezhda. Orlov Most is near to residential or industrial building while Nadezhda is near not as near as Orlov Most to buildings.
 +
*The variation in concentration is highest for Hipodruma (as seen by the high range and outliers). There might be association with other factors since the distance to kerb, altitude and distance to building is similar to most of other sampling points.  
  
<p>The following interactive techniques have been employed in this dashboard:</p>
 
{| class="wikitable"
 
|-
 
! style="font-weight: bold;background: #536a87;color:#fbfcfd;width: 20%;" | Interactive Technique
 
! style="font-weight: bold;background: #536a87;color:#fbfcfd;width: 40%" | Rationale
 
! style="font-weight: bold;background: #536a87;color:#fbfcfd;" | Brief Implementation Steps
 
|-
 
| <center>'''Highlight across all the different charts'''</center>
 
|| <center>To allow analysts to see how one data point is correlated to another data point in another chart, given the same date/time</center>
 
|| The implementation steps are similar to the use of highlighting in the “HVAC Control System & Chemical Levels (Heating)” dashboard.
 
|-
 
| <center>'''Link filter with “Mechanical Ventilation” dashboard'''</center>
 
|| <center>To allow analysts to conduct their analysis smoothly such that when one filters for level 2 data in mechanical ventilation dashboard, they will also use level 2 data when they go into the natural ventilation dashboard.<br/>
 
The rationale is that both ventilation elements are related to each other and analysts might be interested to analyse both elements one after the other.</center>
 
||
 
# When configuring the filters, select the worksheets that will be affected by the filters.
 
# Other than selecting the worksheets in the dashboard, show all the worksheets in the workbook and check those that applies.
 
The following shows an example of the configuration:
 
[[File:Apply Filter to other Worksheets GwendolineTanWanXin.png|450px|center]]
 
 
|}
 
|}
  
=== HVAC Control System & Chemical Levels (Mechanical Ventilation) ===
+
{| class="wikitable" style="background-color:#FFFFFF;" width="100%"
<p>The following shows the HVAC Control System (Mechanical Ventilation) dashboard: </p>
+
|'''1.4 What anomalies do you find in the official air quality dataset?
[[File:Dashboard Mechanical Ventilation GwendolineTanWanXin.png|850px|center]]
+
'''
 +
*Inconsistent measurement on air quality data: The consistency occurs in the way they measure the air quality as hourly, var and daily concentration values are all available.
 +
*Incomplete data for robust time-series analysis: There are a few months (Jan 2017 to Oct 2017) that do not have any air quality data. There is a need to find out why.
 +
*Inconsistent coverage for Sampling Points: For Mladost (60881), only has one year of air quality reading available while for Orlov Most (9484), only have readings from year 2013 to 2015.  
 +
|}
  
<p>The following interactive techniques have been employed in this dashboard:</p>
+
{| class="wikitable" style="background-color:#FFFFFF;" width="100%"
{| class="wikitable"
+
|'''1.5 How do these affect your analysis of potential problems to the environment?
|-
+
'''  
! style="font-weight: bold;background: #536a87;color:#fbfcfd;width: 20%;" | Interactive Technique
+
*Need to consider the appropriate measurement time method when comparing the air quality concentration for time-series analysis
! style="font-weight: bold;background: #536a87;color:#fbfcfd;width: 40%" | Rationale
+
*The data points for sampling points might be too small (only 4 points with consistent readings) and thus affect the accuracy of analysis.
! style="font-weight: bold;background: #536a87;color:#fbfcfd;" | Brief Implementation Steps
+
*The poor air quality might be related to (1) the surrounding regions and facilities around the sampling points, (2) cyclical patterns across the months in different years and (3) Consumption behaviours of residents living in Sofia city
|-
 
| <center>'''Highlight across all the different charts'''</center>
 
|| <center>To allow analysts to see how one data point is correlated to another data point in another chart, given the same date/time</center>
 
|| The implementation steps are similar to the use of highlighting in the “HVAC Control System & Chemical Levels (Heating)” dashboard.
 
 
|}
 
|}
  
=== HVAC Control System & Chemical Levels (Air-Conditioning) ===
 
<p>The following shows the HVAC Control System (Air-Conditioning) dashboard: </p>
 
[[File:Dashboard AirCon GwendolineTanWanXin.png|850px|center]]
 
  
<p>The following interactive techniques have been employed in this dashboard:</p>
+
=== Task 2: Spatio-temporal Analysis of Citizen Science Air Quality Measurements  ===
{| class="wikitable"
+
{| class="wikitable" style="background-color:#FFFFFF;" width="100%"
|-
+
|'''2.1 Sensor Coverage, Performance and Operation
! style="font-weight: bold;background: #536a87;color:#fbfcfd;width: 20%;" | Interactive Technique
+
'''
! style="font-weight: bold;background: #536a87;color:#fbfcfd;width: 40%" | Rationale
+
* 2.1.1 Are they well distributed over the entire city?
! style="font-weight: bold;background: #536a87;color:#fbfcfd;" | Brief Implementation Steps
+
[[File:SensorCoverageFinal.png|800px|center]]
|-
+
The heat map density illustrates the distribution of sensors across Sofia city. The higher the intensity, the more number of sensors found in the specific area. According to this heat map, it shows that the sensors mostly concentrate in the central area of Sofia city while the number of sensors decreases drastically beyond the central area. This could be due to the fact that the city area has a higher percentage of residential areas and transportation network. Thus, citizens place more sensors in the central area of Sofia city.
| <center>'''Highlight temperature type, upon selection'''</center>
+
 
|| <center>To enable analysts to visualize the differences between the thermostat temperature and its respective set-points, all 3 lines have been placed on the same chart. However, situation arises when there is a need to look at one single attribute only. Therefore, the use of highlighting will help analyst to perform its tasks better.</center>
+
* 2.1.2 Are they all working properly at all times?
|| The implementation steps are similar to the use of highlighting in the “Employee Movement” dashboard.  
+
[[File:Sensor's Operation Analysis.png|800px|center]]
|}
+
By looking at the number of records captures by the sensors from Sept 17 to Sept 18, the number of records increases steadily from September 17 but from May 18 onwards, the records dropped drastically from 200 to below 80 records. By looking at the number of records across months, it shows that the number of recordings is the highest in Jan, Feb, Mar, Oct, Nov, and Dec. While for other months, the number of recording decrease rapidly from around 60-159k to 17-38k.Considering that the lifetime of sensors last more than one year, the inconsistent number of records shows that the sensors are not working properly throughout the months.  
  
=== HVAC Control System & Chemical Levels (Chemicals) ===
+
* 2.1.3 Can you detect any unexpected behaviors of the sensors through analyzing the readings they capture?
<p>The following shows the HVAC Control System (Chemicals) dashboard: </p>
+
[[File:Time-series Analysis Comparison.png|800px|center]]
[[File:Dashboard Chemicals GwendolineTanWanXin.png|850px|center]]
+
Through drawing out the time-series data of the average pollutant concentration and standard deviation for both Official Air Quality and Citizen Science Air Quality measurements, it shows that the reading range of P1 is much higher than that of PM10 under official air quality records (by 100 µg/m3). Meanwhile, the standard deviation of P1 is also much higher than that of PM10 under official air quality record. Therefore, the readings from sensors might not be fully accurate as the quality may differ significantly depending on the citizens' purchasing power or needs. Thus, the inconsistency in sensors used resulted in higher variation in the readings which may casue the readings to be inaccurate and inconsistent over time.
  
<p>The following interactive techniques have been employed in this dashboard:</p>
 
{| class="wikitable"
 
|-
 
! style="font-weight: bold;background: #536a87;color:#fbfcfd;width: 20%;" | Interactive Technique
 
! style="font-weight: bold;background: #536a87;color:#fbfcfd;width: 40%" | Rationale
 
! style="font-weight: bold;background: #536a87;color:#fbfcfd;" | Brief Implementation Steps
 
|-
 
| <center>'''Show different views of the Hazium data in the dashboard'''</center>
 
|| <center>To provide different perspectives for an analyst to conduct his investigation. <br/>
 
For example, the line graph allows analysts to view the change in hazium concentration level overtime while the heatmap allow analysts to easily view the concentration levels of hazium based on colours representation.</center>
 
|| -
 
|-
 
| <center>'''Highlight across different charts'''</center>
 
|| <center>To show the correlation between the same set of data that has been represented in 2 different chart layouts</center>
 
|| The implementation steps are similar to the use of highlighting in the “HVAC Control System & Chemical Levels (Heating)” dashboard.
 
|-
 
| <center>'''Configure safe range of CO2 level'''</center>
 
|| <center>To provide a clear view for the analysts to see if the level of carbon dioxide concentration exceeds a safe range
 
<br/>
 
To provide flexibility for analysts to configure a safe range, as defined by the organization</center>
 
||
 
# Switch to the “Analytics” view and choose to add a “Reference Band” into the chart.
 
# For the value, choose to add a new custom parameter. The following shows a screenshot of the new parameter created: [[File:Defining Parameters GwendolineTanWanXin.png|250px|center]]
 
# To create a band, define two parameters that is the typical safe range level to a maximum safe range level. This will help to create a reference band. The following shows the configuration of a reference band: [[File:Defining Reference Band GwendolineTanWanXin.png|250px|center]]
 
# Add both of these parameters onto the dashboard and analysts will be able to configure the values based on the organization’s preference.
 
|-
 
| <center>'''Click on data point to explore ventilation elements that might lead to variations in CO2 level'''</center>
 
|| <center>To allow co-referencing of carbon dioxide levels with the ventilation elements and analyse possible reasons that led to the changes in carbon dioxide concentration</center>
 
|| To ensure that the filter applied in the dashboard is linked with the “HVAC Control System Ventilation Elements” dashboard, similar steps were taken as mentioned previously to link filters across different dashboards.
 
<br/>To allow clicking of the chart to another dashboard, actions were defined based on the following configuration: [[File:Action to Link Dashboard GwendolineTanWanXin.png|350px|center]]
 
 
|}
 
|}
 +
{| class="wikitable" style="background-color:#FFFFFF;" width="100%"
 +
|'''2.2 Air Pollution
 +
'''
 +
Which part of the city shows relatively higher readings than others? Are these differences time dependent?
 +
[[File:Oct Reading.png|400px|center]]
 +
[[File:Nov Reading.png|400px|center]]
 +
[[File:Jan Reading.png|400px|center]]
 +
[[File:March 18.png|400px|center]]
 +
[[File:June Reading.png|400px|center]]
  
=== Power Consumption in Kronos Office Building ===
+
The central region has higher readings than other regions. However, as we show the history of the change in concentration for both P1 and P2 pollutants based on the density plot. At the beginning, the difference is relatively small from September 17 to October 17. However, from November 17 onwards, the difference increases sharply considering the increase in density in central region versus the rest. The difference is the sharpest in January 18. Subsequently, the difference decrease and in June 18, there is negligible difference between central region and non-central region.
<p>The following shows the Power Consumption dashboard: </p>
+
[[File:Dashboard Power Consumption GwendolineTanWanXin.png|850px|center]]
+
|}
  
<p>The following interactive techniques have been employed in this dashboard:</p>
+
=== Task 3: Factors Affecting Air Quality in Sofia City  ===
{| class="wikitable"
+
{| class="wikitable" style="background-color:#FFFFFF;" width="100%"
|-
+
|'''Factor 1: Local Energy Sources '''
! style="font-weight: bold;background: #536a87;color:#fbfcfd;width: 20%;" | Interactive Technique
+
[[File:Task3 ThermalPlant and Meteorology Station.png|800px|center]]
! style="font-weight: bold;background: #536a87;color:#fbfcfd;width: 40%" | Rationale
+
According to EU’s air quality report, production of electricity by burning of coal in thermal power plants and other industrial processes are a major contributor to unhealthy air. Based on research, it was found out that the two power plants (namely Sofia Power Plant and Sofia Iztok Power Plant are found near the city centre in Sofia. This shows that the high degree of pollution in the city central area might be due to the presence of power plant. Especially in areas with high population as well as during winter periods, the increase in needs for burning of coal will have an impact on Sofia’s air quality as a whole. Therefore, local energy source places an important role in influencing the air quality.
! style="font-weight: bold;background: #536a87;color:#fbfcfd;" | Brief Implementation Steps
 
|-
 
| <center>'''Click on data point to explore possible causes of variations in HVAC power demand'''</center>
 
|| <center>To allow analysts who are interested to identify reasons as to why the HVAC control system is consuming so much power with just a single click on a data point</center>
 
|| The implementation steps are similar to the use of linking technique used in the “HVAC Control System & Chemical Levels (Chemicals)” dashboard.  
 
 
|}
 
|}
== Interesting & Anomalous Observations ==
+
{| class="wikitable" style="background-color:#FFFFFF;" width="100%"
Using the dashboard as a platform for investigation and analysis, the following aims to provide answers to the questions posed.
+
|'''Factor 2: Local Meteorology '''
 +
The Meteorology data is collected based on the coordinates at the meteorology station near Sofia’s airport. As the location is centralized in Sofia’s map, we will consider the readings as the average readings for Sofia’s city as a whole.
 +
As compared to Citizen Science Air Quality readings, official air quality readings are more consistent with similar locations of sampling points and consistent readings at specific time duration. PM10 pollutant concentration from Official Air Quality will be analysed together with local meteorology data instead.
 +
[[File:WindSpeedFinal.png|800px|center]]
 +
As seen in the graph above, in January, October and November, the readings for average concentration in each month is high whereas that for average wind speed is lower. Likewise for the months (February to December), the readings for average concentration in each month is low whereas that for average wind speed is high. Wind speed is one factor that can influence air quality as it will determine how fast would be pollutants disperse to other cities or other areas. Having a low wind speed would trap the pollutants in the area within Sofia city while a high wind speed allows the pollutants in Sofia city to spread quickly, thus lowering the pollutant concentration.  
  
=== Q1: Typical Patterns In Proximity Card Data & Typical Day Of GAStech Employees ===
+
[[File:TempFinal.png|800px|center]]
'''Typical Patterns in Proximity Card Data'''<br/>
+
As seen in the graph above, from Apr to Oct, the temperature is higher than average while for average concentration of the pollutants, the readings are lower than average. This shows that temperature and pollutant concentration have an inverse relationship clearly. This could be due to the fact that energy consumption is higher during winter period while that for summer period is much lower. During summer period, Sofia’s temperature has a highest value in the 20 degree’s range, indicating that air-conditioning is unlikely to be heavily utilized. However, during winter, the low temperature would signify a need for high energy consumption.
Based on the data captured by the proximity sensors, the following shows a typical pattern in the proximity card data.
 
# For the fixed proximity sensors, readings are collected for 24hours during the weekdays, except for 1am and 4am. From 12midnight to 6am, only level 1 fixed proximity sensor will collect data. This can possibly mean that there is no employee movement from 12midnight to 6am on level 2 and 3.
 
# For the mobile proximity sensor, Rosie (the mobile robot) travels the halls at 9am and 2pm daily. On 4, 5, 11 and 12 June, the robot does not collect any readings. One possible reason is that these are weekends (Sat/Sun) and therefore, Rosie will not be required to travel along the hallways.
 
  
'''Typical Day for GAStech Employees'''<br/>
+
|}
<p>The following lists a typical day for GAStech employees, in all departments:</p>
+
{| class="wikitable" style="background-color:#FFFFFF;" width="100%"
# Employees start arriving in the office at 7am. By 9am, majority of the employees would have already arrived in the building and settle in their own office space.
+
|'''Factor 3: Local Topography '''
# Lunch time is usually between 12nn to 2pm.  
+
[[File:TopographicalDataFinal.png|800px|center]]
# At around 2pm, many employees will be back in their office and that’s when we can see lots of activities going on in the different levels.
+
The topography plot shows the altitude distribution in Sofia city. The result shows that the high altitude areas are at the left bottom corner of the map while the low altitude areas covers the rest of the map. Altitude can play a role in transportation of pollutants within Sofia City. During summer period, wind will blow from low to high altitude area while the wind will blow from high to low altitude area during winter period. This explains why the pollution is more serious in quarter 1 and quarter 4. As the main pollutant producer area concentrate in the city central area, the pollutants will be able to spread to high altitude area during summer. However, during winter, the pollutants will be trapped within the low altitude area in city central. The wind flow from high altitude area might also carry other pollutants abroad to Sofia city.  
# Employees typically end work at around 5pm, though some may leave slightly earlier.
 
# After 6pm, majority of the employees in the office are from the Engineering, Facilities and IT department.
 
# After 7pm, employees working in level 3 would have left the office.
 
# At 12midnight, most of the employees would have left the building and only people in the facilities team will be present in level 1.
 
# People in the facilities department will be available in the building 24 hours and its always the same people patrolling around the area.
 
# Other than the facilities personnel, IT and engineering employees often stay in the office and will only leave at around 12midnight.
 
  
<p>The following lists some activities that is part of a typical day pattern for employees in a specific department:</p>
 
{| class="wikitable"
 
|-
 
! style="font-weight: bold;background: #536a87;color:#fbfcfd;width: 20%;" | Department
 
! style="font-weight: bold;background: #536a87;color:#fbfcfd;" | Activities
 
|-
 
| <center>'''Engineering'''</center>
 
||
 
# Between 2pm and 3pm, employees seem to appear around the meeting and training room.
 
# After 6pm, only Clemencia Whaley, Penney Bueno and Twana Quiroz will remain in the building. They will leave around 11.40pm.
 
|-
 
| <center>'''Executive'''</center>
 
||
 
# At around 12nn, CEO Sten Sanjorge Jr. will be moving around the building. At around 1pm, other executives start leaving for lunch.
 
|-
 
| <center>'''Facilities'''</center>
 
||
 
# Shift work is being done throughout the day as we can see some pattern for employees in the facilities department:
 
## Varro Awelon and Emile Earpa will always be in level 1 from 12mn to around 7.30/8am.
 
## Dylan Scozzese and Chi Staley will always be in level 1 from 8am onwards.
 
# At approximately 9am, some employees from the facilities department will proceed to the deli.
 
|-
 
| <center>'''Information Technology'''</center>
 
||
 
# At around 10.30am, all employees from the IT department will gather at the meeting room area on level 2.
 
# Out of all IT employees, Erminia Bello seems to be always staying in the office till late.
 
|-
 
| <center>'''Security'''</center>
 
||
 
# Employees do not typically move around the building. The data captured had shown that they always stay within their own office.
 
 
|}
 
|}
 +
{| class="wikitable" style="background-color:#FFFFFF;" width="100%"
 +
|'''Factor 4: Complex interactions between local topography and meteorological characteristics '''
 +
[[File:Correlation MatrixFinal.png|800px|center]]
 +
A correlation matrix plot can show the relationships within the meteorological measures. Within the meteorological characteristics, factors such as Dew Point Temperature and Temperature are positively correlated. Another example would be Wind Speed and Temperature. The multicollinearity that exists within the meteorological characteristics makes it difficult to pinpoint the exact variable that affects air quality clearly. In addition, the altitude measure from local topography data will also play a part in influencing the meteorological characteristics. However, the dataset on meteorological data is insufficient to study the interactions with local topography detailedly.
  
=== Q2: Interesting Patterns & Its Significance In Building Data ===
+
|}
The following identifies some interesting patterns and the significance of the pattern to GAStech.
+
{| class="wikitable" style="background-color:#FFFFFF;" width="100%"
{| class="wikitable"
+
|'''Factor 5: Transboundary Pollution '''
|-
+
[[File:Map-on-the-Bulgarian-coal-resources-energy-infrastructure-and-largest-power-plants.png|800px|center]]
! style="font-weight: bold;background: #536a87;color:#fbfcfd;width: 5%;" | S/N
+
Transboundary pollution might be one cause for the poor air quality considering the rest of the thermal power plants that are operating in other cities in Bulgaria. Additional datasets should be studied to analyze the wind flow directions within Bulgarian and with its immediate neighbours.  
! style="font-weight: bold;background: #536a87;color:#fbfcfd;width: 48%;" | Interesting Pattern
 
! style="font-weight: bold;background: #536a87;color:#fbfcfd;" | Significance
 
|-
 
| <center>'''1'''</center>
 
||The water heater temperature set-point is set at around 60C. This is approximately similar to the water heater outlet temperature. From this, we can deduce that water entering the heater is always heated to the set-point temperature, regardless of the temperature of water entering the heater.
 
<br/>During weekdays, the temperature of the water entering the heater slowly falls after 5am but will increase slowly after 12nn. Similar pattern is observed over the weekends, but at a smaller difference in the inlet temperature. However, the supply inlet mass flow rate remains the same.
 
<p style="font-size:11px">''[Source: HVAC Control System & Chemical Levels (Heating) Dashboard]''</p>
 
|| A consistent pattern of a change in inlet temperature is observed over the different days. However, the flow rate of water remains unchanged. This meant that there are other reasons as to why the inlet temperature always changes throughout the day.
 
<br/>The management needs to identify the reasons for the temperature difference to prevent wastage of energy in heating up the water, especially during the weekdays where temperature difference is vast.
 
|-
 
| <center>'''2'''</center>
 
||The lower the supply inlet temperature, the higher the rate at which the water heater burns natural gas. From this, we can infer that the burning of natural gas is used as a way to heat up the water so as to maintain the outlet temperature.
 
<br/>The building also has a hot water system pump to heat up the water. Despite of the difference in the inlet temperature throughout each day, the amount of power used by the hot water system pump remains constant throughout the 2 weeks.
 
<p style="font-size:11px">''[Source: HVAC Control System & Chemical Levels (Heating) Dashboard]''</p>
 
|| Based on the data collected, we can deduce that the building provides for 2 different methods to heat up the water – burning of natural gas and using the hot water system pump.
 
<br/>Through the pattern identified, we can see that the burning of natural gas is used more often than the hot water system pump to heat up the water.
 
<br/>Natural gas is a non-renewable resource. As such, burning of natural gas for water heating is a poor practice and does not adhere to energy efficiency standards, as claimed by the company. It is important for the management to look into the issue and make use of the hot water system pump for heating of water instead.
 
|-
 
| <center>'''3'''</center>
 
|| The supply fan seems to consume higher levels of power from 7am to 10pm. The outlet mass flow rate also seems to be consistent across the days, when the supply fan is operating correctly.
 
<br/>The timing in which the supply fan is consuming higher levels of power is similar to employees’ working hours, including the IT/Engineering department.
 
<p style="font-size:11px">''[Source: HVAC Control System & Chemical Levels (Mechanical Ventilation) Dashboard]''</p>
 
|| Through this pattern, we can infer that the building seems to be following energy efficiency standards by using more energy only when required, such as only during working hours.
 
|-
 
| <center>'''4'''</center>
 
|| On level 1 and 2, the bathroom exhaust fan seems to be consistently turned on throughout the days. However, on level 3, the bathroom exhaust fan power is only consuming power between 7am to 10pm.
 
<p style="font-size:11px">''[Source: HVAC Control System & Chemical Levels (Mechanical Ventilation) Dashboard]''</p>
 
|| Through this pattern, we can see that the use of the bathroom exhaust fan on level 1 and 2 does not seem to adhere to the energy efficiency standards. The practice of using energy only when required can be seen on level 3. However, as mentioned previously, employees in level 3 will be gone by 7pm. Despite so, the bathroom exhaust fan continues to generate power until late at night. This shows that the building is not following energy efficiency standards, as claimed.
 
|-
 
| <center>'''5'''</center>
 
|| The level of carbon dioxide concentration typically falls within a safe range of 350 to 1000 ppm for both level 2 and 3. However, the level of carbon dioxide concentration in level 1, especially zone 5 (Conference room) and zone 7 (loading area/server room) often falls outside of the safe range.
 
<p style="font-size:11px">''[Source: HVAC Control System & Chemical Levels (Chemicals) Dashboard]''</p>
 
|| The air ventilation system in the building is performing its expected functions in level 2 and 3, on a normal day. However, more needs to be done to improve the ventilation on level 1, to ensure that it remains a conducive working environment for all employees. This is especially important in the conference room, where employees conduct meetings. If the level of carbon dioxide falls in the unhealthy range, employees may experience discomfort and may not be able to perform up to standards.
 
|-
 
| <center>'''6'''</center>
 
|| In the afternoon, the outdoor air mass flow rate is generally higher. However, during this period of time, the outside air temperature is generally warmer. When the outside air temperature is warmer, the percentage of outside air delivered by the HVAC system is generally lower. This pattern is generally observed across all the 3 levels.
 
<p style="font-size:11px">''[Source: HVAC Control System & Chemical Levels (Natural Ventilation) Dashboard]''</p>
 
|| This is a good sign that the building is adhering to the energy efficiency standards. When the outside air is warmer, it takes more power to cool down the air so as to maintain a conducive temperature. As such, by restricting flow of warmer outdoor air, it helps the company to save on energy and power.
 
|-
 
| <center>'''7'''</center>
 
|| Level 1 has the highest outdoor air mass flow fraction. However, it has the lowest flow rate of air returning back to the HVAC system.
 
<br/> On level 3, the outdoor air mass flow fraction can go as low as 6%. However, it has high flow rate of air returning back to the HVAC system.
 
<p> With the difference in outdoor air mass flow rate, the temperature of the air returning back to the HVAC system is higher in level 1 as compared to level 3. However, more energy is used on level 3 as compared to level 1.</p>
 
<p style="font-size:11px">''[Source: HVAC Control System & Chemical Levels (Natural Ventilation) Dashboard]''</p>
 
|| From this pattern, we can clearly see the exchange of air between the outside air and the internal building. When the amount of outdoor air flowing into the floor is high, the rate of air returning back to the system is lower. This shows that the building is adopting good air-conditioning/ventilation system.
 
<br/>However, with higher flow rate of warmer outside air, there is a need for the cooling coil to use more energy in level 1 as compared to level 3. This will help to maintain a conducive temperature by cooling the warmer air. Despite so, the cooling coil in level 3 used more power than level 1 instead. This made the already cool level 3 cooler but level 1, which is hotter remains at a higher overall temperature.
 
This indicates possible issues with the HVAC control system which needs the attention of the operations staff.
 
|-
 
| <center>'''8'''</center>
 
|| The night cycle control status in level 3 is always off. In contrast to the night cycle control status in level 1, the status will be turned on after normal working hours of the company and during Sunday.
 
<p style="font-size:11px">''[Source: HVAC Control System & Chemical Levels (Air Conditioning) Dashboard]''</p>
 
|| There is a need for the operations staff to check as to why the night cycle manager is not working on level 3. Although there is no direct evidence to show the consequences of the night cycle control manager being off, it may potentially cause wastage of energy especially when the set-point temperature is not set appropriately.
 
|-
 
| <center>'''9'''</center>
 
|| Across all the three levels, the corridor spaces have constantly been using high levels of power.
 
<p style="font-size:11px">''[Source: Power Consumption in Kronos Office Building Dashboard]''</p>
 
|| This is bad for the company as the corridor spaces do not contribute greatly to improving employees’ productivity or efficiency. Spending 80% of their energy in these areas are considered as energy wastage and potential measures have to be employed to reduce energy use in these areas.
 
|-
 
| <center>'''10'''</center>
 
|| Level 3 Zone 9 (server room) has been constantly using large amounts of equipment power.
 
<p style="font-size:11px">''[Source: Power Consumption in Kronos Office Building Dashboard]''</p>
 
<br/>Temperature of the air in this zone is also constantly kept at a lower temperature than other zones.
 
<p style="font-size:11px">''[Source: HVAC Control System & Chemical Levels (Air Conditioning) Dashboard]''</p>
 
|| It is normal for the server to consume the largest amount of equipment power. At the same time, knowing that the server room have to be turned on 24 hours a day, measures have been taken to ensure a cooler temperature in the zone. This is notable for the company as it shows that they did deployed their resources properly.
 
 
|}
 
|}
  
=== Q3: Notable Anomalies/Unusual Events In Data ===
+
== Interactive Visualization ==
<p>The following lists some of the anomalies and unusual events identified in the data. Events that are possibly related to each other are grouped as one anomaly.</p>
 
<p>The priority level of issues will be ordered based on the following criteria:</p>
 
* (Highest Priority) An issue leading to potential health risks or death to employees
 
* An issue leading to a decrease in employees’ morale
 
* An issue undermining the claim of Kronos Office as an energy efficiency building
 
* (Lowest Priority) An issue causing the financial losses to the company
 
 
 
{| class="wikitable"
 
|-
 
! style="font-weight: bold;background: #536a87;color:#fbfcfd;width: 5%;" | Priority
 
! style="font-weight: bold;background: #536a87;color:#fbfcfd;width: 48%;" | Anomaly/Unusual Events
 
! style="font-weight: bold;background: #536a87;color:#fbfcfd;" | Possible Danger/Serious Issue
 
|-
 
| <center>'''1'''</center>
 
|| There is a high hazium gas concentration level across all the levels in the building on 11 June. This takes place in the afternoon from 1pm till the wee hours, the next day.
 
<p style="font-size:11px">''[Source: HVAC Control System & Chemical Levels (Chemicals) Dashboard]''</p>
 
|| Not much is known about Hazium gas, except for the fact that it is dangerous. With the high levels of Hazium gas concentration throughout the entire building, it may become unsafe for the employees to work in. In the worst case scenario, all employees may be subjected to health ailments or death in the building.
 
<br/>If the issue is not resolved, similar incidents may happen in future. This will put all employees in harm’s way and potentially, bringing health risks to the employees.
 
|-
 
| <center>'''2'''</center>
 
|| On 3 and 9 June, there is a rise in hazium gas concentration across all zones in the building. However, it was the highest in level 3, zone 1. This corresponds to the GAStech CEO’s office. The gas concentration level remains high throughout the entire day.
 
<p style="font-size:11px">''[Source: HVAC Control System & Chemical Levels (Chemicals) Dashboard]''</p>
 
|| With a rise in hazium gas concentration in the CEO’s office, it may pose serious harm to the CEO. Furthermore, it was also mentioned that there are disgruntled employees in the company. It is possible that these employees may attempt to harm the CEO.
 
<br/>If the issue is not resolved, similar incidents may happen in future. This will threaten the safety of the CEO and other employees working within the area.
 
|-
 
| <center>'''3'''</center>
 
|| In level 3 zone 1 (CEO’s office), there is a constant high level of temperature (ranging between 32 to 40C) from 1pm to 4am the next day. This happens every day from 2 June onwards.
 
<p style="font-size:11px">''[Source: HVAC Control System & Chemical Levels (Air Conditioning) Dashboard]''</p>
 
|| It is important to check the cause of the regular high level of temperatures in the CEO’s office. This might be due to acts of disgruntled employees in the company or for other unknown happenings in the CEO’s office.
 
<br/>If the issue is not resolved, the temperature in the zone may always be higher than normal and more energy might be consumed in an attempt to lower the temperature. In the worst case scenario, the situation may further worsen causing harm to the CEO, especially when the temperature is extremely high.
 
|-
 
| <center>'''4'''</center>
 
|| Between 5 and 6 June, there is a sudden increase in carbon dioxide concentration in level 1 zone 3 (main entrance), zone 5 (conference room) and zone 7 (loading area and server room). However, this pattern was not observed in level 2 and 3.
 
<p style="font-size:11px">''[Source: HVAC Control System & Chemical Levels (Chemicals) Dashboard]''</p>
 
<br/>To investigate the potential cause of the issue, I attempted to analyse the ventilation elements. It was discovered that the supply fan is generating lower amounts of electricity on 5 June as compared to normal days. The outlet mass flow rate was also low on level 1, during that day. From this pattern, it is possible that there is poor ventilation in the building that results in the spike in carbon dioxide concentration.
 
<p style="font-size:11px">''[Source: HVAC Control System & Chemical Levels (Natural & Mechanical Ventilation) Dashboard]''</p>
 
|| It is important to identify the potential cause of the sudden spike in carbon dioxide concentration, especially during the weekends. Furthermore, the areas in which the spike is observed are working zones in the building.
 
<br/>If the issue is not resolved, similar cases may happen in future and this may threaten the level of safety in the building.
 
|-
 
| <center>'''5'''</center>
 
|| On 7 and 8 June, the flow rate of outside air entering the HVAC system dropped. However, the percentage of outside air delivered by the HVAC system is actually higher than usual. This is abnormal because a lower flow rate of outside air should translate to a lower percentage of outside air delivered by the HVAC system. However, this was not the case observed on 7 and 8 June.
 
<br/>Furthermore, the total flow rate of air returning to the HVAC system is also extremely low. Adding on to that, the temperature of the air returning back to the HVAC system is also higher than usual. This is observed across all the 3 levels.
 
<p style="font-size:11px">''[Source: HVAC Control System & Chemical Levels (Natural Ventilation) Dashboard]''</p>
 
<br/>On the same days, the supply fan also consumes extremely low levels of fan power from 7am to 10pm. This has led to a sharp drop in the supply fan outlet mass flow rate and a sudden rise in temperature in level 3. After 10pm, the supply fan starts to generate abnormally high amount of fan power until 12am.
 
<p style="font-size:11px">''[Source: HVAC Control System & Chemical Levels (Mechanical Ventilation) Dashboard]''</p>
 
<br/>The reheat damper position in almost closed within the same period of time. The flow rate of air entering the zone is also lower than usual and the temperature of the air for the zones are also higher than normal. This pattern is also reflected through the natural ventilation elements, as mentioned previously.
 
<p style="font-size:11px">''[Source: HVAC Control System & Chemical Levels (Air Conditioning) Dashboard]''</p>
 
<br/>On the same days, there were sudden increases in the carbon dioxide concentration across all the 3 levels in the building. This could be brought about by the events occurring in the building.
 
<p style="font-size:11px">''[Source: HVAC Control System & Chemical Levels (Chemicals) Dashboard]''</p>
 
|| It is important to identify possible factors that had led to the poor circulation of air and an extremely low levels of fan power generated to all floors in the building. Possible explanations could include a breakdown in the HVAC system. Of course, further investigation has to be performed to confirm the cause.
 
<br/>If the issue is not resolved, the higher temperature and carbon dioxide concentration in each floor may cause employees to feel uncomfortable in the building which further decreases the employee morale. Also, level 3 houses a server room and high temperature may cause the servers to break down, resulting in high costs for the company.
 
|-
 
| <center>'''6'''</center>
 
|| On level 3, the HVAC system fan consumes higher amounts of power every weekends as compared to the weekdays. This is unusual as employees are not typically working during the weekends. Therefore, the need for high usage of supply fan in the level is not required.
 
<p style="font-size:11px">''[Source: HVAC Control System & Chemical Levels (Mechanical Ventilation) Dashboard]''</p>
 
|| There is a need to check the issue as to why the HVAC system fan is consuming so much fan power on level 3 during the weekends.
 
If the issue is not resolved, large amounts of energy will be wasted every weekend on level 3. This translates to large amounts of money wasted and this also means that the company is not adhering to energy efficiency standards.
 
|-
 
| <center>'''7'''</center>
 
|| There is an abnormally high supply fan outlet mass flow rate on 11 and 12 June for all 3 levels. This is possibly supplied by the HVAC system fan, which has consumed abnormally high amounts of fan power on both days. This pattern continues until the early morning of 13 June.
 
<p style="font-size:11px">''[Source: HVAC Control System & Chemical Levels (Mechanical Ventilation) Dashboard]''</p>
 
<br/>The above pattern is supported with an abnormally high amounts of HVAC power consumed on 11 and 12 June.
 
<p style="font-size:11px">''[Source: Power Consumption in Kronos Office Building Dashboard]''</p>
 
|| There is a need to check the issue as to why the HVAC system fan was triggered to generate abnormally high amounts of power.
 
<br/>If the issue is not resolved, the company could potentially lose money from the high wastage of power consumed especially when no one is expected to be in the building during the weekends. The company is also not adhering to energy efficiency standards by allowing the wastage to occur.
 
|-
 
| <center>'''8'''</center>
 
|| On 11 and 12 June, the temperature of the air and the cooling/heating set-point is the same for all 3 levels. The reheat damper position is almost open throughout the entire day and this has brought about high levels of air flow rate into the different zones. The temperature of the air entering the zone is also higher than normal. A similar pattern is observed across all 3 levels in the building. The amount of energy used by the cooling coil power is also shown to be higher during these 2 days.
 
<p style="font-size:11px">''[Source: HVAC Control System & Chemical Levels (Air Conditioning) Dashboard]''</p>
 
|| There is a need to check the cooling/heating set-point temperature in the different levels and the cause of the reheat damper position being fully opened throughout the days.
 
<br/>If the issue is not resolved, there will be wastage of energy in trying to cool the building.
 
|}
 
  
=== Q4: Observed Relationships Between Proximity Card Data & Building Data Elements ===
+
<p>The interactive visualization can be accessed here: https://public.tableau.com/profile/yunyingkaelyn#!/vizhome/Spatio-temporalAnalysisofAirQualityinSofiaCity/FinalStory </p>
There is an attempt to identify possible relationships between the proximity card data and the building data elements. However, after much searching, only one relationship can be spotted, as elaborated below:
 
  
<p>Orhan Strum is a suspicious employee who may have attempted to bring about an increase in hazium gas concentration in the CEO’s office. </p>
 
<p>The following evidence was observed:</p>
 
# An abnormally high hazium gas concentration is observed in the CEO’s office on 11 June afternoon, till the next day. Using this as a potential lead, it was discovered that Orhan Strum had appeared in level 3 on 11 June between 8.30am to 1pm. After 1pm, the hazium gas concentration in the CEO’s office gradually increases. Orhan Strum was also not observed to be present in the office, as the hazium gas concentration rises.
 
# On days when the hazium gas concentration was high (3 and 9 June), Orhan Strum will be in the building before it happens. Once the hazium gas concentration gradually increases, he will not be in the office anymore. This pattern was observed both on 2 June and 8 June, before the hazium gas concentration increases. On 9 June, when the hazium gas concentration went higher than normal, he was not detected to be present in the office building.
 
With the above evidence, I attempted to analyse whether a similar cause and effect can be identified. However, it was found out that on days when the hazium gas concentration is slightly higher in level 3, Orhan Strum will still be present in the office building. This reduces the probability that Orhan Strum is a potential culprit, since one will not harm itself with the dangerous gas. At the same time, the activities occur on a weekday and this increases the difficulty of finding a suspect. On the other hand, this could be a deliberate activity that Orhan Strum took to reduce his chances of being identified.
 
<p>Taking into account the evidence and evaluation of similar patterns, the level of confidence in the assessment of the relationship is still moderately high.</p>
 
  
 
== References ==
 
== References ==
In the completion of the analysis, the following references have been extremely useful:
+
<p>Understanding the current issues of poor air quality in Bulgaria https://www.eea.europa.eu/publications/air-quality-in-europe-2018/at_download/file </p>
* Tableau Training Video - Polygon Maps (http://www.tableau.com/learn/tutorials/on-demand/polygon-maps-8.2)
+
<p>Map on the Bulgarian Coal Resources and Energy Infrastructure https://www.researchgate.net/figure/Map-on-the-Bulgarian-coal-resources-energy-infrastructure-and-largest-power-plants_fig1_257941554 </p>
* Tableau Training Video - Background Images (http://www.tableau.com/learn/tutorials/on-demand/background-images-8)
+
<p>Coordinates of Sofia's Thermal Plants http://www.wikiwand.com/en/List_of_power_stations_in_Bulgaria#/Thermal <p>
* Creating Custom Polygons On A Background Image (https://tableauandbehold.com/2015/04/13/creating-custom-polygons-on-a-background-image/)
 
* Dynamically Switch Images Using Filters (https://community.tableau.com/message/182926#182926)
 
* Tableau Tip Week - Dashboard Navigation Buttons (http://www.thedataschool.co.uk/niccolo-cirone/tableau-tip-week-wednesday-creating-dashboard-navigator-buttons/)
 
  
 
== Comments ==
 
== Comments ==

Latest revision as of 06:24, 13 November 2018

Problem & Motivation

Air pollution is an important risk factor for health in Europe and worldwide. A recent review of the global burden of disease showed that it is one of the top ten risk factors for health globally. Worldwide an estimated 7 million people died prematurely because of pollution; in the European Union (EU) 400,000 people suffer a premature death. The Organisation for Economic Cooperation and Development (OECD) predicts that in 2050 outdoor air pollution will be the top cause of environmentally related deaths worldwide. In addition, air pollution has also been classified as the leading environmental cause of cancer. Air quality in Bulgaria is a big concern: measurements show that citizens all over the country breathe in air that is considered harmful to health. For example, concentrations of PM2.5 and PM10 are much higher than what the EU and the World Health Organization (WHO) have set to protect health. Bulgaria had the highest PM2.5 concentrations of all EU-28 member states in urban areas over a three-year average. For PM10, Bulgaria is also leading on the top polluted countries with 77 μg/m3on the daily mean concentration (EU limit value is 50 μg/m3). According to the WHO, 60 percent of the urban population in Bulgaria is exposed to dangerous (unhealthy) levels of particulate matter (PM10).

This assignment aims to study the following:

  • Task 1: Spatio-temporal Analysis of Official Air Quality
  • Task 2: Spatio-temporal Analysis of Citizen Science Air Quality Measurements
  • Task 3: Find out the relationship of the above analysis with other factors (Local energy sources, Local meteorology, Local topography, Complex interactions between local topography and meteorological characteristics and Transboundary pollution)

Dataset Analysis & Transformation Process

Task 1: Spatio-temporal Analysis of Official Air Quality

1. Combine all time-series data (e.g. BG_5_9572_2017_timeseries) into one single excel spreadsheet

  • Create a new "Station" Column (indicating station code i.e. 9421) and "Year" Column based on the year for each of the time-series data
  • Use excel to combine the rest of time-series files based on all the common columns (e.g. Countrycode, Namespace, AirQualityNetwork and etc.)
  • Based on the analysis of the existing combined time-series data from 2013 to 2018, there is a drastic difference in the level of aggregation across the years. As shown in the table below, Year 2016 has a combination of hourly air quality readings and daily air quality readings while in 2017, there is a mixture of hour and var readings as for certain days, readings are not measured at one-hour interval continuously. As such, the analysis on air quality readings will be based on "Day" averaging time from year 2013 to 2016 as it is impossible to lower the aggregation level of "Day" to "Hour" readings in 2016. On the other hand, the analysis on air quality readings will be based on "Hour" averaging time for 2017 and 2018. For the data in 2017,average readings will be taken if the readings for any specific days are done on "Var" basis. Hourly analysis is also more accurate for both 2017 and 2018 as there are missing data on specific months. (In 2017, only Nov and Dec data are available while in 2018, only Jan to Sep's data are available)
Task1DataTransformation a.png
  • Due to the standardization in averaging time, the values under "Concentration", "DatetimeBegin" and "DatetimeEnd" have been adjusted accordingly while other column values remain unchanged.
  • There are quite a number of duplicated readings in the dataset. These duplicated readings are removed during the transformation process to avoid unequal weightage.
  • Due to the high variation in raw data and small dataset, excel is used for the transformation process.

2. Merge metadata file with combined time-series data

  • According to the source of scrapped data (http://discomap.eea.europa.eu/map/fme/AirQualityExport.htm),"the join between time-series files and the metadata file should be made using the Countrycode (or Namespace) and SamplingPoint". Thus, metadata file and time-series data time are merged via Vlookup function in excel based on SamplingPoint (since Countrycode are all "BA" for both data file).
  • Upon further inspection, these two datasets have a number of common columns with same values. These columns include "Countrycode, Namespace, AirQualityNetwork, AirQualityStation, AirQualityStationEoICode, SamplingProcess, AirPollutantCode, AirPollutant" and thus repeated columns are removed.

Task 2: Spatio-temporal Analysis of Citizen Science Air Quality Measurements

  • Geohash represents station locations. However, Tableau is not able to interpret geohash as geographic data. Before Air Tube data is imported to Tableau for analysis, geohash needs to be decoded into geographical coordinates. Due to the sheer size in Air Tube datasets(data_bg_2017.xlsx and data_bg_2018.xlsx), R packages ("devtools", "tidyverse" and "geohas") as indicated by Prof Kam will be used to transform the geohashed raw data to a new csv file containing the coordinates of the locations.
  • Next, 2017 and 2018 datasets that have been transformed will be combined based on the R programming code below.
Task2Transformation.png
  • The combined datasets comprises of readings across the whole Bulgaria area which is beyond the scope of our concern (i.e. Sofia). To maintain the consistency in comparison, Inclusion filter function is used to filter in only longitudes and latitudes that are in Sofia city. Due to the difficulty in comparing with the real geographical area of Sofia, an existing map on Sofia City (from Open Street Map) was used to filter through the Tableau filter function. The folowing shows map of Sofia and the filtering function.
Sofia Map.png
Tableau Filter.png

Task 3: Find out the relationship of the air quality analysis with other factors

  • The Meteorological data scrapped is currently in crosstab format which is not suitable for analysis on Tableau. Hence, the dataset has to be transformed to columnar format. Hence, pivoting will be done to transform the data.
  • As the data includes average, minimum and maximum readings for different types of measurements and are presented on separate columns. The format is not suitable to be processed on Tableau if there is a need to introduce filters in the dashboard. Hence the dataset is transformed to as shown below.
Task3Transformation.png


Analysis

Task 1: Spatio-temporal Analysis of Official Air Quality

1.1 Characterize Recent and Past Situation of Air Quality in Sofia City


Task 1 Characterize PastandRecentSituation2013FinalFinal.png
Task 1 Characterize PastandRecentSituation2014FinalFinal.png
Task 1 Characterize PastandRecentSituation2015FinalFinal.png
Task 1 Characterize PastandRecentSituation2016FinalFinal.png

The figures above shows the PAST average daily concentration heat map for year 2013 to 2016 as daily readings are not computed for year 2017 and year 2018. The calendar heat map visualization allows users to appreciate the trend in average daily concentration from Jan to Dec at one glance. The calendar map also includes filter by Sampling location, air quality range as well as Year. With reference to EU Air Quality classification based on research, the air quality range includes 0-20:Good, 20-40:Moderate, 40-50:Normal, 50-75:Unhealthy, 75 and above: Very Unhealthy.The heat map’s colour intensity is specified within a range such that dark red represent unhealthy and very unhealthy air quality range and vice versa.

Trends Analysis:

  • From year 2013 to year 2016, the air quality exceeds EU limit (50 µg/m3) at the end of each year and at the beginning of each year. Specifically, the air quality reached unhealthy and very unhealthy range for majority of the days In January, February, November and December across all four years. In the middle of year (i.e. Quarter 2 and Quarter3), the air quality is much better and are generally below the EU limit value of 50 µg/m3.
  • There isn’t any clear trend in the change in daily concentration of PM10 across the days in a month throughout 2013 to 2016.
1.2 A Typical Day in Sofia City

Task 1 Characterize PastandRecentSituation20172018Final.png

This calendar heat map shows the hourly concentration of PM 10 throughout a typical day across different months from Nov 2017 to Sep 2018. The rightmost column “Average” shows the overall average readings of hourly concentration. Filter on sampling location and air quality range could be utilized to narrow down the scope of comparison.

Trends Analysis:

  • For the most RECENT average hourly concentration of PM 10 calendar map from Nov 2017 to Sep 2018, a typical day in Sofia city does not experience drastic change in air quality on average over the months. However, the average hourly concentration across these 11 months are generally higher in the range of 30+ µg/m3 before 9am and after 6pm which are after working hours. The average hourly concentration is lower in the range of 20+ µg/m3 from 9am to 6pm. The air quality is considered good throughout the day on average.
  • However, the air quality readings are exceptionally high throughout the day (24 hours) in both November 2017 and January 2018. The air quality has reached unhealthy and very unhealthy range. This signifies that there might be external factors that influence the air quality to be extremely poor in those months.
1.3 Further Analysis on Sampling Points

Task1 3.png

TThis dashboard shows the distribution of sampling points in Sofia city as well as the readings concentration, distance to Kerb, altitude and distance to building for these sampling points.

Trends Analysis:

  • All the sampling points follow the same cyclical pattern for the time-series analysis of average daily concentration of PM10 pollutants. The cyclical pattern repeated across the years from 2013 to 2018.
  • Across the years, the mean concentration of PM10 pollutants is the highest for Orlov Most followed by Nadezhda. Orlov Most is near to residential or industrial building while Nadezhda is near not as near as Orlov Most to buildings.
  • The variation in concentration is highest for Hipodruma (as seen by the high range and outliers). There might be association with other factors since the distance to kerb, altitude and distance to building is similar to most of other sampling points.
1.4 What anomalies do you find in the official air quality dataset?

  • Inconsistent measurement on air quality data: The consistency occurs in the way they measure the air quality as hourly, var and daily concentration values are all available.
  • Incomplete data for robust time-series analysis: There are a few months (Jan 2017 to Oct 2017) that do not have any air quality data. There is a need to find out why.
  • Inconsistent coverage for Sampling Points: For Mladost (60881), only has one year of air quality reading available while for Orlov Most (9484), only have readings from year 2013 to 2015.
1.5 How do these affect your analysis of potential problems to the environment?

  • Need to consider the appropriate measurement time method when comparing the air quality concentration for time-series analysis
  • The data points for sampling points might be too small (only 4 points with consistent readings) and thus affect the accuracy of analysis.
  • The poor air quality might be related to (1) the surrounding regions and facilities around the sampling points, (2) cyclical patterns across the months in different years and (3) Consumption behaviours of residents living in Sofia city


Task 2: Spatio-temporal Analysis of Citizen Science Air Quality Measurements

2.1 Sensor Coverage, Performance and Operation

  • 2.1.1 Are they well distributed over the entire city?
SensorCoverageFinal.png

The heat map density illustrates the distribution of sensors across Sofia city. The higher the intensity, the more number of sensors found in the specific area. According to this heat map, it shows that the sensors mostly concentrate in the central area of Sofia city while the number of sensors decreases drastically beyond the central area. This could be due to the fact that the city area has a higher percentage of residential areas and transportation network. Thus, citizens place more sensors in the central area of Sofia city.

  • 2.1.2 Are they all working properly at all times?
Sensor's Operation Analysis.png

By looking at the number of records captures by the sensors from Sept 17 to Sept 18, the number of records increases steadily from September 17 but from May 18 onwards, the records dropped drastically from 200 to below 80 records. By looking at the number of records across months, it shows that the number of recordings is the highest in Jan, Feb, Mar, Oct, Nov, and Dec. While for other months, the number of recording decrease rapidly from around 60-159k to 17-38k.Considering that the lifetime of sensors last more than one year, the inconsistent number of records shows that the sensors are not working properly throughout the months.

  • 2.1.3 Can you detect any unexpected behaviors of the sensors through analyzing the readings they capture?
Time-series Analysis Comparison.png

Through drawing out the time-series data of the average pollutant concentration and standard deviation for both Official Air Quality and Citizen Science Air Quality measurements, it shows that the reading range of P1 is much higher than that of PM10 under official air quality records (by 100 µg/m3). Meanwhile, the standard deviation of P1 is also much higher than that of PM10 under official air quality record. Therefore, the readings from sensors might not be fully accurate as the quality may differ significantly depending on the citizens' purchasing power or needs. Thus, the inconsistency in sensors used resulted in higher variation in the readings which may casue the readings to be inaccurate and inconsistent over time.

2.2 Air Pollution

Which part of the city shows relatively higher readings than others? Are these differences time dependent?

Oct Reading.png
Nov Reading.png
Jan Reading.png
March 18.png
June Reading.png

The central region has higher readings than other regions. However, as we show the history of the change in concentration for both P1 and P2 pollutants based on the density plot. At the beginning, the difference is relatively small from September 17 to October 17. However, from November 17 onwards, the difference increases sharply considering the increase in density in central region versus the rest. The difference is the sharpest in January 18. Subsequently, the difference decrease and in June 18, there is negligible difference between central region and non-central region.

Task 3: Factors Affecting Air Quality in Sofia City

Factor 1: Local Energy Sources
Task3 ThermalPlant and Meteorology Station.png

According to EU’s air quality report, production of electricity by burning of coal in thermal power plants and other industrial processes are a major contributor to unhealthy air. Based on research, it was found out that the two power plants (namely Sofia Power Plant and Sofia Iztok Power Plant are found near the city centre in Sofia. This shows that the high degree of pollution in the city central area might be due to the presence of power plant. Especially in areas with high population as well as during winter periods, the increase in needs for burning of coal will have an impact on Sofia’s air quality as a whole. Therefore, local energy source places an important role in influencing the air quality.

Factor 2: Local Meteorology

The Meteorology data is collected based on the coordinates at the meteorology station near Sofia’s airport. As the location is centralized in Sofia’s map, we will consider the readings as the average readings for Sofia’s city as a whole. As compared to Citizen Science Air Quality readings, official air quality readings are more consistent with similar locations of sampling points and consistent readings at specific time duration. PM10 pollutant concentration from Official Air Quality will be analysed together with local meteorology data instead.

WindSpeedFinal.png

As seen in the graph above, in January, October and November, the readings for average concentration in each month is high whereas that for average wind speed is lower. Likewise for the months (February to December), the readings for average concentration in each month is low whereas that for average wind speed is high. Wind speed is one factor that can influence air quality as it will determine how fast would be pollutants disperse to other cities or other areas. Having a low wind speed would trap the pollutants in the area within Sofia city while a high wind speed allows the pollutants in Sofia city to spread quickly, thus lowering the pollutant concentration.

TempFinal.png

As seen in the graph above, from Apr to Oct, the temperature is higher than average while for average concentration of the pollutants, the readings are lower than average. This shows that temperature and pollutant concentration have an inverse relationship clearly. This could be due to the fact that energy consumption is higher during winter period while that for summer period is much lower. During summer period, Sofia’s temperature has a highest value in the 20 degree’s range, indicating that air-conditioning is unlikely to be heavily utilized. However, during winter, the low temperature would signify a need for high energy consumption.

Factor 3: Local Topography
TopographicalDataFinal.png

The topography plot shows the altitude distribution in Sofia city. The result shows that the high altitude areas are at the left bottom corner of the map while the low altitude areas covers the rest of the map. Altitude can play a role in transportation of pollutants within Sofia City. During summer period, wind will blow from low to high altitude area while the wind will blow from high to low altitude area during winter period. This explains why the pollution is more serious in quarter 1 and quarter 4. As the main pollutant producer area concentrate in the city central area, the pollutants will be able to spread to high altitude area during summer. However, during winter, the pollutants will be trapped within the low altitude area in city central. The wind flow from high altitude area might also carry other pollutants abroad to Sofia city.

Factor 4: Complex interactions between local topography and meteorological characteristics
Correlation MatrixFinal.png

A correlation matrix plot can show the relationships within the meteorological measures. Within the meteorological characteristics, factors such as Dew Point Temperature and Temperature are positively correlated. Another example would be Wind Speed and Temperature. The multicollinearity that exists within the meteorological characteristics makes it difficult to pinpoint the exact variable that affects air quality clearly. In addition, the altitude measure from local topography data will also play a part in influencing the meteorological characteristics. However, the dataset on meteorological data is insufficient to study the interactions with local topography detailedly.

Factor 5: Transboundary Pollution
Map-on-the-Bulgarian-coal-resources-energy-infrastructure-and-largest-power-plants.png

Transboundary pollution might be one cause for the poor air quality considering the rest of the thermal power plants that are operating in other cities in Bulgaria. Additional datasets should be studied to analyze the wind flow directions within Bulgarian and with its immediate neighbours.

Interactive Visualization

The interactive visualization can be accessed here: https://public.tableau.com/profile/yunyingkaelyn#!/vizhome/Spatio-temporalAnalysisofAirQualityinSofiaCity/FinalStory


References

Understanding the current issues of poor air quality in Bulgaria https://www.eea.europa.eu/publications/air-quality-in-europe-2018/at_download/file

Map on the Bulgarian Coal Resources and Energy Infrastructure https://www.researchgate.net/figure/Map-on-the-Bulgarian-coal-resources-energy-infrastructure-and-largest-power-plants_fig1_257941554

Coordinates of Sofia's Thermal Plants http://www.wikiwand.com/en/List_of_power_stations_in_Bulgaria#/Thermal

Comments