IS428 2017-18 T1 Assign Sarah Jane Tong
Contents
Problem and Motivation
Background
Rose-Crested Blue Pipits are dwindling in numbers, and the cause is likely to be from one or more of the four types of chemicals emitted from the four factories. The factories are located south-west of the nature preserve where the birds reside, with the 9 sensors placed in the vicinity surrounding the 4 factories.
- The 4 factories are
- Roadrunner Fitness Electronics
- Kasios Office Furniture
- Radiance ColourTek
- Indigo Sol Boards
- A brief description of the 4 chemicals are also as follows
1. Appluimonia – An airborne odor which can possibly cause serious injury, long-term health effects, or death to humans or animals.
2. Chlorodinine – Corrosive chemical which can attack and chemically destroy exposed body tissues. It has been used as a disinfectant and sterilizing agent as well as other uses. It is harmful if inhaled or swallowed.
3. Methylosmolene – Has potent effects on vertebrates. Liquid forms of Methylosmolene are required by law to be chemically neutralized before disposal. Strictly regulated.
4. AGOC-3A – A solvent which is not extremely harmful to human and environmental health.
General Task Overview
To determine the cause of the dwindling numbers of the birds. Isolating the chemicals from the dataset and identifying if there are correlations between the emissions and the data.
Subtasks
- Characterize the sensors’ performance and operation. Are they all working properly at all times? Can you detect any unexpected behaviors of the sensors through analyzing the readings they capture?
- Which chemicals are being detected by the sensor group? What patterns of chemical releases do you see, as being reported in the data?
- Which factories are responsible for which chemical releases? Carefully describe how you determined this using all the data you have available. For the factories you identified, describe any observed patterns of operation revealed in the data.
Given the large number of variables and measures, there is a need to build an interactive data visualization tool to help to analyse the correlations between various variables. The visualisations will draw links between each factory, chemical and monitor.
Dataset Analysis & Transformation Process
There were 3 excel datasets provided in the assignment with different formats and attributes. And 1 document regarding general information about the environment in the problem question.
This section will elaborate on the dataset analysis and transformation process for each dataset in order to prepare the data for import and analysis on an interactive visualization.
Dataset Name | Brief Description |
---|---|
Sensor Location.xlsx | 9 sensors and their locations in X-Y coordinates in 2 columns. |
Sensor Data.xlsx | Approximately 80,000 rows each with a unique set of chemical, monitor, date and time and readings.
Date and time are taken in 3 months, April, August and December. One reading is taken 24 times a day, at the start of each hour for the entire month. |
Meteorological Data.xlsx | Approximately 2,000 rows with a unique time and date stamp, and respective wind direction and wind speed. Elevation is also given as 340 m. |
MC2 Data Descriptions.doc | Provided the X-Y coordinates of the factories and the sensors. Also provides information on the map, which is a 200 x 200 grid. |
Data Cleaning
Meteorological Data
- Problem
- Column E has data on elevation which has no context in the questoin.
- Solution
- Elevation in Col E was removed
Data Preparation for Merging/Joining Datasets
Several columns in the various datasets are likely to be merged. However, not all of them can be used for analysis as they are.
Datasets involved | Data Transformation | |
---|---|---|
Sensor Data.xlsx
Meteorological Data.xlsx |
Datetime stamp for Meteorological data.
Interpolation such that both data reflect hourly data. |
|
Meteorological Data.xlsx
MC2 Data Descriptions.doc |
The units of the wind speed | There are 3 units which can be used for wind speed:
Given that meteorological data was planned to be plotted on a x-y map later on, it makes most sense to convert from m/s to px/hr. |
Meteorological Data | To insert factory location into dataset.
The planned visualisation was such that each of the 4 factories and the wind would be plotted on the same map as a wind stick plot for each timestamp. Using the factory as the source. |
First, 4 separate sheets were created, one for each factory. Using trigonometrical transformations, the wind was plotted with an XY coordinate.
For each time stamp there would be 2 sets of XY coordinates; the factory’s coordinates and the hypotenuse coordinate created by the angle. <image> |
Question 1
Visualisation
The interactive visualisation can be seen here Q1 Visualisation