IS428 2017-18 T1 Assign Tam Kei Wen

From Visual Analytics for Business Intelligence
Revision as of 23:02, 8 October 2017 by Keiwen.tam.2015 (talk | contribs) (Draft 10)

Assignment Details

IS428 Main Page: (https://wiki.smu.edu.sg/1718t1is428g1/Main_Page)

Assignment Overview: (https://wiki.smu.edu.sg/1718t1is428g1/Assignments)

Assignment Dropbox: (https://wiki.smu.edu.sg/1718t1is428g1/Assignment_Dropbox)

Problem & Motivation

Mistford is a mid-size city located to the southwest of a large nature preserve. The city has a small industrial area with four light-manufacturing endeavors. Mitch Vogel is a post-doc student studying ornithology at Mistford College and has been discovering signs that the number of nesting pairs of the Rose-Crested Blue Pipit, a popular local bird due to its attractive plumage and pleasant songs, is decreasing! The decrease is sufficiently significant that the Pangera Ornithology Conservation Society is sponsoring Mitch to undertake additional studies to identify the possible reasons.

Mitch Vogel was immediately suspicious of the noxious gases pouring out of the smokestacks of the four manufacturing factories south of the nature preserve. He was almost certain that all of these companies were contributing to the downfall of the poor Rose-Crested Blue Pipit. But when he talked to company representatives and workers, they all seemed to be nice people and actually pretty respectful of the environment.

In fact, Mitch was surprised to learn that the factories had recently taken steps to make their processes more environmentally friendly, even though it raised their cost of production.

Mitch is gaining access to several datasets that may help him in his work, and he has asked you (and your colleagues) as experts in visual analytics to help him analyse these datasets. These datasets include air sampler data, meteorological data, and location maps provided by the state government, which has been monitoring the gaseous effluents from the factories through a set of sensors distributed around the factories.

Task

General Task

The dataset provided several months of meteorological data (wind speed and direction) and chemical data emitted by four industrial factories and captured by nine sensing stations. To explore the spatio-temporal chemical readings and wind data, specifically which factories emitted what chemicals and how the nine sensors in the area were performing, the team developed a web-based analytics tool with interactive visualizations and path line analysis to reveal sensor errors and chemical reading spikes, as well as pinpoint possible sources of chemical reading spikes. The goal was to help the local ornithologist determine whether or not the factories were compliant with environmental regulations.

Specific Task

Specifically, we are to provide visualisations that address these issues:

• Sensors: To find out whether all sensors are working properly at all times, by detecting unexpected behaviours in the readings they capture. [Characterize the sensors’ performance and operation. Are they all working properly at all times? Can you detect any unexpected behaviours of the sensors through analysing the readings they capture? Limit your response to no more than 9 images and 1000 words.]

• Chemicals: To find out which chemicals are being detected by the sensor group, by identifying patterns of chemical releases. [Now turn your attention to the chemicals themselves. Which chemicals are being detected by the sensor group? What patterns of chemical releases do you see, as being reported in the data? Limit your response to no more than 6 images and 500 words.]

• Factories: To find out which factories are responsible for which chemical releases, so as to pinpoint the factories responsible for the decline of the Rose-Crested Blue Pipit. [Which factories are responsible for which chemical releases? Carefully describe how you determined this using all the data you have available. For the factories you identified, describe any observed patterns of operation revealed in the data. Limit your response to no more than 8 images and 1000 words.]

Datasets Provided (Sensor Data, Sensor Location, Meteorological Data)

Additional Information Provided


  1. Sensor Data (Sensor Data.xlsx)
    • Contains three months of readings in the following format:
    • Chemical: Which one of the four chemicals detected by the sensors
    • Monitor: Which one of the nine sensors picking up the reading
    • Reading: The amount detected by the air sensor, in parts per million
    • Date Time: The date and time of day of the reading, local time with no change for Daylight Savings

  2. Sensor Location (Sensor Location.xlsx)
    • The factories and sensor locations are provided in terms of x,y coordinates on a 200x200 grid, with (0,0) at the lower left hand corner (southwest). The sensors map shows the locations of the sensors and factories by number for the sensors and by name for the factories.

  3. Meteorological Data (Meteorological Data.xlsx)
    • Contains three months of readings in the following format:
    • Date: The date and time of the readings, local time with no change for Daylight Savings
    • Wind Direction: The compass directions where wind is originating from, using a north-referenced azimuth bearing where 360/000 is true north
    • Wind Speed: The speed of the wind in meters per second
    Each of these readings is taken at the date and time provided.
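Because the wind direction is recorded as the bearing the wind comes from, it has to be reversed before it can be used to trace where emissions travel on the grid. The following is a minimal sketch of that conversion, assuming x grows to the east and y to the north (consistent with (0,0) being the southwest corner). The class and method names are illustrative and not part of the provided data or tooling, and the result is in metres per second because the source does not state the scale of a grid unit.

```java
final class WindVector {
    /**
     * Converts a wind reading into the movement of an air parcel per second.
     * fromBearingDeg is the direction the wind comes FROM, in degrees
     * clockwise from true north (360/000 is north). Returns {east, north}.
     */
    static double[] downwind(double fromBearingDeg, double speed) {
        // The air moves toward the opposite bearing.
        double towardDeg = (fromBearingDeg + 180.0) % 360.0;
        double rad = Math.toRadians(towardDeg);
        // Bearings are measured from north, so sin gives the east component
        // and cos gives the north component.
        return new double[] { speed * Math.sin(rad), speed * Math.cos(rad) };
    }
}
```

For example, a wind from bearing 360 at 2 m/s moves air due south, so the north component is negative.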

Data Cleaning & Preprocessing

Problems with the Data

1. Missing Values

Missing Values.jpg

Figure 1 - Sensor Activity

Analysis of the sensor data in the form of a heat map reveals missing data, indicated by blank squares. This is rather unusual, and suggests that certain sensors were faulty and therefore unable to pick up readings. To study the extent of the missing values further, the missing rows were inserted into the data using Java code (Figure 2).

Code.PNG

Figure 2 - Java Code for Inserting Missing Rows and Flagging Duplicates

The code shown in Figure 2 inserts missing rows into the data, and also flags duplicate readings in each sensor.
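The code in Figure 2 is only available as an image, so the following is not a transcription of it. It is a minimal sketch of the same idea, assuming one reading is expected per monitor, chemical and hour. The class name, the key format and the method names are all hypothetical.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ReadingGaps {
    /** Builds the key that identifies one expected reading. */
    static String key(int monitor, String chemical, LocalDateTime time) {
        return monitor + "|" + chemical + "|" + time;
    }

    /** Counts how often each (monitor, chemical, hour) key was observed. */
    static Map<String, Integer> countKeys(List<String> observedKeys) {
        Map<String, Integer> counts = new HashMap<>();
        for (String k : observedKeys) {
            counts.merge(k, 1, Integer::sum);
        }
        return counts;
    }

    /** Lists every expected hourly key in [start, end] that was never observed. */
    static List<String> missing(Map<String, Integer> counts, int[] monitors,
                                String[] chemicals, LocalDateTime start,
                                LocalDateTime end) {
        List<String> out = new ArrayList<>();
        for (LocalDateTime t = start; !t.isAfter(end); t = t.plusHours(1)) {
            for (int m : monitors) {
                for (String c : chemicals) {
                    String k = key(m, c, t);
                    if (!counts.containsKey(k)) {
                        out.add(k);
                    }
                }
            }
        }
        return out;
    }

    /** Lists every key that was observed more than once, in sorted order. */
    static List<String> duplicates(Map<String, Integer> counts) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > 1) {
                out.add(e.getKey());
            }
        }
        Collections.sort(out);
        return out;
    }
}
```

The keys returned by `missing` correspond to the rows that would be inserted as blanks, and the keys returned by `duplicates` correspond to the rows that would be flagged.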

2. Duplicate Values

Duplicate Values.jpg

Figure 3 - Duplicate Values

Upon further study of the sensor data, more missing values were picked up. These values are highlighted in red in Figure 3. At certain points in time (midnight on 2nd April, 6th April, 2nd August, 4th August, 7th August, 2nd December and 7th December), no readings were collected for any sensor. Readings were restored in the following hour, at 1:00. This could be due to intermittent disruptions that affected the proper functioning of the sensors. Weather is one of the likelier causes, especially around midnight when it becomes much colder.

While some sensors failed to record a reading, other sensors picked up multiple readings of the same chemical. These values are highlighted in orange in Figure 3. Interestingly, the duplicate readings and the missing readings occur at exactly the same points in time. There are a number of possible explanations for this phenomenon:

1) There is a malicious attempt to hide certain chemical readings by disguising one chemical as another.

2) The sensors misread Methylosmolene as AGOC-3A due to the presence of similar compounds in the chemicals (since Methylosmolene and AGOC-3A are close substitutes of one another).

3) Mixtures of different chemical compounds mask the presence of certain chemicals.

Based on the current data, there is insufficient evidence to prove point 1. Points 2 and 3 appear more convincing, as faulty sensors could be misreading chemical concentrations in the air. However, it is unsafe to assign one of the duplicate values to the missing chemical, as we do not know which of the duplicate values to choose. If the wrong values were assigned based on unsupported assumptions, the subsequent visualizations would suffer. Thus, no chemicals were reassigned in Sensor Data.xlsx. It is still important to keep in mind that there are missing and duplicate data when creating our visualizations.
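Since the duplicate and missing readings are left in place, it can still help to list the slots where they coincide so they can be kept in view during the analysis. The sketch below flags such slots without reassigning anything. It assumes the per-chemical reading counts for a single monitor and hour have already been computed. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class SlotAudit {
    /**
     * countsByChemical maps each chemical to its number of readings in ONE
     * (monitor, hour) slot. Returns one description for every pairing of a
     * missing chemical with a duplicated chemical in that slot.
     */
    static List<String> suspectPairs(Map<String, Integer> countsByChemical) {
        List<String> missing = new ArrayList<>();
        List<String> duplicated = new ArrayList<>();
        // TreeMap gives a stable, alphabetical iteration order.
        for (Map.Entry<String, Integer> e
                : new TreeMap<>(countsByChemical).entrySet()) {
            if (e.getValue() == 0) {
                missing.add(e.getKey());
            } else if (e.getValue() > 1) {
                duplicated.add(e.getKey());
            }
        }
        List<String> out = new ArrayList<>();
        for (String m : missing) {
            for (String d : duplicated) {
                out.add(m + " missing, " + d + " duplicated");
            }
        }
        return out;
    }
}
```

A slot with two AGOC-3A readings and no Methylosmolene reading would be reported as one suspect pair, and the original values stay untouched.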

3. Redundant Elevation Data

Elevation.PNG


To prevent Tableau from misinterpreting the data, the redundant elevation column has been removed.

Results

1. Characterize the sensors’ performance and operation. Are they all working properly at all times? Can you detect any unexpected behaviors of the sensors through analyzing the readings they capture?

Tsk1.jpg

Figure 5 - Chemical Readings of Different Sensors Over Time

Sensor | Remarks
Sensor 1 | Consistent, low readings across the months, with few extreme outliers.
Sensor 2 | Consistent, mid-range readings across the months.
Sensor 3 | Readings are rather consistent across the months, but with a wide standard deviation.
Sensor 4 | Chemical concentration readings increase over time in a step-like fashion.
Sensor 5 | Readings gradually increase over time.
Sensor 6 | Consistent, low readings across the months, but with more extreme outliers than sensors 1 and 2.
Sensor 7 | Readings are similar to those of sensor 3, but with slightly less variation.
Sensor 8 | Consistent, mid-range readings across the months.
Sensor 9 | Consistent readings in April and August, with spikes in chemical readings only in December.

Figure 5 allows us to single out sensor 4 as faulty because of the continuous increase in its chemical concentration readings. Although this could be caused by a genuine change in the concentration of chemicals in the atmosphere, it is unlikely because its two neighboring sensors, 3 and 5, do not display the same trend.
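One way to put a number on such a trend is to fit a straight line to each sensor's readings over time and compare the slopes between neighboring sensors. This is not the method used to produce Figure 5, which is a visual comparison. It is a minimal ordinary least-squares sketch with a hypothetical class name, where `x` could be hours since the first reading and `y` the readings.

```java
final class Drift {
    /** Ordinary least-squares slope of y against x. */
    static double slope(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two equal-length series");
        }
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sxy = 0.0; // sum of co-deviations of x and y
        double sxx = 0.0; // sum of squared deviations of x
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        if (sxx == 0.0) {
            throw new IllegalArgumentException("x values are all identical");
        }
        return sxy / sxx;
    }
}
```

A sensor whose slope is clearly positive while its neighbors' slopes stay near zero would match the drift described for sensor 4.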

Unsual Spikes in Chemicals.jpg

Figure 6 - Unusual Spikes in Chemical Data Across Monitors

Certain chemicals, particularly AGOC-3A and Methylosmolene, show spikes in their readings. Comparing the spikes of AGOC-3A across the different monitors, we notice that they occur at regular intervals but are picked up by different sensors because of changes in wind direction. The spike levels tend to range between 50 and 100 units, which suggests that the pollutants could be accumulated and then released in batches at set intervals throughout the months.
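The regularity of the spikes can be checked by flagging readings above a threshold and looking at the spacing between them. The sketch below assumes the readings of one chemical are in time order at a fixed interval. The threshold is a parameter chosen by the analyst (50 would match the lower end of the spike range described above), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class SpikeFinder {
    /** Returns the positions of all readings strictly above the threshold. */
    static List<Integer> spikeIndices(double[] readings, double threshold) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < readings.length; i++) {
            if (readings[i] > threshold) {
                out.add(i);
            }
        }
        return out;
    }

    /** Returns the distance between each spike and the one before it. */
    static List<Integer> gaps(List<Integer> indices) {
        List<Integer> out = new ArrayList<>();
        for (int i = 1; i < indices.size(); i++) {
            out.add(indices.get(i) - indices.get(i - 1));
        }
        return out;
    }
}
```

If the gaps are roughly constant, that supports the idea of batch releases at set intervals.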

Wind Rose Chart.jpg

Figure 7 - Direction and Strength of Wind Across the Three Months

Throughout the three months, the winds were generally headed in the North-West/South-West direction, as indicated by the size of the petals in the wind rose chart. Sensors 4, 5 and 9 should therefore not have picked up readings as high as those of the other sensors. However, Figure 6 shows that sensor 5 recorded some of the highest readings for AGOC-3A. Sensor 9 also had several spikes for AGOC-3A and Methylosmolene, which would have been rather unlikely given the wind direction. Thus, besides sensor 4, sensors 5 and 9 are also suspected of being faulty.
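A wind rose is built by grouping wind bearings into compass sectors and counting how many readings fall into each. The chart here was produced in a visualization tool, so the following is only a sketch of the underlying binning, assuming eight 45-degree sectors centred on the compass points. The class and method names are hypothetical.

```java
final class WindRose {
    static final String[] SECTORS = {"N", "NE", "E", "SE", "S", "SW", "W", "NW"};

    /** Maps a bearing in degrees (clockwise from north) to a sector index. */
    static int sectorIndex(double bearingDeg) {
        // Normalise into [0, 360) so that 360 and negative values work.
        double b = ((bearingDeg % 360.0) + 360.0) % 360.0;
        // Shift by half a sector so that "N" covers 337.5 to 22.5 degrees.
        return (int) Math.floor((b + 22.5) / 45.0) % 8;
    }

    /** Counts how many bearings fall into each of the eight sectors. */
    static int[] counts(double[] bearings) {
        int[] out = new int[SECTORS.length];
        for (double b : bearings) {
            out[sectorIndex(b)]++;
        }
        return out;
    }
}
```

Since the data records the direction the wind comes from, the counts describe where the wind originates. The direction it is heading is the opposite sector.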

2. Now turn your attention to the chemicals themselves. Which chemicals are being detected by the sensor group? What patterns of chemical releases do you see, as being reported in the data?

Average Readings Per Chemical.jpg

Figure 8 - Average Reading Per Chemical

Figure 8 shows the chemicals released over the three months, with readings averaged per day. Chlorodinine and Appluimonia were released in small amounts, whilst there were significant releases of both AGOC-3A and Methylosmolene. This suggests that even though the companies have been taking steps to make their processes more environmentally friendly, their measures are insufficient to keep chemical concentrations at a safe level. The release of high concentrations of Methylosmolene is of particular concern because of its toxic side effects in vertebrates, and greater measures have to be taken to reduce its impact on the Rose-Crested Blue Pipit. It is also worth noting that Methylosmolene was released at odd timings between 11am to 4pm. This could be an attempt by companies to release pollutants when people are least aware of it.

It is also rather peculiar that the concentrations of AGOC-3A only increase during daylight hours, between 11am and 4pm. Perhaps companies only employ their best practices during daylight hours to satisfy environmental policy requirements, and switch to releasing Methylosmolene later in the day.
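Time-of-day patterns like these can be checked by averaging the readings of one chemical by hour of day. The sketch below assumes each reading has already been paired with its hour (0 to 23) taken from the Date Time column. The class and method names are hypothetical.

```java
final class HourlyProfile {
    /**
     * Returns a 24-element array with the mean reading for each hour of day.
     * Hours with no readings are reported as NaN rather than zero.
     */
    static double[] meanByHour(int[] hours, double[] readings) {
        if (hours.length != readings.length) {
            throw new IllegalArgumentException("arrays must have equal length");
        }
        double[] sum = new double[24];
        int[] count = new int[24];
        for (int i = 0; i < hours.length; i++) {
            if (hours[i] < 0 || hours[i] > 23) {
                throw new IllegalArgumentException("hour out of range: " + hours[i]);
            }
            sum[hours[i]] += readings[i];
            count[hours[i]]++;
        }
        double[] mean = new double[24];
        for (int h = 0; h < 24; h++) {
            mean[h] = count[h] == 0 ? Double.NaN : sum[h] / count[h];
        }
        return mean;
    }
}
```

Running this once per chemical gives one profile each, which makes it easier to compare when AGOC-3A and Methylosmolene readings rise.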

Tsk2.jpg

Figure 9 - Chemical Reading Across Days

From a larger perspective, when viewing across the days in a month, we see a new pattern. Releases of AGOC-3A occur during the middle of the month, and releases of Methylosmolene occur at the beginning and end of each month. A possible explanation is that regulatory officials might only perform their spot checks towards the middle of the month, forcing companies to cover their tracks and switch to a cleaner pollutant, AGOC-3A.


3. Which factories are responsible for which chemical releases? Carefully describe how you determined this using all the data you have available. For the factories you identified, describe any observed patterns of operation revealed in the data.

We can find out which factory is responsible for each chemical release through a wind rose chart. The dashboard is filtered by month and date, and shows the corresponding wind strengths and directions.

Radiance.PNG

Figure 10 - Radiance Analysis

On the 25th of April, the wind, as indicated by the chart, is blowing strongly towards the North-West. The filtered table on the left shows that sensor 3 picked up considerable concentrations of AGOC-3A, up to a reading of 5.23. We identified earlier that sensor 3 was functioning properly. We can therefore attribute the release of AGOC-3A to Radiance ColourTek, which is situated South-East of sensor 3.
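The reasoning used here, that a factory is a plausible source when it lies upwind of the sensor, can be expressed as a comparison between two bearings. The sketch below assumes grid coordinates with x growing east and y growing north, and a wind direction given as the bearing the wind comes from. A wind heading North-West comes from the South-East, which is a bearing of 135 degrees. The class name, method names and tolerance are hypothetical, and any coordinates passed in would need to come from Sensor Location.xlsx.

```java
final class UpwindCheck {
    /** Bearing in degrees (clockwise from north) from one grid point to another. */
    static double bearing(double fromX, double fromY, double toX, double toY) {
        double deg = Math.toDegrees(Math.atan2(toX - fromX, toY - fromY));
        return (deg + 360.0) % 360.0;
    }

    /** Smallest angle between two bearings, in the range 0 to 180 degrees. */
    static double angularDiff(double a, double b) {
        double d = Math.abs(a - b) % 360.0;
        return d > 180.0 ? 360.0 - d : d;
    }

    /**
     * True when the factory lies within toleranceDeg of the direction the
     * wind is coming from, as seen from the sensor.
     */
    static boolean isUpwind(double sensorX, double sensorY,
                            double factoryX, double factoryY,
                            double windFromDeg, double toleranceDeg) {
        double toFactory = bearing(sensorX, sensorY, factoryX, factoryY);
        return angularDiff(toFactory, windFromDeg) <= toleranceDeg;
    }
}
```

Applying the same check to every factory for a given sensor and hour shows whether only one factory is upwind, which is the condition the analysis relies on when attributing a reading.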

Roadrunner.jpg

Figure 11 - Roadrunner Analysis

On the 21st of August, the wind, as indicated by the chart, is blowing strongly towards the South-East. There are no companies situated North-West of Roadrunner, so we can eliminate the possibility of chemicals from different companies combining, which lets us identify the company that emits the pollutants. The filtered table on the left shows that sensor 6 had fairly low readings for all chemicals. However, the readings of Appluimonia and Chlorodinine are generally higher than those of Methylosmolene and AGOC-3A. From this analysis, we can tell that Roadrunner releases Appluimonia and Chlorodinine, but in small amounts.

Indigo.PNG

Figure 12 - Indigo Analysis

On the 28th of December, the wind, as indicated by the chart, is blowing strongly towards the West. Sensor 6 is directly west of Indigo Sol Boards and would pick up any chemical release. Based on the readings in the table, the low levels of all four chemicals are a clear sign that Indigo Sol Boards is implementing environmentally friendly procedures to protect the Rose-Crested Blue Pipit.

Kaisos.png

Figure 13 - Kasios Analysis

On the 7th of December, the wind, as indicated by the chart, is blowing strongly towards the South-West. The table on the left immediately shows a high concentration of Methylosmolene, 21.42, on monitor 8 at 0:00 hours. This suggests a strong link between the chemical release from Kasios and the readings captured.

This is a summary table of the chemicals that each company releases:

AGOC-3A Appluimonia Chlorodinine Methylosmolene
Kasios Office Furniture
Roadrunner Fitness Electronics
Radiance ColourTek
Indigo Sol Boards

Interactive Visualisation

References

Comments