IS428 2017-18 T1 Assign Tan Yong Jin

From Visual Analytics for Business Intelligence
Revision as of 01:47, 8 October 2017 by Yongjin.tan.2014 (talk | contribs)

Problem Overview

Link to the Assignment Information

Mistford is a mid-size city located to the southwest of a large nature preserve. The city has a small industrial area with four light-manufacturing endeavors. Mitch Vogel is a post-doc student studying ornithology at Mistford College and has been discovering signs that the number of nesting pairs of the Rose-Crested Blue Pipit, a popular local bird due to its attractive plumage and pleasant songs, is decreasing! The decrease is sufficiently significant that the Pangera Ornithology Conservation Society is sponsoring Mitch to undertake additional studies to identify the possible reasons. Mitch is gaining access to several datasets that may help him in his work, and he has asked you (and your colleagues) as experts in visual analytics to help him analyze these datasets.

Mitch Vogel was immediately suspicious of the noxious gases pouring out of the smokestacks of the four manufacturing factories south of the nature preserve. He was almost certain that all of these companies were contributing to the downfall of the poor Rose-Crested Blue Pipit. But when he talked to company representatives and workers, they all seemed to be nice people and actually pretty respectful of the environment.

In fact, Mitch was surprised to learn that the factories had recently taken steps to make their processes more environmentally friendly, even though it raised their cost of production. Mitch discovered that the state government has been monitoring the gaseous effluents from the factories through a set of sensors, distributed around the factories, and set between the smokestacks, the city of Mistford and the nature preserve. The state has given Mitch access to their air sampler data, meteorological data, and locations map. Mitch is very good in Excel, but he knows that there are better tools for data discovery, and he knows that you are very clever at visual analytics and would be able to help perform an analysis.

Background Information

With the passage of the Mistford Pact of 2010, the town and the Preserve have set into place certain safeguards to help ensure the safety of the people, animals, and vegetation of our area. When Mistford began growing its manufacturing industry, both the town and the companies wished to ensure an environmentally sound and economically supportive partnership. With these aims in mind, air sampling sensors have been placed near the town and in the Preserve to monitor air quality.

The chemicals

These sensors collect information on several substances of potential concern, including:

Appluimonia – An airborne odor is caused by a substance in the air that you can smell. Odors, or smells, can be either pleasant or unpleasant. In general, most substances that cause odors in the outdoor air are not at levels that can cause serious injury, long-term health effects, or death to humans or animals. However, odors may affect your quality of life and sense of well-being. Several odor-producing substances, including Appluimonia, are monitored under this program.

Chlorodinine – Corrosives are materials that can attack and chemically destroy exposed body tissues. Corrosives can also damage or even destroy metal. They begin to cause damage as soon as they touch the skin, eyes, respiratory tract, digestive tract, or the metal. They might be hazardous in other ways too, depending on the particular corrosive material. An example is the chemical Chlorodinine. It has been used as a disinfectant and sterilizing agent as well as other uses. It is harmful if inhaled or swallowed.

Methylosmolene – This is a trade name for a family of volatile organic solvents. After the publication of several studies documenting the toxic side effects of Methylosmolene in vertebrates, the chemical was strictly regulated in the manufacturing sector. Liquid forms of Methylosmolene are required by law to be chemically neutralized before disposal.

AGOC-3A – New environmental regulations, and consumer demand, have led to the development of low-VOC and zero-VOC solvents. Most manufacturers now use one or more low-VOC substances and Mistford’s plants have wholeheartedly signed on. These new solvents, including AGOC-3A, are less harmful to human and environmental health.

The Task

General task

The four factories in the industrial area are subjected to higher-than-usual environmental assessment, due to their proximity to both the city and the preserve. Gaseous effluent data from several sampling stations has been collected over several months, along with meteorological data (wind speed and direction), that could help Mitch understand what impact these factories may be having on the Rose-Crested Blue Pipit. These factories are supposed to be quite compliant with recent years’ environmental regulations, but Mitch has his doubts that the actual data has been closely reviewed. Could visual analytics help him understand the real situation?

The primary job for Mitch is to determine which (if any) of the factories may be contributing to the problems of the Rose-crested Blue Pipit. Often, air sampling analysis deals with a single chemical being emitted by a single factory. In this case, though, there are four factories, potentially each emitting four chemicals, being monitored by nine different sensors. Further, some chemicals being emitted are more hazardous than others. Your task, as supported by visual analytics that you apply, is to detangle the data to help Mitch determine where problems may be. Use visual analytics to analyze the available data and develop responses to the questions below.

The specific tasks

  • Characterize the sensors’ performance and operation. Are they all working properly at all times? Can you detect any unexpected behaviors of the sensors through analyzing the readings they capture?
  • Now turn your attention to the chemicals themselves. Which chemicals are being detected by the sensor group? What patterns of chemical releases do you see, as being reported in the data?
  • Which factories are responsible for which chemical releases? Carefully describe how you determined this using all the data you have available. For the factories you identified, describe any observed patterns of operation revealed in the data.


Data Cleaning & Manipulation

Sensor Readings

Missing Data

As shown in Sensors' Performance and Operation, some data are missing. Since the percentage of missing data within the provided data set is relatively small at 0.3% (245 missing out of 79488 records), no attempt is made to estimate the readings for these missing records.

Meteorological Data

Elevation

Since the "Elevation" column contains only 1 entry, I believe it refers to the height of the Meteorological Station collecting the Meteorological Data. As it provides no further useful information, I have removed it and excluded it from further analysis.


Wind Speed

The wind speed is measured in meters per second, whereas the provided map is measured in miles and the readings are collected hourly. The wind speed therefore needs to be converted to a more convenient unit, miles per hour.

According to the conversion rate found on Google, 1 meter per second is equivalent to 2.23694 miles per hour. As such, I have created a new column in the provided Meteorological Data (using JMP) showing the Wind Speed in Miles per Hour, using the formula:


Wind Speed (Miles per Hour) = Wind Speed (Meters per Second) * 2.23694
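
The conversion itself was done in JMP. Purely as an illustration, the same formula can be sketched in Python. The function name `mps_to_mph` and the sample speeds are hypothetical and are not taken from the dataset; only the factor 2.23694 comes from the text above.

```python
# Conversion factor quoted above: 1 meter per second = 2.23694 miles per hour.
MPS_TO_MPH = 2.23694


def mps_to_mph(speed_mps: float) -> float:
    """Convert a wind speed from meters per second to miles per hour."""
    return speed_mps * MPS_TO_MPH


# Hypothetical wind speeds in m/s, for illustration only (not from the dataset).
sample_speeds_mps = [0.0, 1.0, 4.5]

for speed in sample_speeds_mps:
    print(f"{speed} m/s = {mps_to_mph(speed):.3f} mph")
```

Because the readings are hourly, a speed in miles per hour also reads directly as the distance in miles that air could travel between two consecutive readings.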


The image below shows a snippet of my Meteorological dataset after the Wind Speed conversion.


Converted Wind Speed Snippet.png

Sensor Location

Since the original data source does not include the coordinates of the factories, this information was added and the result saved as a new data set.

The final output is shown below:

Snippet of the data which will be used to perform map plotting

This data set is used to plot the locations of all the sensors and factories without relying on the provided background map.

Data Joining

Results

Sensors’ Performance and Operation

  • Are they all working properly at all times?
    Missing Data

    Each sensor records one reading per hour for each of the 4 chemicals during the months of April, August and December. Each sensor should therefore have 8832 records in the data set (4 chemicals * 24 hours * 92 days), and all 9 sensors together should have 79488 records. However, the provided data set contains only 79243 records, which means that 245 records are missing. This shows that the sensors are not working as intended at all times.

    Records broken down by the hour of the day in which they are collected

    Furthermore, all 245 missing records are 12am records. It is unlikely that sensor downtime would coincide at 12am across different days by chance, so the gaps could be due to external factors.

    This image shows the number of records collected at 12am across different days of the month

    The above image shows the breakdown of data records collected by all the sensors, filtered to show only data recorded at 0000 hours. Each day of the month should have 12 records: 1 record per chemical per month, for 4 chemicals and 3 months. The exception is day 31, as April has only 30 days. We can see that data are missing at 12am on the 2nd, 4th, 6th and 7th of the month across the 3 months.

    In summary, for these 4 days across the 3 months, the table below shows only the dates on which the 12am chemical readings were not fully collected:

    Y – No readings recorded

    N – Partial readings (1 to 3 records) recorded

    Date\Sensor 1 2 3 4 5 6 7 8 9
    2nd Apr N N N N N N N N N
    6th Apr N N N N N N N N N
    2nd Aug N N Y N N N N N N
    4th Aug N N N N N N N N N
    7th Aug N N N N N N N N N
    2nd Dec N N N N N N N N N
    7th Dec N N N N N Y Y Y N


    Duplicate Data

    This image shows the number of records by each Chemical

    The above diagram shows the breakdown of records by each of the 4 Chemicals. We can see that the numbers of readings for Appluimonia and Chlorodinine are close to the average of 19810.75 (79243 records / 4 chemicals). However, for AGOC-3A and Methylosmolene, the numbers of records are far from the average in opposite directions (AGOC-3A: +216.25, Methylosmolene: -213.75). Since the deviations for these 2 chemicals are similar in size, I suspect that some Methylosmolene readings were wrongly recorded as AGOC-3A.

    This image shows the number of records for each chemical collected over a period of time

    The above chart confirms my suspicions. For Appluimonia and Chlorodinine, the number of records is stable over time. For AGOC-3A and Methylosmolene, however, the trend lines mirror each other: each decrease in the number of Methylosmolene records is matched by an increase of the same size in the number of AGOC-3A records.

    This image shows the number of records and the relation to the reading levels of chemical AGOC-3A over the same period of time

    The above image shows that the spikes in AGOC-3A reading levels are mainly due to readings wrongly recorded from Methylosmolene, as the reading levels follow the same pattern as the Number of Records.

    As for why the sensors cannot fully distinguish these 2 chemicals, it could be because both are solvents with similar properties, so the sensors may be unable to tell them apart at high concentration levels. Foul play could also be involved to disguise Methylosmolene emissions as AGOC-3A, since AGOC-3A is a less harmful solvent than Methylosmolene.
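
One way to test the mislabelling hypothesis above is to look for time slots in which a sensor reports one chemical twice and another chemical not at all. The following Python sketch illustrates the idea. The record layout (sensor, timestamp, chemical, reading), the function name `find_suspect_slots` and the toy records are all assumptions made for illustration and are not taken from the actual data set.

```python
from collections import Counter, defaultdict

CHEMICALS = ["AGOC-3A", "Appluimonia", "Chlorodinine", "Methylosmolene"]


def find_suspect_slots(records):
    """Return slots where one chemical is duplicated and another is missing.

    `records` is an iterable of (sensor, timestamp, chemical, reading) tuples.
    The result maps (sensor, timestamp) to (duplicated, missing) chemical lists.
    """
    counts = defaultdict(Counter)
    for sensor, timestamp, chemical, _reading in records:
        counts[(sensor, timestamp)][chemical] += 1

    suspects = {}
    for slot, per_chemical in counts.items():
        duplicated = sorted(c for c, n in per_chemical.items() if n > 1)
        missing = sorted(c for c in CHEMICALS if per_chemical[c] == 0)
        if duplicated and missing:
            suspects[slot] = (duplicated, missing)
    return suspects


# Toy records, invented for illustration: slot "T1" is complete, while slot
# "T2" has two AGOC-3A readings and no Methylosmolene reading.
toy_records = [
    (1, "T1", "AGOC-3A", 0.4),
    (1, "T1", "Appluimonia", 0.2),
    (1, "T1", "Chlorodinine", 0.3),
    (1, "T1", "Methylosmolene", 0.5),
    (1, "T2", "AGOC-3A", 0.6),
    (1, "T2", "AGOC-3A", 9.8),
    (1, "T2", "Appluimonia", 0.1),
    (1, "T2", "Chlorodinine", 0.2),
]

suspects = find_suspect_slots(toy_records)
for slot, (duplicated, missing) in suspects.items():
    print(slot, "duplicated:", duplicated, "missing:", missing)
```

If most flagged slots in the real data pair a duplicated AGOC-3A reading with a missing Methylosmolene reading, that would support the interpretation given above. The sketch cannot tell which of the two duplicated readings is the mislabelled one.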


  • Can you detect any unexpected behaviors of the sensors through analyzing the readings they capture?

Chemicals

  • Which chemicals are being detected by the sensor group?
  • What patterns of chemical releases do you see, as being reported in the data?

Factories

  • Which factories are responsible for which chemical releases?
  • Chemical\Factory  Roadrunner Fitness Electronics  Kasios Office Furniture  Radiance ColourTek  Indigo Sol Boards
    AGOC-3A           Example                         Example                  Example             Example
    Appluimonia       Example                         Example                  Example             Example
    Chlorodinine      Example                         Example                  Example             Example
    Methylosmolene    Example                         Example                  Example             Example
  • Carefully describe how you determined this using all the data you have available. For the factories you identified, describe any observed patterns of operation revealed in the data.
  • AGOC-3A Appluimonia Chlorodinine Methylosmolene

Interactive Data Visualization

References