IS428 2017-18 T1 Assign He Xi

From Visual Analytics for Business Intelligence


Overview

Background

Mistford is a mid-size city located to the southwest of a large nature preserve. The city has a small industrial area with four light-manufacturing endeavors. Mitch Vogel is a post-doc student studying ornithology at Mistford College and has been discovering signs that the number of nesting pairs of the Rose-Crested Blue Pipit, a popular local bird due to its attractive plumage and pleasant songs, is decreasing! The decrease is sufficiently significant that the Pangera Ornithology Conservation Society is sponsoring Mitch to undertake additional studies to identify the possible reasons. Mitch is gaining access to several datasets that may help him in his work, and he has asked you (and your colleagues) as experts in visual analytics to help him analyze these datasets.

Mitch Vogel was immediately suspicious of the noxious gases just pouring out of the smokestacks from the four manufacturing factories south of the nature preserve. He was almost certain that all of these companies were contributing to the downfall of the poor Rose-Crested Blue Pipit. But when he talked to company representatives and workers, they all seemed to be nice people and actually pretty respectful of the environment.

In fact, Mitch was surprised to learn that the factories had recently taken steps to make their processes more environmentally friendly, even though it raised their cost of production. Mitch discovered that the state government has been monitoring the gaseous effluents from the factories through a set of sensors, distributed around the factories, and set between the smokestacks, the city of Mistford and the nature preserve. The state has given Mitch access to their air sampler data, meteorological data, and locations map. Mitch is very good at Excel, but he knows that there are better tools for data discovery, and he knows that you are very clever at visual analytics and would be able to help perform an analysis.


General Task

The four factories in the industrial area are subjected to higher-than-usual environmental assessment, due to their proximity to both the city and the preserve. Gaseous effluent data from several sampling stations has been collected over several months, along with meteorological data (wind speed and direction), that could help Mitch understand what impact these factories may be having on the Rose-Crested Blue Pipit. These factories are supposed to be quite compliant with recent years’ environmental regulations, but Mitch has his doubts that the actual data has been closely reviewed. Could visual analytics help him understand the real situation?

The primary job for Mitch is to determine which (if any) of the factories may be contributing to the problems of the Rose-crested Blue Pipit. Often, air sampling analysis deals with a single chemical being emitted by a single factory. In this case, though, there are four factories, potentially each emitting four chemicals, being monitored by nine different sensors. Further, some chemicals being emitted are more hazardous than others. Your task, as supported by visual analytics that you apply, is to detangle the data to help Mitch determine where problems may be. Use visual analytics to analyze the available data and develop responses to the questions below.

Data Preparation

Issue:

Combining factory and monitor locations into one file

Factory locations are provided in the Word document but not in the data files, so the factory location data and the monitor location data are combined into a single file called All Location.

The following are the factory locations:
- Roadrunner Fitness Electronics: 89,27
- Kasios Office Furniture: 90,21
- Radiance ColourTek: 109,26
- Indigo Sol Boards: 120,22

As the screenshot below shows, the factory data and monitor data are combined into one file.

HxLocation.png
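The combined file could also be produced with a short script rather than by hand. The following is a minimal sketch, not the method actually used here: the factory coordinates are the ones listed above, while the column names (Name, Type, X, Y) and the shape of the monitor input are assumptions made for illustration. Monitor coordinates are not reproduced on this page, so they are passed in as a parameter.

```python
import csv
import io

# Factory coordinates as listed above (map grid units).
FACTORIES = [
    ("Roadrunner Fitness Electronics", 89, 27),
    ("Kasios Office Furniture", 90, 21),
    ("Radiance ColourTek", 109, 26),
    ("Indigo Sol Boards", 120, 22),
]


def combine_locations(monitors):
    """Return (name, type, x, y) rows for all factories and monitors.

    `monitors` is a list of (monitor_id, x, y) tuples taken from the
    monitor location data; this input layout is an assumption.
    """
    rows = [(name, "Factory", x, y) for name, x, y in FACTORIES]
    rows += [(f"Monitor {mid}", "Monitor", x, y) for mid, x, y in monitors]
    return rows


def write_locations(rows, out):
    """Write the combined rows as CSV with a hypothetical header."""
    writer = csv.writer(out)
    writer.writerow(["Name", "Type", "X", "Y"])
    writer.writerows(rows)
```

Keeping a Type column lets Tableau color or filter factories and monitors separately on the same map.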

Meteorological Data

Issue:

An extra Elevation(m) column causes null values in Tableau

The data set contains an Elevation(m) column and a column with an unknown header, F4.

HxTableauEvelation.png


To solve the problem, remove the elevation column from the meteorological data file.

HxElevation.png
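The same clean-up can be scripted so that it is repeatable. This is a rough sketch under two assumptions: the meteorological data is available as CSV, and the column Tableau labels F4 has a blank header in the file. The function names are hypothetical.

```python
import csv
import io


def is_unwanted(name):
    """Assumption: drop the elevation column and any blank-header column."""
    stripped = name.strip().lower()
    return stripped == "" or stripped.startswith("elevation")


def drop_columns(src, dst, should_drop=is_unwanted):
    """Copy CSV text from `src` to `dst`, skipping unwanted columns."""
    reader = csv.reader(src)
    writer = csv.writer(dst)
    header = next(reader)
    # Remember the positions of the columns we want to keep.
    keep = [i for i, name in enumerate(header) if not should_drop(name)]
    writer.writerow([header[i] for i in keep])
    for row in reader:
        # Guard against short rows that lack the trailing columns.
        writer.writerow([row[i] for i in keep if i < len(row)])
```

Matching on the header text rather than on a fixed column position keeps the script working if the column order in the file differs from what is assumed here.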

The specific tasks

Task 1

Characterize the sensors’ performance and operation. Are they all working properly at all times? Can you detect any unexpected behaviors of the sensors through analyzing the readings they capture? Limit your response to no more than 9 images and 1000 words.

First, to understand whether the sensors are working properly, we should take a holistic view of the data. Diagram 1 below shows that some dates have no data; they are listed after the diagram.

Hxdg1.png
Diagram 1


List of missing data:

Hxtb1.png
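A list of gaps like the one above can also be derived programmatically. The sketch below assumes the readings are hourly and that the timestamps for one monitor and chemical have already been parsed into datetime objects; the function name is hypothetical.

```python
from datetime import datetime, timedelta


def missing_hours(timestamps, start, end):
    """Return every hourly timestamp in [start, end] absent from `timestamps`."""
    seen = set(timestamps)
    missing = []
    current = start
    while current <= end:
        if current not in seen:
            missing.append(current)
        current += timedelta(hours=1)
    return missing
```

Running this once per monitor and chemical would show whether the gaps fall at the same hour everywhere, which is what a scheduled maintenance window would look like.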

Since the missing data occur only at 00:00 on each of these dates, and on almost the same dates each month, the likely cause is scheduled maintenance of either the system or the monitors. However, while searching through the missing values, I observed some abnormal data. For example, on 5 Dec at 05:00 the reading was 0.64, but in the next hour it spiked to 35.82. Let’s look at this in detail.

Hxdg2.png
Diagram 2
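A jump such as the one from 0.64 to 35.82 within an hour can be flagged automatically. The following is a simple sketch, not the procedure used for the diagram: it compares each reading with the previous one, and the threshold of 10.0 is an arbitrary assumption that would need tuning per chemical.

```python
def find_spikes(readings, min_jump=10.0):
    """Return indices where a reading exceeds the previous one by more than `min_jump`.

    `readings` is a list of consecutive hourly values for one monitor
    and one chemical. `min_jump` is a hypothetical threshold.
    """
    return [
        i
        for i in range(1, len(readings))
        if readings[i] - readings[i - 1] > min_jump
    ]


# The 5 Dec example from the text: 0.64 at 05:00, 35.82 one hour later.
spikes = find_spikes([0.64, 35.82])
```

Because it only looks at hour-to-hour differences, this check ignores a slow drift of the kind a gradually failing monitor would produce.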


A spike here means a sudden change in a reading, as opposed to a steady increase or a recurring time-series pattern. The 9 monitors can be split into 2 categories, quiet and noisy.

Quiet monitors: 1, 2, 7, 8. These monitors show few spikes over the 3 months, and their readings stay roughly constant during normal periods.

Noisy monitors: 3, 4, 5, 6, 9. These monitors show many spikes over the 3 months. Monitor 4 in particular shows a steady increase in its readings throughout the 3 months. Based on the readings from the other monitors, we can infer that monitor 4 is not functioning correctly.

As the graph uses the sum of the readings of the four chemicals, I wondered whether the spikes were caused by duplicate data. The following diagram shows whether duplicate data exist.

Hxdg3.png
Diagram 3


The diagram shows that only chemical AGOC-3A has many duplicate records. However, the spikes are not caused by the duplicate readings. We can conclude that the monitors may occasionally produce abnormal readings due to hardware or software failure, but this does not affect their overall performance.
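The duplicate check can be sketched in a few lines. This assumes each record is a (chemical, monitor, timestamp, reading) tuple and treats two records as duplicates when the first three fields match; both the record layout and the function name are assumptions for illustration.

```python
from collections import Counter


def duplicate_keys(rows):
    """Return {(chemical, monitor, timestamp): count} for keys seen more than once."""
    counts = Counter((chem, mon, ts) for chem, mon, ts, _reading in rows)
    return {key: n for key, n in counts.items() if n > 1}
```

Summing readings over duplicated keys would inflate the totals, so it is worth running a check like this before trusting a chart built on SUM.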

Task 2

Now turn your attention to the chemicals themselves. Which chemicals are being detected by the sensor group? What patterns of chemical releases do you see, as being reported in the data? Limit your response to no more than 6 images and 500 words.


Let’s take a look into the chemicals. First, let’s examine the chemical patterns for each day.

Hxdg4.png
Diagram 4


As the diagrams above show, the AGOC-3A reading rises from 6am, drops by half at 3pm, and eventually falls to its lowest level at 10pm. For Appluimonia, the readings are fairly constant throughout the day. Chlorodinine drops suddenly at 3am but returns to normal afterwards. Methylosmolene rises from 10pm to 5am the next day; we can assume those are the working hours. Now let’s split the trends by month and see whether each month follows the pattern described above.

Hxdg5.png
Diagram 5


The diagram does follow the trend of Diagram 4. There is some variation, but the pattern remains fairly consistent for each chemical. Let’s take a further look at the data.

Hxdg6.png
Diagram 6


As the diagram above shows, the average value of each chemical increases slightly each month. Either chemical residue is accumulating in the air, or the monitors’ performance is degrading with age.

Task 3

Which factories are responsible for which chemical releases? Carefully describe how you determined this using all the data you have available. For the factories you identified, describe any observed patterns of operation revealed in the data. Limit your response to no more than 8 images and 1000 words.


To find out how the monitors detect the chemicals, it is useful to combine the wind direction and wind speed with the location of each monitor and the daily readings of each chemical at each monitor. Let’s look at the wind speed.

Hxwind.png
Diagram 7



To examine how the wind spreads the chemicals, I picked a few days on which the wind direction was fairly constant and the wind speed was relatively high. The dates are:
- April 2, wind direction towards northwest
- April 20, wind direction towards north
- August 21, wind direction towards south
- December 9, wind direction towards west
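Days like these were picked by eye, but the selection could be automated. The sketch below is one possible approach, not the one used here: it measures how steady the direction is with the mean resultant length from circular statistics (1.0 means every observation points the same way, values near 0 mean the directions cancel out), which handles the wrap-around between 359 and 0 degrees. Both thresholds and the record layout are assumptions.

```python
import math
from collections import defaultdict


def steady_windy_days(records, min_steadiness=0.9, min_speed=3.0):
    """Return the days with a steady wind direction and a high mean speed.

    `records` is an iterable of (day, direction_deg, speed) tuples.
    The thresholds are hypothetical and need tuning against the data.
    """
    by_day = defaultdict(list)
    for day, direction, speed in records:
        by_day[day].append((direction, speed))

    picked = []
    for day, obs in sorted(by_day.items()):
        n = len(obs)
        sin_mean = sum(math.sin(math.radians(d)) for d, _ in obs) / n
        cos_mean = sum(math.cos(math.radians(d)) for d, _ in obs) / n
        steadiness = math.hypot(sin_mean, cos_mean)  # mean resultant length
        mean_speed = sum(s for _, s in obs) / n
        if steadiness >= min_steadiness and mean_speed >= min_speed:
            picked.append(day)
    return picked
```

A plain standard deviation of the direction would wrongly mark a day that swings between 350 and 10 degrees as highly variable, which is why the circular measure is used.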

The next step is to select each of these dates and look at the chemical readings of each monitor.
April 2:

Hxapr2.png
April 2


April 20:

Hxapr20.png
April 20



August 21:

Hxaug21.png
August 21


December 9:

Hxdec9.png
December 9


Overall observation: the monitors pick up readings when the wind blows towards the west through north.
- When the wind direction is north, Methylosmolene and AGOC-3A increase. The monitors that pick up the readings are 2, 3, 4.
- When the wind direction is northwest, Methylosmolene and AGOC-3A increase. The monitors that pick up the readings are 1, 2, 3.
- When the wind direction is west, Methylosmolene and AGOC-3A increase. The monitors that pick up the readings are 1, 2, 3, 4.
- When the wind direction is south, overall Chlorodinine and Appluimonia increase.

Let’s look at the extreme cases. On April 2, monitor 6 detected a high reading of Methylosmolene. As monitor 6 is in the middle of all the factories, I believe that Indigo and Kasios might be responsible for this. On December 9, monitor 6 detected a high reading of AGOC-3A; I believe that Radiance and Indigo might be responsible. On August 21, the overall levels of Chlorodinine and Appluimonia increased; I believe that Roadrunner and Indigo might be responsible.
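The reasoning above, matching a high reading at a monitor to the factories that lie upwind of it, can be made explicit with a bearing calculation. This is a rough sketch under several assumptions: the map's y coordinate increases northward, the wind direction is expressed as the direction the wind blows towards (if the raw data records where the wind comes from, add 180 degrees first), and a factory counts as upwind when its bearing to the monitor is within a hypothetical 20-degree tolerance. The monitor coordinates are not given on this page, so they are a parameter.

```python
import math

# Factory coordinates as listed in the Data Preparation section.
FACTORIES = {
    "Roadrunner Fitness Electronics": (89, 27),
    "Kasios Office Furniture": (90, 21),
    "Radiance ColourTek": (109, 26),
    "Indigo Sol Boards": (120, 22),
}


def bearing_deg(from_xy, to_xy):
    """Compass bearing from one map point to another (0 = north, 90 = east)."""
    dx = to_xy[0] - from_xy[0]
    dy = to_xy[1] - from_xy[1]
    return math.degrees(math.atan2(dx, dy)) % 360


def upwind_factories(monitor_xy, wind_toward_deg, tolerance_deg=20.0):
    """Return factories from which the wind would blow towards the monitor."""
    hits = []
    for name, xy in FACTORIES.items():
        # Smallest signed angle between the two directions, in [-180, 180).
        diff = (bearing_deg(xy, monitor_xy) - wind_toward_deg + 180) % 360 - 180
        if abs(diff) <= tolerance_deg:
            hits.append(name)
    return hits
```

Applying this to several steady-wind days and intersecting the candidate lists per chemical is one way to narrow down responsibility beyond a single extreme case.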

In summary:


Hxs1.png


However, it is unlikely that Indigo is responsible for every single chemical release. After looking further at other dates with different wind directions, the following conclusion was reached:

Hxs2.png