IS428 2017-18 T1 Assign Zhang Qian

From Visual Analytics for Business Intelligence
Revision as of 21:17, 8 October 2017 by Qian.zhang.2014 (talk | contribs)
Jump to navigation Jump to search

Visual Analytics Assignment

Problem & Motivation

Assumptions

• Since the “Date time” in “Sensor Data” only have whole hours (E.g. “4/1/16 4:00”, “4/1/16 6:00”), it is assumed that the sensor detect chemicals on an hourly basis and each timestamp should have one record.

• Due to limited information given, it is assumed that missing record for a particular timestamp is considered abnormal behavior.

Questions

Q1

1) Characterize the sensors’ performance and operation.

As for sensors’ performance and operation, we characterize different sensors based on their noise.

For monitor 1,2,8, there are less noise. For monitor 1 and 2, the noise occurs along three months. However, for monitor 8, the noise only appears on April and December.

Zq-1.png


For monitor 5,6,9, there are more noise and the noise appear along three months.

Zq-2.png


For monitor 3, 7 there are mild level of noise.

Zq-3.png


For monitor 4, there is an obvious shift up of the baseline readings among April, August and December.

Zq-4.png

2) Are they all working properly at all times? Can you detect any unexpected behaviors of the sensors through analyzing the readings they capture?

• Missing data points on Midnight

There are a lot of common missing data points founded at 0 o’clock for all sensors and for every chemical detection. The graph below shows for the total number of records reported from each monitor for a specific chemical at each hour on every day of that month. For example, the red circle indicates that for monitor 1, there are missing records for “Appluimonia” at 0 o’clock on April 2nd and April 7th.

Zq-5.png


• Duplicate data points for “AGO-3A”

For “AGO-3A” chemical, there are a lot of duplicate records and there are a larger than usual value and a smaller value. As I deep dive into these greater value, I found that these data points result in a dramatic increase in the sum of readings for that hour. For example, as indicated by the following graphs, when monitor 1 detecting “AGOC-3A”, there are two records at 14:00 August 2nd. When we turn to “Total readings per hour”, It is obvious the total readings at 14:00 August 2nd is much higher than the rest records. Thus, I assume the higher value of these duplicates data points is considered as outliers and should be removed.

Zq-6.png
Zq-7.png


• Missing data points for detecting “Methylosmolene”

I found that there are a lot of missing data points for chemical “Methylosmolene”. Although there are some common missing data in the midnight, but the record missing is more severe for “Methylosmolene” detection than it in other chemical detection.

Zq-8.png


• Exceptions

There are some data points in this dataset which is abnormally high. For example, in the figure below, the reading of “Chlorodinine” for monitor 9 at 22:00 on December 19th is 5.034 and much higher than usual while the rest data points for are all below 2.5.

Zq-9.png

Q2

Data Preparation

Since some unexpected behaviors of sensors have been identified in question 1, the first thing to do is to remove the outliers and clean the data. As mentioned in the question 1, duplicate data points only found in “AGOC-3A” detection and one of the reading number is often higher than usual. Thus, I assume that those duplicate records with higher reading results from abnormal sensor detection and should be removed. I use the following VBA program to remove these outliers. The total number of records are reduced from 79243 (excluding header line) to 79029.

Zq-10.png

1) Which chemicals are being detected by the sensor group?

I use pie charts to find out which chemicals are detected by which sensor group. As shown in the figure above, each individual pie chart represents a monitor and the size of the pie chart represents the total number of readings it has. So we can see that monitor 3 and monitor 4 has largest readings while monitor 1 has the least readings. For each monitor, it is divided into four parts, representing the percentage of the readings for “AGOC-3A”, “Appluimonia”, “Chlorodinine”, “Methylosmolene”. According to this graph, all sensors are detecting four chemicals. The percentage of each chemicals detected by each monitor (except monitor 6) are roughly the same (around 22% to 29%).

Zq-11.png

2) What patterns of chemical releases do you see, as being reported in the data?

Pattern 1: For each chemical, the release amount is increasing month over month. As shown in the figure below, the sum of reading for each chemical is increasing over the month. Also the release amount of “Methylosmolene” is always the largest, followed by “AGOC-3A”, “Chlorodinine” and “Appluimonia”.

Zq-12.png


Pattern 2: As shown in the figure below, the release of “Methylosmolene” follows a daily pattern. The detected amount of this chemical is greater during 10pm to 5am next morning. It has a dramatic decrease between 5 am and 6 am and a remarkable increase between 9pm and 10pm. While the release of the rest chemicals are not highly influence by different hours in a day.

Zq-13.png

Q3

Data Preparation

-Which factories are responsible for which chemical releases?

adiance-->Methylosmolene

As shown in the graph above, the diagrams “Q3 wind rose for each week” shows the wind direction and speed in week 14 (first week of April). At this point, I will focus on the readings of monitor 6, which has detected “Methylosmolene” in week 14. It is very obvious the wind direction is from north east side and the Factory 3 (Radiance) is at the northeastern corner of the monitor 6. In that case, I can conclude that Radiance are responsible for “Methylosmolene” emission.



Indigo-->Chlorodinine

As shown in the graph above, the diagrams “Q3 wind rose for each week” shows the wind direction and speed in week 53 (last week of December). At this point, I will focus on the readings of monitor 6, which has majorly detected “Chlorodinine” in week 53. It is very obvious the wind direction is from east side and the Factory 4 (Indigo) is at the eastern corner of the monitor 6. In that case, I can conclude that Indigo are responsible for “Chlorodinine” emission.

Reference

https://community.tableau.com/thread/209697