IS428 2017-18 T1 Assign Lu Jianan

From Visual Analytics for Business Intelligence
Jump to navigation Jump to search

Overview

Mistford is a mid-size city is located to the southwest of a large nature preserve. The city has a small industrial area with four light-manufacturing endeavors. Mitch Vogel is a post-doc student studying ornithology at Mistford College and has been discovering signs that the number of nesting pairs of the Rose-Crested Blue Pipit, a popular local bird due to its attractive plumage and pleasant songs, is decreasing! The decrease is sufficiently significant that the Pangera Ornithology Conservation Society is sponsoring Mitch to undertake additional studies to identify the possible reasons. Mitch gets permission for accessing to several datasets that may help him in his work, but he seeks an experts in visual analytics to help him analyze these datasets.

Mitch Vogel was immediately suspicious of the noxious gases just pouring out of the smokestacks from the four manufacturing factories south of the nature preserve. He got the conclusion that all of these companies are causing the downfall of the poor Rose-crested Blue Pipit bird. But when he talked to company representatives and workers, they all seem to be nice people and actually pretty respectful of the environment.

On the other hand, Mitch was surprised to learn that the factories had recently taken steps to make their processes more environmentally friendly, although it raised their cost of production. Mitch discovered that the state government has been monitoring the gaseous effluents from the factories through a set of sensors, distributed around the factories, and set between the smokestacks, the city of Mistford and the nature preserve. The state has allowed Mitch access to their air sampler data, meteorological data, and locations map. Mitch is very good in Excel, but he knows that there are better tools for data discovery, and he knows that expert in visual analytics could be able to help him to perform an analysis to understand the real situation better.

Task

General task

The four factories in the industrial area are subjected to higher-than-usual environmental assessment, due to their proximity to both the city and the preserve. Gaseous effluent data from several sampling stations has been collected over several months, along with meteorological data (wind speed and direction), that could help Mitch understand what impact these factories may be having on the Rose-Crested Blue Pipit. These factories are supposed to be quite compliant with recent years’ environmental regulations, but Mitch has his doubts that the actual data has been closely reviewed. Could visual analytics help him understand the real situation?

The primary job for Mitch is to determine which (if any) of the factories may be contributing to the problems of the Rose-crested Blue Pipit. Often, air sampling analysis deals with a single chemical being emitted by a single factory. In this case, though, there are four factories, potentially each emitting four chemicals, being monitored by nine different sensors. Further, some chemicals being emitted are more hazardous than others. Your task, as supported by visual analytics that you apply, is to detangle the data to help Mitch determine where problems may be. Use visual analytics to analyze the available data and develop responses to the questions below.

The specific tasks

  • Characterize the sensors' performance and operation. Perform a visual analysis to find the sensors which does not work properly. Analyzing the sensor's data by comparing the date, the normal data trend of the sensors. Pointing out the unexpected behaviors done by any sensors at which specific day and time.
  • visualize the sensor reading for all chemicals to find the pattern and trend and whether the chemicals are being detected by the sensor group.
  • find which factories are responsible for which chemical releases. Carefully describe how to determine this by using the visual graph. For the factories identified, describe any observed pattern of operation revealed in the data.


Data Building/Cleaning

Joining all data tables, removing the used column to perform a meaningful data table preparing for visual analysis.

Serial Observation Description
1
Inner Join sensor data and meterological data.jpg
Inner join data between Meteorological Data.xlsx and Sensor Data.xlsx by date
2
Inner Join sensor data location.jpg
Inner join data between Sensor Data.xlsx and Sensor Location.xlsx by monitor
3
Cleardata.jpg
Delete useless column and hide unused column "Elevation (m)" from Meteorological Data.xlsx, and also hide the duplication of common column monitor and date. This image shows the output table after 2 inner join 3 tables with clean data.
4
AddFactoryXYLoc.jpg
Add 4 factories X Y values to the sensor location.xlsx file

Solution

Task#1

Serial Observation Result
1
MissingData.jpg
Checking for all missing date from the sensor location, when checking the date, disconnect the relationship between sensor location.xlsx and sensor data.xlse first.
2
Wrong Input data recorded by Monitor 1 for all chemicals in 2016.jpg
The reading for Monitor 1 is significantly high on Dec 5 2016, the input value may has some errors.

List of missing data:

Date Monitors
2/4/2016 12:00:00 AM 1,2,3,4,5,6,7,8,9
6/4/2016 12:00:00 AM 1,2,3,4,5,6,7,8,9
2/8/2016 12:00:00 AM 1,2,4,5,6,7,8,9
4/8/2016 12:00:00 AM 1,2,3,4,5,6,7,8,9
7/8/2016 12:00:00 AM 1,2,3,4,5,6,7,8,9
2/12/2016 12:00:00 AM 1,2,3,4,5,6,7,8,9
7/12/2016 12:00:00 AM 1,2,3,4,5,9

Task#2

Serial Observation Description
1
TrendForReadingData.jpg
The reading for all the monitors has the similar trend, which is the pollution for all chemicals are slightly higher in December 2016 (Winter season) and lower in April 2016 (Spring season). While the chemical "AGOC-3A" has the highest reading among all chemicals. Besides, the worst reading are always record at 6am each day.
2
TrendForReadingData2.jpg
The reading for all the monitors has the similar trend, which is the pollution for all chemicals are slightly higher in December 2016 (Winter season) and lower in April 2016 (Spring season). While the chemical "Appluimonia" always has the lowest reading among all chemicals at 21pm each day.
3
TrendForReadingData3.jpg
The reading for all the monitors has the similar trend, which is the pollution for all chemicals are slightly higher in December 2016 (Winter season) and lower in April 2016 (Spring season). While the chemical "Chlorodinine" always has the lowest reading among all chemicals on 3am each day.
4
TrendForReadingData4.jpg
The reading for all the monitors has the similar trend, which is the pollution for all chemicals are slightly higher in December 2016 (Winter season) and lower in April 2016 (Spring season). While the chemical "Methylosmolene" always has the highest reading among all chemicals on 3am each day.

Conclusion, for the most of the time, there is trend that for all the sensors when the reading of chemical "AGOC-3A" raise, the reading of chemical "Appluimonia" decrease. On the other hand, when the reading of chemical "Methylosmolene" raise, the reading of chemical "Chlorodinine" decrease. Besides, there is no clear relation between wind speed and reading of chemicals or between wind direction and reading of chemicals. No matter high/low wind or different wind direction, there is no clear effect on the reading of chemicals.

Task#3

Serial Observation Description
1 Example Example
2 Example Example
3 Example Example

Visualisation Software Used

The software used for processing the visual analysis:

Software Name Software Logo Description
Tableau
Tableau.png
visualize data after processing the background data from excel datasets
QlikView
Qlikview.jpg
visualize data after the background data from excel datasets
Excel
Excel.jpg
check and modify the raw datasets to remove invalid input or wrong input

Submission details

This is an individual assignment. You are required to work on the assignment and prepare submission individually. Your completed assignment is due on 8th October 2017, by 11.59pm mid-night.

You need to edit your assignment in the appropriate wiki page of the Assignment Dropbox. The title of the wiki page should be in the form of: IS428_2017-18_T1_Assign_FullName.

The assignment wiki page should include the URL link to the web-based interactive data visualization system prepared.

Reference

IS428_2016-17_Term1_Assign3_Tan_Kee_Hock

Assignment Q&A

Through the whole assignment, my problem can be split into two parts, data building and type of visual graph to process a good visual analysis.

1. if there are too many wrong data or missing data inside the datasets, how to fix the datasets?

Fix the missing datasets by adding data reference to the trend, and modify the wrong data replace by the data close to the trend line as well.

2. the scale for map is very small and the map is too dark, the reading data is very hard to presented together with map. How to the visual analysis on the map which is a picture?