IS428 2017-18 T1 Assign Chew Jonathan Yee Long

From Visual Analytics for Business Intelligence
Jump to navigation Jump to search

Assignment Details

IS428 Main Page: (https://wiki.smu.edu.sg/1718t1is428g1/Main_Page)

Assignment Overview: (https://wiki.smu.edu.sg/1718t1is428g1/Assignments)

Assignment Dropbox: (https://wiki.smu.edu.sg/1718t1is428g1/Assignment_Dropbox)

Overview

Mistford is a mid-size city is located to the southwest of a large nature preserve. The city has a small industrial area with four light-manufacturing endeavors. Mitch Vogel is a post-doc student studying ornithology at Mistford College and has been discovering signs that the number of nesting pairs of the Rose-Crested Blue Pipit, a popular local bird due to its attractive plumage and pleasant songs, is decreasing! The decrease is sufficiently significant that the Pangera Ornithology Conservation Society is sponsoring Mitch to undertake additional studies to identify the possible reasons. Mitch is gaining access to several datasets that may help him in his work, and he has asked you (and your colleagues) as experts in visual analytics to help him analyze these datasets.

Mitch Vogel was immediately suspicious of the noxious gases just pouring out of the smokestacks from the four manufacturing factories south of the nature preserve. He was almost certain that all of these companies are contributing to the downfall of the poor Rose-crested Blue Pipit bird. But when he talked to company representatives and workers, they all seem to be nice people and actually pretty respectful of the environment.

In fact, Mitch was surprised to learn that the factories had recently taken steps to make their processes more environmentally friendly, even though it raised their cost of production. Mitch discovered that the state government has been monitoring the gaseous effluents from the factories through a set of sensors, distributed around the factories, and set between the smokestacks, the city of Mistford and the nature preserve. The state has given Mitch access to their air sampler data, meteorological data, and locations map. Mitch is very good in Excel, but he knows that there are better tools for data discovery, and he knows that you are very clever at visual analytics and would be able to help perform an analysis.

The Task

The four factories in the industrial area are subjected to higher-than-usual environmental assessment, due to their proximity to both the city and the preserve. Gaseous effluent data from several sampling stations has been collected over several months, along with meteorological data (wind speed and direction), that could help Mitch understand what impact these factories may be having on the Rose-Crested Blue Pipit. These factories are supposed to be quite compliant with recent years’ environmental regulations, but Mitch has his doubts that the actual data has been closely reviewed. Could visual analytics help him understand the real situation?

The primary job for Mitch is to determine which (if any) of the factories may be contributing to the problems of the Rose-crested Blue Pipit. Often, air sampling analysis deals with a single chemical being emitted by a single factory. In this case, though, there are four factories, potentially each emitting four chemicals, being monitored by nine different sensors. Further, some chemicals being emitted are more hazardous than others. Your task, as supported by visual analytics that you apply, is to detangle the data to help Mitch determine where problems may be. Use visual analytics to analyze the available data and develop responses to the questions below.

  • Characterize the sensors’ performance and operation. Are they all working properly at all times? Can you detect any unexpected behaviors of the sensors through analyzing the readings they capture?Limit your response to no more than 9 images and 1000 words.
  • Now turn your attention to the chemicals themselves. Which chemicals are being detected by the sensor group? What patterns of chemical releases do you see, as being reported in the data? Limit your response to no more than 6 images and 500 words.
  • Which factories are responsible for which chemical releases? Carefully describe how you determined this using all the data you have available. For the factories you identified, describe any observed patterns of operation revealed in the data. Limit your response to no more than 8 images and 1000 words.

Data Preparation

The Data

For this investigation, we were given 3 sets of excel sheets to work with:

  • Meteorological Data.xlsx
  • Sensor Data.xlsx
  • Sensor Location.xlsx

Data Cleaning

Issue: There was a lack of wind direction for each chemical reading, therefore lack of understanding where the possible source of pollution is coming from. Also in further visualization, in order to tag each reading and their direction, each reading should have time stamp tagged to it. Solution: To resolve this issue, I combined the wind data based on the time stamp via vlookup function in excel. Here I am assuming that each sensor reading data is tagged on a 3-hourly basis. For example, readings from 0000 hrs to 0200 hrs are tied to wind direction reading of 0000hrs, and readings from 0300 hrs to 0500 hrs are tied to wind direction reading of 0300hrs.

  1. I began resolving this by copying the either data set from Meteorological Data.xlsx to another separate sheet in Sensor Data.xlsx.
  2. I created 4 new columns of in the first sheet:
  1. Date Time (Wind)– allows for each reading to be tagged to a wind direction timestamp for further visualization
  2. Wind direction – Absolute value of each wind direction
  3. Pollution Concentration Direction – This is the direction in which the sensor is reading the chemicals from.
  4. Wind Direction (Rounded) – a rounded figure of wind direction to the nearest tenth for easier processing for visualization later.
  5. Wind Speed – Absolute Wind Speed of each reading.
  6. Columns of Wind Direction, Data Time (Wind), Wind Speed, reads data from the next sheet via a VLOOKUP function
  7. Columns of Pollution Concentration and Wind Direction (Rounded) will read the data from the columns above and be processed as below.

Sensor Data Combined with Wind Data

Wind Direction Model Concept

Issue: It is difficult to visualize the wind and monitor reading direction because of the restrictions that Tableau have with the scatter plot data. Even if we were to plot the location data on Tableau, it is difficult to show the wind direction since each point is not dynamic.

Solution: Based on this visualization example, https://community.tableau.com/message/196527#196527 , I discovered that to make such a visualization happen, I would need to create a large spreadsheet of wind direction data that enables to creation of polygons for each monitor.

Based on this visualization, it can be understood that based on the angels each polygon was created with these following sets of data, where there are 3 specific points and angles tagged to each point. This would mean that each meteorological reading will be duplicated into a total of 27 times, since each monitor would need a total 3 angles.

To do so, I would first have to duplicate the data by 26 times. I managed to find a VBA code that allows for such function via this link https://www.extendoffice.com/documents/excel/3682-excel-copy-and-insert-row-multiple-times.html.

• I copied the vba code and ran is as required. The following is the VBA code used for this duplication process.

   Sub insertrows()
       Dim I As Long
       Dim xCount As Integer
       xCount = Application.InputBox("Number of Rows", "Kutools for Excel", , , , , , 1)
       If xCount < 1 Then
           MsgBox "the entered number of rows is error ,please enter again", vbInformation, "Kutools for Excel"
           GoTo LableNumber
       End If
       For I = Range("A" & Rows.CountLarge).End(xlUp).Row To 1 Step -1
           Rows(I).Copy
           Rows(I).Resize(xCount).Insert
       Next
       Application.CutCopyMode = False
   End Sub

The following is the result as follows

Data Configuration

Visualization

Investigation Tasks

Concluding Findings

Credits

Comments