IS428 2017-18 T1 Assign Sarah Jane Tong

From Visual Analytics for Business Intelligence
Revision as of 20:52, 8 October 2017 by Sarahtong.2014 (talk | contribs)
Jump to navigation Jump to search

Problem and Motivation

Background

Rose-Crested Blue Pipits are dwindling in numbers, and the cause is likely to be from one or more of the four types of chemicals emitted from the four factories. The factories are located south-west of the nature preserve where the birds reside, with the 9 sensors placed in the vicinity surrounding the 4 factories.

The 4 factories are
  • Roadrunner Fitness Electronics
  • Kasios Office Furniture
  • Radiance ColourTek
  • Indigo Sol Boards
A brief description of the 4 chemicals are also as follows

1. Appluimonia – An airborne odor which can possibly cause serious injury, long-term health effects, or death to humans or animals.

2. Chlorodinine – Corrosive chemical which can attack and chemically destroy exposed body tissues. It has been used as a disinfectant and sterilizing agent as well as other uses. It is harmful if inhaled or swallowed.

3. Methylosmolene – Has potent effects on vertebrates. Liquid forms of Methylosmolene are required by law to be chemically neutralized before disposal. Strictly regulated.

4. AGOC-3A – A solvent which is not extremely harmful to human and environmental health.

General Task Overview

To determine the cause of the dwindling numbers of the birds. Isolating the chemicals from the dataset and identifying if there are correlations between the emissions and the data.

Subtasks
  1. Characterize the sensors’ performance and operation. Are they all working properly at all times? Can you detect any unexpected behaviors of the sensors through analyzing the readings they capture?
  2. Which chemicals are being detected by the sensor group? What patterns of chemical releases do you see, as being reported in the data?
  3. Which factories are responsible for which chemical releases? Carefully describe how you determined this using all the data you have available. For the factories you identified, describe any observed patterns of operation revealed in the data.

Given the large number of variables and measures, there is a need to build an interactive data visualization tool to help to analyse the correlations between various variables. The visualisations will draw links between each factory, chemical and monitor.

Dataset Analysis & Transformation Process

There were 3 excel datasets provided in the assignment with different formats and attributes. And 1 document regarding general information about the environment in the problem question.

This section will elaborate on the dataset analysis and transformation process for each dataset in order to prepare the data for import and analysis on an interactive visualization.

Dataset Name Brief Description
Sensor Location.xlsx 9 sensors and their locations in X-Y coordinates in 2 columns.
Sensor Data.xlsx Approximately 80,000 rows each with a unique set of chemical, monitor, date and time and readings.

Date and time are taken in 3 months, April, August and December. One reading is taken 24 times a day, at the start of each hour for the entire month.

Meteorological Data.xlsx Approximately 2,000 rows with a unique time and date stamp, and respective wind direction and wind speed. Elevation is also given as 340 m.
MC2 Data Descriptions.doc Provided the X-Y coordinates of the factories and the sensors. Also provides information on the map, which is a 200 x 200 grid.

Data Cleaning

Meteorological Data

Problem
Column E has data on elevation which has no context in the questoin.
Solution
Elevation in Col E was removed
Elevation

Data Preparation for Merging/Joining Datasets

Several columns in the various datasets are likely to be merged. However, not all of them can be used for analysis as they are.

Datasets involved Data to change Image
Sensor Data.xlsx

Meteorological Data.xlsx

Datetime stamp for Meteorological data.

Interpolation such that both data reflect hourly data.

Sarah2.png
Sarah3.png
Sarah4.png
Meteorological Data.xlsx

MC2 Data Descriptions.doc

The units of the wind speed There are 3 units which can be used for wind speed:
  1. Metre per hr
  2. Miles per hour
  3. Map Coordinate Unit(px) per hour

Given that meteorological data was planned to be plotted on a x-y map later on, it makes most sense to convert from m/s to px/hr.

Meteorological Data To insert factory location into dataset.

The planned visualisation was such that each of the 4 factories and the wind would be plotted on the same map as a wind stick plot for each timestamp. Using the factory as the source.

First, 4 separate sheets were created, one for each factory. Using trigonometrical transformations, the wind was plotted with an XY coordinate.

For each time stamp there would be 2 sets of XY coordinates; the factory’s coordinates and the hypotenuse coordinate created by the angle.

<image>

Total 15.00