IS428 2017-18 T1 Assign Shi Xiaoyu
To be a Visual Detective: Mistford Sensor Data Analysis Dashboard
Contents
Problem & Objective
Mistford is a mid-size city, located to the southwest of a large nature preserve. The city has a small industrial area with four factories. Recently, there is a significant decrease on the number of nesting pairs of the Rose-Crested Blue Pipit, a popular local bird. It is speculated that the downfall of the Rose-Crested Blue Pipit may be related to noxious gases from the factories near the preserve.
With the passage of the Mistford Pact of 2010, the city and the preserve have adopted certain safeguards to help ensure the safety of the people, animals, and vegetation of the area. With the aim in mind, air sampling sensors have been placed near the town and in the preserve to monitor air quality. In total, there are nine sensors to collect information on several substances of potential concern, such as Appluimonia, Chlorodinine and etc.
By using the data given, I aim to implement data visualization techniques to better analyse data and conduct tasks below:
- Characterize sensors’ performance and detect unexpected behaviors
- Identify the chemicals detected by the sensor group and the chemical release pattern
- Determine which factories are responsible for which chemical release with explanation. Identify operation pattern of the factories
Data Preparation
There are three types of data provided, location data of the four factories and nine sensors, air sampling data from sensors and meteorological data (wind direction & wind speed) from a weather station in proximity to the factories and sensors. To prepare data before importing into Tableau, I implement the following solutions to cope with different data issues.
Problem&Solution
Problem #1 | Building Location Data of Factories and Sensors |
---|---|
Issue | Location data of the four factories and nine sensors are stored in the unstandardized format (word document). Tableau fails to load the data. |
Solution | Create a spreadsheet named Location Data. Put the factory location data and sensor location data shown as below. |
Problem #2 | Calculating Meteorological Data For Every Hour |
---|---|
Issue | Meteorological data is captured every 3 hours whereas sensor data is captured every hour. Hence, the meteorological information for some sensor data records are missing. In this case, wind direction and speed for certain time stamp cannot be shown in the chart. In addition, there is a blank line at row 460 in the meteorological data. |
Solution | Remove the blank row. Add one column named DC/Hour, which represents the average direction change per hour till next timestamp. E.g. the D3 stands for the direction change per hour from 4/1/16 0:00 to 4/1/16 3:00. Similarly, add another column named SC/Hour (Average Speed Change Per Hour). Set the last record of April, August and December to 0. The new added columns will be used in the solution 3 to calculate wind direction and wind speed for every hour.Note: Here is an assumption that the the meteorological data is changed at an average speed during every three hours. |
Problem #3 | Combining Sensor Reading and Meteorological Data in the Same File |
---|---|
Issue | Meteorological data and sensor data are in the separate files. It’s important to have the sensor readings with wind information at exact time stamp since we need tell which factory is responsible for the chemical release by using meteorological data of wind speed and direction. |
Solution | Move the meteorological data and sensor data into the same file named Consolidated Data and combine the data using the formulas shown as below. |
Final Dataset
- Location Data.xlsx (Sensor Location & Factory Location)
- Consolidated Data.xlsx (Sensor Reading & Meteorological Data)
Data Import & Configuration
Before importing location data and consolidated data into the Tableau, we need to tell Tableau the relationship between the two files. The common attributes in the two files are "ID" in the location data and "Monitor" in the consolidated data. Connect two data files as Excel and configure the relationship shown as below.
In addition, I created a calculated field named "Duplicate?" to indicate whether there is a duplicate on sensor reading at the specific time stamp. Its value is "True" when duplicate occurs or else "False". Given that large values may influence the analysis, i picked the min reading value by using another calculated field named "ReadingWithMin". I also converted wind direction in degrees to compass direction using a calculated field named "wind vector direction".
Visualization
The interactive visualization can be accessed here: https://public.tableau.com/views/Assignment1_466/IntroductionDashboard?:embed=y&:display_count=yes&publish=yes
Purpose
The visualization is designed based on the three objectives. Here are the breakdown of the proposed visualization:
- Introduction
- Sensor Work Performance
- Chemical Release Pattern
- Factory Operation Pattern
Introduction: the introduction page provides an overview of what problems we are going to cope with as well as the objectives of the visualization. It also introduces the purposes of the rest pages (Sensor Work Performance & Chemical Release Pattern & Factory Operation Pattern). From the introduction page, we can navigate to the sensor work performance page, chemical release pattern page and factory operation pattern page.
Sensor Work Performance: the sensor work performance page identifies the sensors' unexpected working behaviors. We can interact with the page by filtering the sensors. It displays what time when there are the missing reading and duplicate readings by sensor. From the page, we can also detect the outliers in the sensor reading data.
Chemical Release Pattern: the chemical release pattern page shows what chemicals are detected by the sensor group. We can select all detected chemicals or the specific chemical and view the chemical reading by day and by hour. From the page, we can find out what time and which chemical has a higher amount of emittion.
Factory Operation Pattern: the factory operation pattern page displays the locations of factories and sensors. It shows the wind direction and chemical reading of each sensor. We can play the time stamp by using the page function, from which we can see the changes of wind direction and chemical reading of each sensor by hour. With these, we can know the sensor mainly detect the chemical from which factory at certain time stamp and hence conclude which factory are responsible for which chemical release and their operation pattern.
Interactive Techniques & Type of Chart Used
Introduction |
---|
Interactive Technique
|
Sensor Work Performance |
---|
Interactive Technique
|
Types of Charts used When exploring the data given, i found out three unexpected sensor behaviors. To better compare and visualize the problems, there are three types of charts used.
The image below is a representative of the type of charts used. |
Chemicals Release Pattern |
---|
Interactive Technique
|
Types of Charts used When detecting which chemicals are detected as well as their release pattern, i use line chart type for two charts.
The image below is a representative of the type of charts used. |
Factories Operation Pattern |
---|
Interactive Technique
|
Types of Charts used When detecting which chemicals are detected as well as their release pattern, i use line chart type for two charts.
The image below is a representative of the type of charts used. |
Findings
Sensor Work Performance
Q1: Characterize sensors’ performance and detect unexpected behaviors
Serial | Sensor Work Performance |
---|---|
1 | Graph Description: below is the heat map for readings of all sensors. The y axis represents the day whereas the x axis represents month and hour. The white rectangles as circled in red reflected the missing reading of all sensors. Conclusion: all sensors' readings on 2/4/2016 0:00:00, 6/4/2016 0:00:00, 4/8/2016 0:00:00, 7/8/2016 0:00:00 and 2/12/2016 0:00:00 are missing. Hence, the sensors are inactivated and resting during the listed time stamp. |
2 | Graph Description: below are the heat maps for readings of each sensor. The y axis represents the day whereas the x axis represents month and hour. The white rectangles as circled in red reflected the missing reading of each sensor. Conclusion: except sensor 3, the rest of sensors did not work on 02/08/2016 0:00:00. Except Sensor 6, 7 and 8, the rest of sensors did not work on 07/12/2016 0:00:00 |
3 | Graph Description: below are the bar charts to check duplicate readings for each sensor. The y axis represents month, chemical and number of records from left to right whereas the top x axis represents day and bottom x axis represents hour. The orange bars as circled in red reflected the duplicate reading of each sensor. Conclusion: all sensors have duplicate readings for the chemical AGOC-3A. No duplicate readings for other three chemicals (Appluimonia, Chlorodinine and Methylosmolene). As shown in the graph below, sensor 1, 2, 7, 8 have fewer duplicates compared to sensor 3,4,5,6,9. |
4 | Graph Description: below are the line charts to check outlier reading for each sensor. The y axis represents readings without duplicates whereas the top x axis represent month and day and the bottom x axis represents hour. Conclusion: sensor 3's reading has been fluctuating wildly during the three months compared to the other sensors. It reflects the sensor 3 is not stable. Sensor 5 and 9's reading have more outliers over time. Sensor 4's reading has been shifted up with baseline over time. Sensor 1, 2, 7, 8 have fewer number of outlier reading than sensor 6, but they have more variation in the baseline. |
Chemical Detected & Release Pattern
Q2: Identify the chemicals detected by the sensor group and the chemical release pattern
Serial | Chemical Release Pattern |
---|---|
1 | Graph Description: below is line chart to show all chemicals release by hour. The y axis represents average readings without duplicate whereas the x axis represents hour. Conclusion: the green line stands for the release of Methylosmolene. As shown in the chart, the release of Methylosmolene is much higher than others during 23:00 - 06:00. For the rest of the time, its release is lower than others. |
2 | Graph Description: figure 1 shows the all chemical release by day and figure 2 shows the individual chemical release by day . The y axis represents readings without duplicates whereas the top x axis represent month and the bottom x axis represents day. Conclusion: from the figure 1, we can tell the release of Appluimonia is lower than others on average. From figure 2, we can tell that the release of all chemicals are increasing over time. |
Factory Operation Pattern
Q3: Determine which factories are responsible for which chemical release with explanation. Identify operation pattern of the factories. Note: Given that the unstable performance of sensor 3, the reading from sensor 3 will not be used in the following analysis.
Serial | Factory Operation Pattern |
---|---|
1 | As shown in the figure 3, although the wind direction is SW, the sensor 5 and 9 still detect the largest amount of AGOC-3A compared to other sensors. Therefore, together with figure 4, we can conclude Radiance ColourTeck is the main source of AGOC-3A at a high level. |
2 | In the figure 5, sensor 4 and 7 detect the largest amount of Chlorodinine compared to other sensors even if the wind direction is S. It indicates either of Roadrunner and Kasios is the main source of Chlorodinine. The figure 6 shows the sensor at the top detects the the major amount of the Chlorodinine. Hence, Roadrunner is the main source of Chlorodinine. After view the chart over time by clicking the play button in the page, we can see the release of Chlorodinine is at a low level. |
3 | As can be seen in the figure 7, sensor 7 detects the highest amount of Methylosmolene compared to other sensors when wind direction is W. Since Kasios is the nearest to sensor 7 relative to other factories. Therefore, Kasio is the main source of Methylomolene at a high level. |
4 | In the figure 8, when wind blows to the west, sensor 7 detects the highest amount of Appluimonia followed by sensor 5 and 9. After we compare the location of sensor 7,5,9 and the four factories, we can conclude that Indigo is the main source of Appluimonia After view the chart over time by clicking the play button in the page, we can see the release of Appluimonia is at a low level. |