Difference between revisions of "Lw-preparation"

From Visual Analytics and Applications
Jump to navigation Jump to search
 
(4 intermediate revisions by the same user not shown)
Line 11: Line 11:
 
<!--header panel-->
 
<!--header panel-->
 
{|style='background:#FFFFFF; cellspacing=0; cellpadding=0; valign=center; border=0' width=100%|
 
{|style='background:#FFFFFF; cellspacing=0; cellpadding=0; valign=center; border=0' width=100%|
|style='background:#000000; text-align:center' width=25%|
+
|style='background:#1976D2; text-align:center' width=25%|
 
[[ISSS608_2016-17_T3_Assign_HUANG_LIWEI|<font color="#FFFFFF">Introduction</font>]]
 
[[ISSS608_2016-17_T3_Assign_HUANG_LIWEI|<font color="#FFFFFF">Introduction</font>]]
 
|style='background:#1976D2; text-align:center' width=25%|
 
|style='background:#1976D2; text-align:center' width=25%|
 
[[lw-questions|<font color="#FFFFFF">Questions and Reports</font>]]
 
[[lw-questions|<font color="#FFFFFF">Questions and Reports</font>]]
|style='background:#1976D2; text-align:center' width=25%|
+
|style='background:#000000; text-align:center' width=25%|
 
[[lw-preparation|<font color="#FFFFFF">Data Preparations</font>]]
 
[[lw-preparation|<font color="#FFFFFF">Data Preparations</font>]]
 
|style='background:#1976D2; text-align:center' width=25%|
 
|style='background:#1976D2; text-align:center' width=25%|
Line 22: Line 22:
 
<br>
 
<br>
  
== Overview ==
+
== Data Preparations ==
<table><tr>
+
=== Modifications on Source Table ===
<td align='justify'>
+
* Create an excel file for recording coordinates of monitors/factories.<br>
Mistford is a mid-size city is located to the southwest of a large nature preserve. The city has a small industrial area with four light-manufacturing endeavors.  Mitch Vogel is a post-doc student studying ornithology at Mistford College and has been discovering signs that the number of nesting pairs of the Rose-Crested Blue Pipit, a popular local bird due to its attractive plumage and pleasant songs, is decreasing! The decrease is sufficiently significant that the Pangera Ornithology Conservation Society is sponsoring Mitch to undertake additional studies to identify the possible reasons. Mitch is gaining access to several datasets that may help him in his work, and he has asked you (and your colleagues) as experts in visual analytics to help him analyze these datasets.<br>
+
Set an additional row for max values of x and y as 200.<br>
The four factories in the industrial area are subjected to higher-than-usual environmental assessment, due to their proximity to both the city and the preserve. Gaseous effluent data from several sampling stations has been collected over several months, along with meteorological data (wind speed and direction), that could help Mitch understand what impact these factories may be having on the Rose-Crested Blue Pipit. These factories are supposed to be quite compliant with recent years’ environmental regulations, but Mitch has his doubts that the actual data has been closely reviewed. Could visual analytics help him understand the real situation?<br>
+
[[File:Lw-Coordinates.png]]
Please visit [http://vacommunity.org/VAST+Challenge+2017 VAST Challenge 2017] for more details.
+
* Modify "Meteorological Data.xlsx", create two new columns for order index, 'index' is for overall order, 'time order' is for representing the order of each month.<br>
</td>
+
[[File:Lw-Index.png]]
<td valign='top'>&nbsp;&nbsp;[[File:Factory-lw.jpg|500px]]</td>
+
* Create another sheet in "Meteorological Data.xlsx", calculate the x-y coordinates based on wind directions & speed.
</tr></table>
+
The formular of x coordinates is:<br>
== Task ==
+
'''''WindSpeed'''*SIN(RADIANS('''WindDirection'''))*(-1)+''''TimeOrder'''''<br>
The primary job is to determine which (if any) of the factories may be contributing to the problems of the Rose-crested Blue Pipit. Often, air sampling analysis deals with a single chemical being emitted by a single factory. In this case, though, there are four factories, potentially each emitting four chemicals, being monitored by nine different sensors. Further, some chemicals being emitted are more hazardous than others. Your task, as supported by visual analytics that you apply, is to detangle the data to help Mitch determine where problems may be. Use visual analytics to analyze the available data and develop responses to the questions below. In addition, prepare a video that shows how you used visual analytics to solve this challenge. Novel visualizations and analysis approaches are especially interesting for this mini-challenge. Please do not use any other data in your work (including other Internet-based sources or other mini-challenge data).<br>
+
The formular of y coordinates is:<br>
Please visit [http://vacommunity.org/VAST+Challenge+2017+MC2 Mini Challenge 2] for more details.
+
'''''WindSpeed'''*COS(RADIANS('''WindDirection'''))*(-1)''<br>
== Questions and Solutions ==
+
Append a set of same-length values representing the original points of x-y coordinates. x is as same as the 'time order', y is 0.<br>
=== Question 1 ===
+
Add a new column as index for matching with the index in the former sheet.<br>
'''Characterize the sensors’ performance and operation. Are they all working properly at all times? Can you detect any unexpected behaviors of the sensors through analyzing the readings they capture?Limit your response to no more than 9 images and 1000 words.'''
+
<br>
<br><br>
+
=== Data Import ===
<table width=100% border=1>
+
* Table join.
<tr><td width=60% align='center'>[[File:Trellis-Reading.png|500px]]</td>
+
Open tableau. Import file Sensor Data, Coordinates and Meteorological Data.<br>
<td align='justify'>
+
Full outer join the tables based on monitor no. (Sensor Data & Coordinates) and date time (Sensor Data & Meteorological Data) respectively.<br>
The sensors may not working properly at all times as there are some missing records for particular sensors/chemicals/time periods, which implies that there might be some issue happening during those time points.<br>
+
[[File:Lw-Join.png]]
For instance, the graph shows the daily reading records of monitor 4 for chemical Methylosmolene in April 2016, revealing several record gaps on day 12th ,17th and 22nd.
+
* New Datasource.
</td></tr>
+
Add new datasource, join two sheets within Meteorological Data (wind & direction) together, inner join with index.<br>
<tr><td width=60% align='center'>[[File:Trellis-Count.png|500px]]</td>
+
[[File:Lw-Join2.png]]
<td align='justify'>
+
<br>
Theoretically, each sensor should have reading records in every hour within the monitored month, if we group the records by the hour of day, the count of records in each hour should be the same as the number of days in that month. For example, there are 30 days in April, the count of records by hour of day in April should be 30 as well.<br>
+
=== Trellis Graph (Lines) ===
In the graph, we set 30 as the benchmark of record count, if the count is less than 30, there will be represent as a dipped red bar; if the count is more than 30, there will be a spiked green bar instead.<br>
+
* Create new calculated fields as row & column divider.<br>
The finding is that for chemical AGOC-3A and Methylosmolene, sensors 3-9 all show some inversion change of counts, which means in some time points, the sensor would likley have additional readings for AGOC-3A and corresponding reading outages for Methylosmolene, which also proves that the sensor may have some issues during the operation such as confounding the chemicals occasionally.
+
The formular of row divider:<br>
</td></tr></table>
+
''int((index()-1)/(round(sqrt(size()))))''<br>
 +
The formular of column divider:<br>
 +
''(index()-1)%(round(sqrt(size())))''<br>
 +
* Set filters.
 +
Set month, chemical, monitor as filters with single value.<br>
 +
[[File:Lw-Filters.png]]
 +
* Plot graph.
 +
Set hour (discrete) as columns, readings as rows with unaggreagated measures.<br>
 +
Put column divider and row divider into columns and rows respectively, both computing using day of the month.<br>
 +
Put day of the month as detail. Set marks as line.<br>
 +
Supply the information tooltips with related attributes.<br>
 +
[[File:Lw-Trellies-made.png]]
 +
<br>
 +
=== Trellis Graph (Counts) ===
 +
* Create new calucated field as count deviation.<br>
 +
The formula is:<br>
 +
''sum([Number of Records])-max([Day of Month])''<br>
 +
* Set month, chemical as filters.
 +
Only keep chemical AGOC-3A and Methylosmolene since we only want to compare the record counts of this pair.<br>
 +
* Plot graph.
 +
Set hour (discrete) as columns, set chemicals and count deviation as rows with aggreagated measures.<br>
 +
Put column divider and row divider into columns and rows respectively, both computing using monitor.<br>
 +
Put monitor as detail. Set marks as bar.<br>
 +
Add labels for showing the monitor names in each cell.<br>
 +
Supply the information tooltips with related attributes.<br>
 +
[[File:Lw-Trellies-made2.png]]
 +
<br>
 +
=== Calendar Graph (Days) ===
 +
* Set filters.
 +
Set chemical as filters with single value.<br>
 +
* Set columns and rows.
 +
Put weekday into columns, put month, week into rows.<br>
 +
* Set Other attributes.
 +
Set average reading as color, the darker the higher.<br>
 +
Set day of the month as labels so that each cell is labelled with day number accordingly.<br>
 +
[[File:Lw-Calendar-made.png]]
 +
<br>
 +
=== Line Chart ===
 +
* Set filters.
 +
Set month, day as filters. Month is single value and day can be multiple values.<br>
 +
* Plot graph.
 +
Set day and hour as columns, monitor and reading as rows with unaggregated measures.<br>
 +
Set chemical as color.<br>
 +
[[File:Lw-Line-made.png]]
 +
<br>
 +
=== Map ===
 +
* Set max(x) and max(y) as columns and rows respectively.
 +
So that the graph is 200*200 units.<br>
 +
* Set marks.
 +
Set monitor/factory no. as details and labels. Set type as color.<br>
 +
* Embed map picture.
 +
Embed map picture as background, lock the field as 0-200 for both x and y.<br>
 +
[[File:Lw-Map-made.png]]
 +
We will have the final graph:<br>
 +
[[File:Lw-Map-made2.png]]
 
<br>
 
<br>
=== Question 2 ===
+
=== Horizontal Graph ===
'''Now turn your attention to the chemicals themselves. Which chemicals are being detected by the sensor group? What patterns of chemical releases do you see, as being reported in the data? Limit your response to no more than 6 images and 500 words.'''
+
* Create paramter.
<br><br>
+
Create band size as paramter do that we can tune the band size to shifting granularity.<br>
<table width=100% border=1>
+
[[File:Lw-Horizontal-made.png]]
<tr><td width=60% align='center'>[[File:Calendar-Days1.png|500px]]<br>[[File:Calendar-Days2.png|500px]]</td>
+
* Create measurement.
<td align='justify'>
+
Create calculated field as different of speed, calculating the percentage different of wind speed based on median speed for each record.<br>
Some of the chemicals may have a significant increasing in releases as the daily average readings through the days have formed a obvious gradiant from April to December.<br>
+
The formular is:<br>
The two graphs indicate that the releases of chemical Appluimonia and Chlorodinine are in the trend of growing from April to December.
+
(avg([Wind Speed (m/s)])/[Speed (window median)])-1
</td></tr>
+
* Create horizontal bands.
<tr><td width=60% align='center'>[[File:Calendar-Hours.png|500px]]<br>[[File:Calendar-Hours2.png|500px]]</td>
+
Create calculated fields as differenct bands of speed differences.<br>
<td align='justify'>
+
The formular for band is like (take example of +1 band):<br>
The releases of chemicals may also have some patterns in a day.<br>
+
''if (([Speed (% of diff)]>=0) and ([Speed (% of diff)]<=[Horizon - band size]))''<br>
The graphs use the monthly average readings of each chemical as benchmark, then present the percentage of deviation from the benchmark in every hour of the day (average readings). It is evident that chemical AGOC-3A is likely to be released from 6am to 9pm everyday as the readings are much higher beyond the average in that period, on the contrary, chemical Methylosmolene might be released during the late night 10pm to 5am.
+
''then ([Speed (% of diff)]/[Horizon - band size])''<br>
</td></tr></table>
+
''elseif ([Speed (% of diff)]>[Horizon - band size])''<br>
 +
''then 1 else 0 end''<br>
 +
The formular of the following band is like (+2 band):<br>
 +
''if (([Speed (% of diff)]>=[Horizon - band size]) and ([Speed (% of diff)]<=2*[Horizon - band size]))''<br>
 +
''then (([Speed (% of diff)]-[Horizon - band size])/[Horizon - band size])''<br>
 +
''elseif ([Speed (% of diff)]>2*[Horizon - band size])''<br>
 +
''then 1 else 0 end''<br>
 +
So on and so forth.<br>
 +
* Plot graph.
 +
Set month as filter with single value.<br>
 +
Set hour of day as columns, day of month as rows. Put measure values into rows, only retain those horizontal bands.<br>
 +
Set differenct of speed as detail, set marks as area, set measure names as color with gradiant colors from the lowest band (negative) to the highest band (positive).<br>
 +
[[File:Lw-Horizontal-made2.png|800px]]
 
<br>
 
<br>
=== Question 3 ===
+
=== Wind Stick Graph ===
'''Which factories are responsible for which chemical releases? Carefully describe how you determined this using all the data you have available. For the factories you identified, describe any observed patterns of operation revealed in the data. Limit your response to no more than 8 images and 1000 words.'''
+
* Use the calculated sheet on 'Meteorological Data.xlsx'.
<br><br>
+
* Set filters.
<center>[[File:Lw-Map.png|500px]][[File:Lw-Map2.png|500px]]</center>
+
Set month, day as filters.<br>
<center>''This map graph shows the locations of all monitors and factories.''</center>
+
* Plot graph.
<table width=100% border=1>
+
Set x, y as columns and rows respectively. Set day of the month as columns in front of x.<br>
<tr><td width=60% align='center'>[[File:Horizontal-Wind-lw.png|500px]]</td>
+
Set index as detail, set hour of the day as color, set mark as line.<br>
<td align='justify'>
+
Supply the information tooltips with related attributes.<br>
The wind speed fluctuates though the hours. As the severe wind would impact the accuracy of the sensor, the days and hours with the mild wind would likely provide relatively convincible readings. As for instance, 14th to 18th of April would be appropriate for detecting the chemical release with sensor readings.
+
[[File:Lw-Wind Stick-made.png]]
</td></tr>
 
<tr><td colspan=2 align='center'>[[File:Line-Monitors-lw.png|1000px]]</td></tr>
 
<tr><td colspan=2 align='justify'>
 
With a line chart focusing on the targeted period, we may have a close look of the sensor readings. It is staightforward to detect the group spikes of chemical AGOC-3A among monitors 3-6 in 17th April. Our next step is to find out which factory may be the main reason that causes the spikes with the reference of wind direction.
 
</td></tr>
 
<tr><td colspan=2 align='center'>[[File:Lw-Wind Stick.png|1000px]]</td></tr>
 
</table>
 
 
<br>
 
<br>
== Comments ==
+
<br>
 
+
'''Data preparations are all done.'''
'''Comment 1:'''
 
 
 
Overall, very interesting analysis and easy to understand visuals.
 
 
 
Please explore if you can improve on both aesthetics and clarity by having larger graphs for 2nd and last figures. Do consider light shades of grey for your non-graphic portions.
 
 
 
From: Chua Gim Hong
 
 
 
 
 
'''Comment 2:'''
 
 
 
Well-done!  I like your representation of the wind analysis in Qn 3.  It is very clear to view the wind speed and direction at the hourly basis. That said, a minor point regarding the y-axis of the chart.  Would it be possible to add the y-axis label (scale), so that we can view the wind speed?  For now, we can only view the relative wind speed on the chart (i.e. hourly comparison) but not the actual wind speed. 
 
 
 
Also, for the map graph in Q3, it would be clearer if a larger size image can be displayed on the wiki-page :)
 
 
 
From: Ngo Siew Hui
 
 
 
'''Comment 3:'''
 
 
 
Hi Liwei,
 
 
 
The way you representing the wind speed and direction is very creative!
 
But I'm not clear about the meaning of colours, and how you derive factory responsibility.
 
It would be very much appreciated if you could explain more about it in your report.
 
 
 
Thanks,
 
Jiaqi
 

Latest revision as of 01:22, 17 July 2017

Lw-VAST.jpg VAST Challenge 2017 - Mini Challenge 2

Presented by: HUANG Liwei  (liwei.huang.2016@mitb.smu.edu.sg)
Visualization tool: Tableau

Introduction

Questions and Reports

Data Preparations

Comments


Data Preparations

Modifications on Source Table

  • Create an excel file for recording coordinates of monitors/factories.

Set an additional row for max values of x and y as 200.
Lw-Coordinates.png

  • Modify "Meteorological Data.xlsx", create two new columns for order index, 'index' is for overall order, 'time order' is for representing the order of each month.

Lw-Index.png

  • Create another sheet in "Meteorological Data.xlsx", calculate the x-y coordinates based on wind directions & speed.

The formular of x coordinates is:
WindSpeed*SIN(RADIANS(WindDirection))*(-1)+'TimeOrder
The formular of y coordinates is:
WindSpeed*COS(RADIANS(WindDirection))*(-1)
Append a set of same-length values representing the original points of x-y coordinates. x is as same as the 'time order', y is 0.
Add a new column as index for matching with the index in the former sheet.

Data Import

  • Table join.

Open tableau. Import file Sensor Data, Coordinates and Meteorological Data.
Full outer join the tables based on monitor no. (Sensor Data & Coordinates) and date time (Sensor Data & Meteorological Data) respectively.
Lw-Join.png

  • New Datasource.

Add new datasource, join two sheets within Meteorological Data (wind & direction) together, inner join with index.
Lw-Join2.png

Trellis Graph (Lines)

  • Create new calculated fields as row & column divider.

The formular of row divider:
int((index()-1)/(round(sqrt(size()))))
The formular of column divider:
(index()-1)%(round(sqrt(size())))

  • Set filters.

Set month, chemical, monitor as filters with single value.
Lw-Filters.png

  • Plot graph.

Set hour (discrete) as columns, readings as rows with unaggreagated measures.
Put column divider and row divider into columns and rows respectively, both computing using day of the month.
Put day of the month as detail. Set marks as line.
Supply the information tooltips with related attributes.
Lw-Trellies-made.png

Trellis Graph (Counts)

  • Create new calucated field as count deviation.

The formula is:
sum([Number of Records])-max([Day of Month])

  • Set month, chemical as filters.

Only keep chemical AGOC-3A and Methylosmolene since we only want to compare the record counts of this pair.

  • Plot graph.

Set hour (discrete) as columns, set chemicals and count deviation as rows with aggreagated measures.
Put column divider and row divider into columns and rows respectively, both computing using monitor.
Put monitor as detail. Set marks as bar.
Add labels for showing the monitor names in each cell.
Supply the information tooltips with related attributes.
Lw-Trellies-made2.png

Calendar Graph (Days)

  • Set filters.

Set chemical as filters with single value.

  • Set columns and rows.

Put weekday into columns, put month, week into rows.

  • Set Other attributes.

Set average reading as color, the darker the higher.
Set day of the month as labels so that each cell is labelled with day number accordingly.
Lw-Calendar-made.png

Line Chart

  • Set filters.

Set month, day as filters. Month is single value and day can be multiple values.

  • Plot graph.

Set day and hour as columns, monitor and reading as rows with unaggregated measures.
Set chemical as color.
Lw-Line-made.png

Map

  • Set max(x) and max(y) as columns and rows respectively.

So that the graph is 200*200 units.

  • Set marks.

Set monitor/factory no. as details and labels. Set type as color.

  • Embed map picture.

Embed map picture as background, lock the field as 0-200 for both x and y.
Lw-Map-made.png We will have the final graph:
Lw-Map-made2.png

Horizontal Graph

  • Create paramter.

Create band size as paramter do that we can tune the band size to shifting granularity.
Lw-Horizontal-made.png

  • Create measurement.

Create calculated field as different of speed, calculating the percentage different of wind speed based on median speed for each record.
The formular is:
(avg([Wind Speed (m/s)])/[Speed (window median)])-1

  • Create horizontal bands.

Create calculated fields as differenct bands of speed differences.
The formular for band is like (take example of +1 band):
if (([Speed (% of diff)]>=0) and ([Speed (% of diff)]<=[Horizon - band size]))
then ([Speed (% of diff)]/[Horizon - band size])
elseif ([Speed (% of diff)]>[Horizon - band size])
then 1 else 0 end
The formular of the following band is like (+2 band):
if (([Speed (% of diff)]>=[Horizon - band size]) and ([Speed (% of diff)]<=2*[Horizon - band size]))
then (([Speed (% of diff)]-[Horizon - band size])/[Horizon - band size])
elseif ([Speed (% of diff)]>2*[Horizon - band size])
then 1 else 0 end
So on and so forth.

  • Plot graph.

Set month as filter with single value.
Set hour of day as columns, day of month as rows. Put measure values into rows, only retain those horizontal bands.
Set differenct of speed as detail, set marks as area, set measure names as color with gradiant colors from the lowest band (negative) to the highest band (positive).
Lw-Horizontal-made2.png

Wind Stick Graph

  • Use the calculated sheet on 'Meteorological Data.xlsx'.
  • Set filters.

Set month, day as filters.

  • Plot graph.

Set x, y as columns and rows respectively. Set day of the month as columns in front of x.
Set index as detail, set hour of the day as color, set mark as line.
Supply the information tooltips with related attributes.
Lw-Wind Stick-made.png

Data preparations are all done.