Difference between revisions of "ISS608 2017-18 T3 G2 Assign ChenYanchong Visualization"

From Visual Analytics and Applications
Jump to navigation Jump to search
 
(14 intermediate revisions by the same user not shown)
Line 1: Line 1:
<div style="background:#DBE4EA;border:#A3BFB1; padding:15px;border-radius: 5px;margin-bottom:5px">
+
<div style="display:flex;background:#DBE4EA;border:#A3BFB1; padding:15px;border-radius: 5px;margin-bottom:5px;font-family:Century Gothic;font-weight:bold;">
[[Image:Cyc nav pic.jpg|300px]]  
+
<div>[[Image:Cyc nav pic.jpg|300px]]</div>
<b><font size = 6; color="#0277BD"> <span style="font-family:Century Gothic;">ISSS608 Visual Analytics Vast Challenge Mini Challenge 2</span> </font></b>
+
<div style="display:flex;align-items:center;padding:0 5px"><font size = 6; color="#0277BD" style="text-align:center;">ISSS608 Visual Analytics Vast Challenge Mini Challenge 2</font></div>
 
</div>
 
</div>
 
<!--MAIN HEADER -->
 
<!--MAIN HEADER -->
Line 21: Line 21:
  
 
| style="text-decoration: none;border-radius: 10px 10px 0 0;background-color: #F7E5DD;" width="15%" |     
 
| style="text-decoration: none;border-radius: 10px 10px 0 0;background-color: #F7E5DD;" width="15%" |     
[[ISS608_2017-18_T3_G2_Assign_ChenYanchong_Reference| <font color=#CD5C5C>Comments & References</font>]]
+
[[ISS608_2017-18_T3_G2_Assign_ChenYanchong_Reference| <font color=#CD5C5C>Feedback</font>]]
  
 
|}
 
|}
  
<font size="5">The final visualisation story board is available through  </font>[https://public.tableau.com/views/VA_Assignment_YanchongCHEN/StoryofwaterwaysinBLPreserve?:embed=y&:display_count=yes&publish=yes Chen Yanchong's Tableau Public]<br/>
+
<font size="5">The complete story board for this visualisation is available through  </font>[https://public.tableau.com/views/VA_Assignment_YanchongCHEN/StoryofwaterwaysinBLPreserve?:embed=y&:display_count=yes&publish=yes Chen Yanchong's Tableau Public]<br/>
<font size="5">The task of the visualisation is to answer the following 3 questions.</font><br/>
+
'''This page will give a brief introduction of how the story board was prepared'''
  
=Question 1=
+
=Preparation of worksheets=
'''Characterize the past and most recent situation with respect to chemical contamination in the Boonsong Lekagul waterways. Do you see any trends of possible interest in this investigation?'''<br/><br/>
+
There are total 11 worksheets are used.
If the overall tendency (all showing upward or all showing downward trend) is the same across the 10 locations, we can assume that the changes along with years are caused by natural effect instead of human intervention. <br/>
+
==1. map==
Slope chart is a good practice to show the difference between the start point and the end point.<br/>
+
Purpose of this worksheet: making use of the map and using the location information properly<br/>
[[Image:Cyc_viz_1.png|550px]] <br/>
+
[[Image:Cyc_wh_1.png|350px]] <br/>
From above slope chart, the water taken from the 10 locations seems cleaner in the year 2016 than it in the year 2010. Compared to the obvious decrease, Decha and Tansanee decrease slower than the rest of the locations.<br/>
+
Steps to take:<br/>
Does the water composition decrease year by year?  Taking a closer look at how the water composition changes year by year.<br/>
+
# Creating a excel worksheet named location with three columns inside it, location, Xcoord and Ycoord
[[Image:Cyc_viz_2.png|550px]] <br/>
+
# Adding a background map in Tableau
Above line chart reveals that in year 2013, the number of compositions decrease till the lowest point in 7 of the 10 locations, but not in Kohsoom, Boonsri and Tansanee. After year 2013, the trend reversed a bit, the composition of water chemicals in the other 8 places start to increase whilst Kohsoom and Boonsri show contrast trend.<br/>
+
# Using annotation to get the coordinate of each location on the map
Is it because these three places are polluted by human intervention? From the background information, the furniture company Kasios was established in year 2013,  furthermore, from the map of the preserve,  Kohsoom and Boonsri are the two locations which are the nearest ones to the suspicious waste dumping point, however, Tasanee is further away from the approximate waste dumping point.<br/>
+
# Using the prepared data table to left join the location table by the column "location"
'''Point source'''[https://www.watereducation.org/aquapedia-background/point-source-vs-nonpoint-source-pollution]might be helpful to clarify the mystery.<br/>
+
[[Image:Cyc_wh_prep_1.jpg|400px]]
[[Image:Cyc_viz_3.png|550px]]<br/>
 
The purpose of the point source is to compare the differences of one certain measures among the locations during the same period of time. To perform the point source, multiple locations and years are allowed to select at a time, however, only one measure can be picked out at the same time. <br/>
 
Above picture shows how the water temperature changes in the past 6 years among the 10 positions. Obviously, water temperature shows the same trends across the 10 locations, it shows a bell curve and always reaches its highest in July. There are some empty area in the graph, indicates that water temperature is not taken. <br/>
 
After examining all the measures in the processed data set, several conclusions are as attached : <br/>
 
<div style="margin:0px; padding: 2px; font-family: Arial; border-radius: 1px; text-align:left">
 
{| class="wikitable" style="background-color:#FFFFFF;" width="100%"
 
|-
 
|
 
'''Conclusions'''
 
||
 
'''Evidences'''
 
  
|-
+
==2. measure count overall==
|
+
Purpose of this worksheet: to should the tendency from the start to the end<br/>
'''Kannika and Sakda are quite clean area'''
+
[[Image:Cyc_wh_2.png|550px]]<br/>
||
+
Steps to take:<br/>
* Macrozoobenthos are the indicator of nice water quality, compared to the other locations, the detected number of macrozoobenthos is higher, even in year 2014, after year 2013.
+
# Create a calculated field ''' measure count ''', using formula : '''COUNTD([Measure])'''
 +
# Create a calculated field ''' first or last ''', using formula: '''FIRST() == 0 OR LAST() == 0'''
 +
# Drag '''sample date''' to the column shelf, and '''measure count ''' to row shelf
 +
# '''first or last ''' and '''location''' as filter, and choose '''True''' only
 +
# '''measure count''' on row shelf again, and choose '''dual axis'''
 +
# '''location''' on color card, '''measure count''' and '''location''' on label card
  
|-
+
==3. measure count detailed==
|
+
Purpose of this worksheet: showing more detailed measure count than the slope chart above<br/>
'''Somchair and Sakda are quite similar area'''
+
[[Image:Cyc_wh_3.png|550px]]<br/>
||
+
Steps to take:<br/>
* Evidence can be found while point sourcing the measures, especially Barium, Boron and Chlorodine, the distributions are more or less the same among these areas.
+
# Duplicate the slope chart
 +
# Remove '''first or last''' from filter
  
|-
+
==4. measure value per loc==
|
+
Purpose of this worksheet: using with the map to perform point source<br/>
'''The density of chemicals in downstream will not be affected by the upstream, except for cumulative chemicals'''
+
[[Image:Cyc_wh_4.png|550px]]<br/>
||
+
Steps to take:<br/>
* Using following groups as comparison: <br/>
+
# Drag '''sample date''' to column shelf and drill it down to '''month''' granularity
# Boonsri, Kohsoom, Busarakhan, Chai;
+
# Drag '''location''' and '''measure''' to row shelf in order
# Chai, Kannika;
+
# Put '''value''' on row shelf and choose '''average''' as aggregation method
# Somchair, Sakda;
+
# Add '''value''' to color card and choose '''average''' as aggregation method
* Though Achara and Sakda are of the same stream, they are not in the groups for comparison, because Achara does not behave the same as the rest, no matter which measures are using for comparison, one possible reason could be the human intervention.<br/>
+
# Drag '''location''' and '''measure''' to filter
  
|-
 
|
 
'''Boonsri, Kohsoom are similar'''
 
||
 
* Most of the toxics are the same distributed in these two areas along with time, moreover, Fecal coliforms, Fecal streptococci and Total coliforms, which are commonly known as the indicator of polluted areas, they only be detected in these two locations;<br/>
 
* Most of the toxics are the same distributed in these two areas along with time, moreover, Fecal coliforms, Fecal streptococci and Total coliforms, which are commonly known as the indicator of polluted areas, they only be detected in these two locations.
 
  
|-
+
==5. measure value trend by year==
|
+
Purpose of this worksheet: how single measure changes in each location by year<br/>
'''Water temperature will affect the other chemicals'''
+
[[Image:Cyc_wh_6.png|550px]]<br/>
||
+
Steps to take:<br/>
* As mentioned in above section, water temperature reaches its highest point in the preserve, meanwhile, total dissolved oxygen reaches its lowest in all the locations.
+
# '''sample date''' on column shelf and drill down to month
 +
# Put value on row shelf and choose average as aggregating method
 +
# Choose '''area''' for marks
 +
# Insert '''measure''' in title
  
|-
 
|
 
'''There could be another waste dumping point near Tansanee'''
 
||
 
* Though no direct pollutant indicators are found in the water sensor readings, a not small amount chemicals, especially toxics are found in Tansanee with similar distribution as they were in Boonsri and Kohsoom, such as total hardness, Bicarbonates, Sufides, Iron, Total organic carbon and so forth;<br/>
 
* Some toxics show especially high volume in Tansanee compared to other locations, for example, Total dissolved phosphorus and Total dissolved salts;<br/>
 
* Some chemicals never appear in Tansanee until year 2013.
 
  
|-
+
==6. measure value==
|}
+
Purpose of this worksheet: giving more intuitive feeling of the difference of single measure between the start year and the end year <br/>
</div>
+
[[Image:Cyc_wh_5.png|550px]]<br/>
=Question 2=
+
Steps to take:<br/>
'''What anomalies do you find in the waterway samples dataset? How do these affect your analysis of potential problems to the environment? Is the Hydrology Department collecting sufficient data to understand the comprehensive situation across the Preserve? What changes would you propose to make in the sampling approach to best understand the situation? '''<br/><br/>
+
# Duplicate the second slope chart
<div style="margin:0px; padding: 2px; font-family: Arial; border-radius: 1px; text-align:left">
+
# Add '''measure''' on filter
{| class="wikitable" style="background-color:#FFFFFF;" width="100%"
+
# Double click the title and insert ''' measure ''' on the title
|-
+
# in the tooltip card, format it a bit and insert above chart to drill down more details
|
+
 
'''Anomalies '''
+
 
||
+
==7. heat map==
'''Effect'''  
+
Purpose of this worksheet: using heatmap for the convenience of comparison and obtain more detail time series analysis<br/>
 +
[[Image:Cyc_wh_7.png|550px]]<br/>
 +
Steps to take:<br/>
 +
# Select '''measure''' to column shelf and '''location''' to row shelf
 +
# Drag '''sample date''' on column shelf and drill it down to '''month'''
 +
# Drag '''month''' to row shelf
 +
# Apply '''location''' and '''measure''' as filters
 +
# Put average '''value''' on color card
 +
 
  
|-
+
==8. seasonality trend==
|
+
Purpose of this worksheet: to discovery seasonality patterns<br/>
The measures are not consistent in different years. E.g. Auminium is not measured in other years except for year 2010
+
[[Image:Cyc_wh_8.png|550px]]<br/>
||
+
Steps to take:<br/>
the comparison against years could be inaccurate
+
# Duplicate '''the measure value by year trend''' worksheet
 +
# Reverse the position of '''year''' and '''month'''
  
|-
 
|
 
The chemicals measured by the water sensors are not the same across the locations. E.g. 17 measures were detected in one location while the other locations might only take measures with a number varies from 10 to 35 during the same time period
 
||
 
it is better if the sample data could detect same measures in different locations, hence, when one measure does not exist, could mark as 0. If that measure is not detected, could mark as null. <br/>
 
In current sample data, null values are confusing.
 
  
|-
+
==9. percentage diff==
|
+
Purpose of this worksheet: comparing each location against all<br/>
The number of records are not the same in different locations even for the same measure which the record is taken at the same day. E.g. chlorides was taken 3 times in Achara at one day, but in Boonsri it was only taken once
+
[[Image:Cyc_wh_9.png|550px]]<br/>
||
+
Steps to take:<br/>
sometimes the water temperature and even stream speed might draw a huge effect on the density of the measures, if one measure was taken twice a day, the other measures should be taken twice no matter which location it was taken from
+
# Create calculated field '''avg_across_all''' by formula : '''{ FIXED YEAR([Sample Date]), [Measure]:AVG([Value])}'''
 +
# Create calculated field '''pct_diff_with_all ''' by formula: ''' ({ FIXED YEAR([Sample Date]), [Location], [Measure]:AVG([Value])} - [avg_across_all])/[avg_across_all]'''
 +
# Create calculated field ''' increased? ''' by formula: ''' IF [pct_diff_with_all] > 0 THEN 'higher' ELSEIF [pct_diff_with_all] < 0 THEN 'lower' ELSE 'equal' END'''
 +
# '''location''' and '''pct_diff_with_all''' on column shelf
 +
# '''sample date''' on row shelf
 +
# '''measure''' and '''location''' on filter
 +
# '''increased? ''' on color card
  
|-
 
|
 
The sampling date is not consistency, for example, on 1 Jan 2015, Achara and other 8 locations have sensor readings but for Boonsri and Decha sensor readings are taken on 8 Jan 2015
 
||
 
such issue limits the analysis to drill down to a lower granularity (date level), moreover the accuracy could be affected
 
  
|-
+
==10 0.3 principle==
|}
+
Purpose of this worksheet: using 0.3 principle to detect the locations are polluted<br/>
</div>
+
'''''0.3 principle''' can see data preparation section for more details''<br/>
To conclude, the Hydrology Department offers sufficient volume of data for analysis. But there are some issues existing, which cause ambiguity. The data quality should be improved for better understanding the comprehensive situation across the preserve. <br/><br/>
+
[[Image:Cyc_wh_10.png|550px]]<br/>
Personal suggestions to improve the sampling strategies:<br/>
+
Steps to take:<br/>
* 1. Samples should be taken at the same day from the sampling positions
+
# Create calculated field '''total chemical oxygen demand''' by formula: ''' IF [Measure]='Chemical Oxygen Demand (Cr)' THEN [Value] ELSEIF [Measure]='Chemical Oxygen Demand (Mn)' THEN [Value] END'''
* 2. If one location measures water three times a day, the other locations should have 3 records for each chemicals per day
+
# Create calculated field '''total oxygen demand''' by formula ''' IF [Measure]='Biochemical Oxygen' THEN [Value] END'''
* 3. When arrange one sampling investigation, should take the same measures across the preserve, hence, if one chemical does not appear in the investigated locations, the value column should be zero, if one chemical is not measured during that investigation, it will not appear on the measure list across the locations
+
# Create calculated field '''oxygen rate''' by formula '''AVG([total oxygen demand])/AVG([total chemical oxygen demand])'''
 +
# Year of '''sample date''' and month of '''sample date''' on column shelf
 +
# '''oxygen rate''' on row shelf
 +
#  Year of '''sample date''' and '''location''' on filter and '''location''' on details as well as on color
 +
# Add a constant reference line on across table with a value '''0.3'''
  
  
 +
==11. intuitive line graph==
 +
Purpose of this worksheet: dividing the worksheet by location to obtain insights easier<br/>
 +
[[Image:Cyc_wh_11.png|550px]]<br/>
 +
Steps to take:<br/>
 +
# Create calculated field '''col divider ''' by formula : '''(INDEX()-1)%(ROUND(SQRT(SIZE()))) ''' 
 +
# Create calculated field '''row divider''' by formula : ''' INT((INDEX()-1)/ROUND(SQRT(SIZE())))'''
 +
# Put '''location''' on detail and label
 +
# Put '''col divider''' on column shelf and '''row divider''' on row shelf, make them computing using '''location''' and make them discrete value
 +
# Drag '''value''' on row shelf as well as on color card and on label card
 +
# Put '''measure''' as filter
  
=Question 3=
+
=User Guide for dashboard=
'''After reviewing the data, do any of your findings cause particular concern for the Pipit or other wildlife? Would you suggest any changes in the sampling strategy to better understand the waterways situation in the Preserve? '''<br/><br/>
+
==Dashboard 1: Water composition changes by year==
According to Wikipedia, Pipits are associated with some kind of grassland, from sea-level to alpine tundra. The diet of the pipits is dominated by small invertebrates. Insects are the most important prey items, the composition of their diet apparently reflecting the abundance of their prey in the location (and varying with the season).[https://en.wikipedia.org/wiki/Pipit]<br/>
+
[[Image:Cyc_db_1.png|550px]]
Atrazine and p,p-DDT were founded in the water, they are commonly used herbicide. They could penetrate from soil to water through groundwater circulation. Surprisingly, they were found in the waterways sensor readings, there is a high probability that they were being used to weed, which may affect the grass within the preserve, hence, the habitat of the pipits was destroyed. Moreover, pipit's diet is mainly from intersects, whilst grass is the major habitat for breeding intersects. As a result, the number of pipits will be affected in the corresponding years when herbicide was found. If the records regarding pipit's number could be provided, this assumption could be verified.<br/><br/>
+
==Dashboard 2: Point source among locations==
Improvements for sampling strategy should made to better understanding the waterways situation:<br/>
+
To filter measures, user can choose from the dropdown menu;<br/>
Sample data are taken from 4 different streams in the preserve, shown as following, among all of the 10 locations, only 2 of them are located on the tributary, namely, Kohsoom and Busarakhan. Though these two locations are near the approximate waste dumping point, they have good reasons to take more samples from the suspicious area, however, comparing water from tributary with water from the main stream is unfair as water in rivers are sensitive to locations in a sequential order, the upstream will affect the downstream, the tributary will affect the main stream, due to the liquidity of water.<br/>
+
Click the locations on the map the location filter can work on the scatter plot as well;<br/>
To better understanding the waterways situation within the preserve, more sampling points are needed, those points should locate at the tributary of the rests of the river streams, hence, comparisons across rivers could be performed within tributary as well as within main stream.
+
If user clicked the empty area on the map, all the locations will be selected.<br/>
 +
[[Image:Cyc_db_2.png|550px]]

Latest revision as of 10:46, 9 July 2018

Cyc nav pic.jpg
ISSS608 Visual Analytics Vast Challenge Mini Challenge 2

Overview

Data Preparation

Visualization

Answer

Feedback

The complete story board for this visualisation is available through Chen Yanchong's Tableau Public
This page will give a brief introduction of how the story board was prepared

Preparation of worksheets

There are total 11 worksheets are used.

1. map

Purpose of this worksheet: making use of the map and using the location information properly
Cyc wh 1.png
Steps to take:

  1. Creating a excel worksheet named location with three columns inside it, location, Xcoord and Ycoord
  2. Adding a background map in Tableau
  3. Using annotation to get the coordinate of each location on the map
  4. Using the prepared data table to left join the location table by the column "location"

Cyc wh prep 1.jpg

2. measure count overall

Purpose of this worksheet: to should the tendency from the start to the end
Cyc wh 2.png
Steps to take:

  1. Create a calculated field measure count , using formula : COUNTD([Measure])
  2. Create a calculated field first or last , using formula: FIRST() == 0 OR LAST() == 0
  3. Drag sample date to the column shelf, and measure count to row shelf
  4. first or last and location as filter, and choose True only
  5. measure count on row shelf again, and choose dual axis
  6. location on color card, measure count and location on label card

3. measure count detailed

Purpose of this worksheet: showing more detailed measure count than the slope chart above
Cyc wh 3.png
Steps to take:

  1. Duplicate the slope chart
  2. Remove first or last from filter

4. measure value per loc

Purpose of this worksheet: using with the map to perform point source
Cyc wh 4.png
Steps to take:

  1. Drag sample date to column shelf and drill it down to month granularity
  2. Drag location and measure to row shelf in order
  3. Put value on row shelf and choose average as aggregation method
  4. Add value to color card and choose average as aggregation method
  5. Drag location and measure to filter


5. measure value trend by year

Purpose of this worksheet: how single measure changes in each location by year
Cyc wh 6.png
Steps to take:

  1. sample date on column shelf and drill down to month
  2. Put value on row shelf and choose average as aggregating method
  3. Choose area for marks
  4. Insert measure in title


6. measure value

Purpose of this worksheet: giving more intuitive feeling of the difference of single measure between the start year and the end year
Cyc wh 5.png
Steps to take:

  1. Duplicate the second slope chart
  2. Add measure on filter
  3. Double click the title and insert measure on the title
  4. in the tooltip card, format it a bit and insert above chart to drill down more details


7. heat map

Purpose of this worksheet: using heatmap for the convenience of comparison and obtain more detail time series analysis
Cyc wh 7.png
Steps to take:

  1. Select measure to column shelf and location to row shelf
  2. Drag sample date on column shelf and drill it down to month
  3. Drag month to row shelf
  4. Apply location and measure as filters
  5. Put average value on color card


8. seasonality trend

Purpose of this worksheet: to discovery seasonality patterns
Cyc wh 8.png
Steps to take:

  1. Duplicate the measure value by year trend worksheet
  2. Reverse the position of year and month


9. percentage diff

Purpose of this worksheet: comparing each location against all
Cyc wh 9.png
Steps to take:

  1. Create calculated field avg_across_all by formula : { FIXED YEAR([Sample Date]), [Measure]:AVG([Value])}
  2. Create calculated field pct_diff_with_all by formula: ({ FIXED YEAR([Sample Date]), [Location], [Measure]:AVG([Value])} - [avg_across_all])/[avg_across_all]
  3. Create calculated field increased? by formula: IF [pct_diff_with_all] > 0 THEN 'higher' ELSEIF [pct_diff_with_all] < 0 THEN 'lower' ELSE 'equal' END
  4. location and pct_diff_with_all on column shelf
  5. sample date on row shelf
  6. measure and location on filter
  7. increased? on color card


10 0.3 principle

Purpose of this worksheet: using 0.3 principle to detect the locations are polluted
0.3 principle can see data preparation section for more details
Cyc wh 10.png
Steps to take:

  1. Create calculated field total chemical oxygen demand by formula: IF [Measure]='Chemical Oxygen Demand (Cr)' THEN [Value] ELSEIF [Measure]='Chemical Oxygen Demand (Mn)' THEN [Value] END
  2. Create calculated field total oxygen demand by formula IF [Measure]='Biochemical Oxygen' THEN [Value] END
  3. Create calculated field oxygen rate by formula AVG([total oxygen demand])/AVG([total chemical oxygen demand])
  4. Year of sample date and month of sample date on column shelf
  5. oxygen rate on row shelf
  6. Year of sample date and location on filter and location on details as well as on color
  7. Add a constant reference line on across table with a value 0.3


11. intuitive line graph

Purpose of this worksheet: dividing the worksheet by location to obtain insights easier
Cyc wh 11.png
Steps to take:

  1. Create calculated field col divider by formula : (INDEX()-1)%(ROUND(SQRT(SIZE())))
  2. Create calculated field row divider by formula : INT((INDEX()-1)/ROUND(SQRT(SIZE())))
  3. Put location on detail and label
  4. Put col divider on column shelf and row divider on row shelf, make them computing using location and make them discrete value
  5. Drag value on row shelf as well as on color card and on label card
  6. Put measure as filter

User Guide for dashboard

Dashboard 1: Water composition changes by year

Cyc db 1.png

Dashboard 2: Point source among locations

To filter measures, user can choose from the dropdown menu;
Click the locations on the map the location filter can work on the scatter plot as well;
If user clicked the empty area on the map, all the locations will be selected.
Cyc db 2.png