Difference between revisions of "ISSS608 2017-18 T3 Assign Yang Zhengyan Data Preparation"

From Visual Analytics and Applications

Revision as of 18:12, 8 July 2018

Yangzhengyan2.jpg    Mini-Challenge 2 : Like a Duck to Water

Data Description

Data field

  • Id: Identification number for the record (only for bookkeeping)
  • Value: Measured value for the chemical or property in this record
  • Location: Name of the location the sample was taken from. See the map for the geo-location of the sampling site.
  • Sample Date: Date the sample was taken from the location
  • Measure: Chemicals (e.g., Sodium) or water properties (e.g., Water temperature) measured in the record

Sample Data:
id,value,location,sample date,measure
2221,2,Boonsri,11-Jan-98,Water temperature
2223,9.1,Boonsri,11-Jan-98,Dissolved oxygen
2227,0.33,Boonsri,11-Jan-98,Ammonium
2228,0.01,Boonsri,11-Jan-98,Nitrites
2229,1.47,Boonsri,11-Jan-98,Nitrates
2230,0.06,Boonsri,11-Jan-98,Orthophosphate-phosphorus
2231,0.09,Boonsri,11-Jan-98,Total phosphorus
2232,13.9,Boonsri,11-Jan-98,Sodium
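
The record layout above can be illustrated with a minimal parsing sketch using only Python's standard library. The `Reading` type and the `parse_readings` helper are hypothetical names introduced here for illustration, and the date format `%d-%b-%y` is an assumption inferred from the sample rows (e.g. `11-Jan-98`). The analysis on this page was done in SAS JMP Pro and Tableau, not in Python.

```python
import csv
import io
from datetime import date, datetime
from typing import List, NamedTuple

# A few rows copied from the sample data above.
SAMPLE = """id,value,location,sample date,measure
2221,2,Boonsri,11-Jan-98,Water temperature
2223,9.1,Boonsri,11-Jan-98,Dissolved oxygen
2227,0.33,Boonsri,11-Jan-98,Ammonium
"""


class Reading(NamedTuple):
    """One typed record; field names mirror the CSV header (hypothetical type)."""
    id: int
    value: float
    location: str
    sample_date: date
    measure: str


def parse_readings(text: str) -> List[Reading]:
    """Parse CSV text into typed readings (date format is an assumption)."""
    readings = []
    for row in csv.DictReader(io.StringIO(text)):
        readings.append(
            Reading(
                id=int(row["id"]),
                value=float(row["value"]),
                location=row["location"],
                # Two-digit years such as "98" are read as 1998 by %y.
                sample_date=datetime.strptime(row["sample date"], "%d-%b-%y").date(),
                measure=row["measure"],
            )
        )
    return readings


readings = parse_readings(SAMPLE)
print(readings[0])
```

Converting the date up front makes later filtering by year straightforward.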

Data tools

  • SAS JMP Pro
  • Tableau

Variable distribution

The dataset contains 10 locations and 106 measures, so it is impractical to include every measure in the analysis. The task is to characterize the past and most recent situation with respect to chemical contamination in the Boonsong Lekagul waterways. As a first step, I keep only the readings from recent years (2011-2016) and exclude the remaining data.
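
A rough sketch of this filtering step in Python is shown here for readers who prefer code; the original filtering was done in the tools listed above. The function names `keep_recent` and `summarize` are hypothetical, and the example rows (including the "Site A" location and all values) are made up purely for illustration.

```python
from datetime import date

# Made-up example rows; only the field layout follows the dataset.
rows = [
    {"location": "Boonsri", "sample_date": date(1998, 1, 11), "measure": "Sodium", "value": 13.9},
    {"location": "Boonsri", "sample_date": date(2011, 3, 5), "measure": "Sodium", "value": 12.0},
    {"location": "Site A", "sample_date": date(2014, 7, 20), "measure": "Ammonium", "value": 0.2},
    {"location": "Site A", "sample_date": date(2016, 12, 31), "measure": "Sodium", "value": 11.5},
]


def keep_recent(rows, first_year=2011, last_year=2016):
    """Keep readings whose sample year falls within [first_year, last_year]."""
    return [r for r in rows if first_year <= r["sample_date"].year <= last_year]


def summarize(rows):
    """Count distinct locations and measures among the given readings."""
    return {
        "locations": len({r["location"] for r in rows}),
        "measures": len({r["measure"] for r in rows}),
    }


recent = keep_recent(rows)
print(len(recent), summarize(recent))
```

Running `summarize` on the full dataset is a quick way to confirm the number of locations and measures before and after filtering.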

YANGZHENGYANPic1.jpg

The following screenshot shows the variables that remain in the dataset.

YANGZHENGYANPic2.jpg

Variable clustering

Initial data exploration shows that some measures have similar trends and values, e.g., Orthophosphate-phosphorus, Total dissolved phosphorus, and Total phosphorus. For each group of similar measures, we keep one representative measure for further investigation and exclude the rest.
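
One simple way to check for such redundancy in code is to compute pairwise Pearson correlations between measures and flag pairs above a threshold. This is a swapped-in illustration, not the variable clustering procedure used on this page. The helper names, the 0.9 threshold, and all numeric values below are assumptions made up for the example; each list is assumed to hold readings aligned on the same sample dates.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def similar_pairs(series, threshold=0.9):
    """Return measure pairs whose absolute correlation meets the threshold."""
    return [
        (a, b)
        for a, b in combinations(sorted(series), 2)
        if abs(pearson(series[a], series[b])) >= threshold
    ]


# Made-up values for illustration only, aligned by sample date.
series = {
    "Orthophosphate-phosphorus": [0.06, 0.08, 0.05, 0.09],
    "Total phosphorus": [0.09, 0.12, 0.08, 0.13],
    "Water temperature": [2.0, 18.0, 21.0, 6.0],
}

print(similar_pairs(series))
```

From each flagged pair, one measure can be kept as the representative. Correlation captures only linear similarity, so the result is best treated as a screening aid alongside visual inspection.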

YANGZHENGYANPic3.jpg