Difference between revisions of "Preparation"

From Visual Analytics and Applications
Jump to navigation Jump to search
Line 56: Line 56:
 
Merge “Boonsong Lekagul waterways readings.txt” and “Location” into one file. I have already changed the name from “Boonsong Lekagul waterways readings” to “data”. Using left outer join to match properly.
 
Merge “Boonsong Lekagul waterways readings.txt” and “Location” into one file. I have already changed the name from “Boonsong Lekagul waterways readings” to “data”. Using left outer join to match properly.
  
[[File:an13.png|500px]]
+
[[File:2anAn13.png|500px]]
  
 
=Calendar chart=
 
=Calendar chart=

Revision as of 01:12, 9 July 2018

width="100%"

Mini-Challenge 2 Overview: Like a Duck to Water

Background Preparation Visualization Observations & Insights Feedback


Explore data

Import data “Boonsong Lekagul waterways readings.csv”. Explore data by JMP. We found this dataset displays multiple different values among the same sample data, measure, and locations. Shown as below picture. However, we could use average value to compute when using Tableau. Therefore, decide to leave it at first.

An2.png

Using Analysis> Distribution function and drag Sample date to Y, columns. We get the data distribution which is from 01/11/1998 to 12/31/2016, total 19 years.

2anAn3.png

2anan4.png

Explore the Location picture

Waterways Final.jpg is the picture that we would use. This picture has 10 places’ names. We look at Boonsong Lekagul waterways readings dataset and found column location matches locations name in picture. There are water quality sensors placed.

Next, we look carefully at this picture. Small rivers merging into a bigger river is the normal. Therefore, we could also tell rivers should flow from north to south. We could divide the 10 locations into 4 parts due to direction of streamflow. Split 10 locations into 4 parts would help us to analyze the following question 3 easily. We save this file name as “Waterways Final_changed.”

2anAn5.png

Using Tableau and Excel to add X, Y coordinate

Above dataset, Boonsong Lekagul waterways readings, contains most of the information that we could analyze. Next, we decide to use the second dataset, Location.csv. It contains X coordinate column and Y coordinate column. However, these two columns are blank. We need to add these two columns by ourselves. Notice the LR and UL are 249. Later when we insert the picture into tableau, we need to set the picture’s format.

2anan6.png

Using Tableau to open Location file and drag X coord and Ycoord into columns and Rows desperately. Drag Location into Details and also drag location into Filters. Click all.

2anan7.png

To add picture into this sheet. Click on Map, Background Image, and then choose the Sheet 1. Click on Add Image.

2anan8.png 2anan9.png
Insert the Waterways Final_changed. Jpg
X Field on Right: type 240
Y Field on Top: type 249

2anAn10.png

When we pick one dot nearby each location, it would should up the X and Y coordinate details. We re-write it into Location.txt and repeat the process until 10 places have already picked up.

2anAn11.png 2anAn12.png

Merge two files by tableau

Merge “Boonsong Lekagul waterways readings.txt” and “Location” into one file. I have already changed the name from “Boonsong Lekagul waterways readings” to “data”. Using left outer join to match properly.

2anAn13.png

Calendar chart

Drag sample date into columns and Measure into Rows. We set the sample date as Year level and change it into discrete. Darg Number of records into Color and Labels, and then change the color to red. Change automatic to Square to let color fill in and then use “All” at mark label.

500px 500px 500px500px

Using calender chart we could tell that not every measures have recourds every years and every years’ sampling records are not completed. Therefore, add Measure, Sample date into filter that we could pick the year that we are concerned. Histagram would introduce we use conscade Year (Sample Date) filter to help us link sheets together.

Histogram

For histogram, I create one calculation called “Count_appear_year” which can help me count how many years that measures appear. Drag Measure into Columns and Count_appear_year and number of records into Rows. Drag Count_appear_year into Label that the upper histogram would display the amount of aggregate counted year. Click on “All” at marks label. Next, we sort out count_of_year by ascending.

500px 500px 500px500px

Owing to Number of Records on Rows, also drag Number of Records into mark label. Click on “Highlighted” at mark label that when I have already selected the amount of count_appear_year I would like, the number of Records would also display. Using the second histogram, we could see that every measure’s number of records are different. It tells us the sensors are not consistent to detect. It is random to pick up water.

500px500px

Also, add Sample_date on Year level, count_appear_year, and Measure into filter. Here is the tricky that I click on right triangle of Year (Sample Date) filter, click on apply for Worksheets, and choose Selected Worksheet. I have already completed my whole tableau file, so the list would display whole sheets. The list I would tick all exclude Map because map do not need time on filter. It would help me when years I have selected on histogram, it would reflect to all sheets which contain Year (Sample Date) filter.

500px500px500px

Line chart

Create Anomaly calculation to help us point out some outliers which anomalies should be. Use three times of Standard Deviation to identify the outliers.

500px

Drag Sample date into Columns and change it to continuous, drag Measure, Value, Value into Rows. We have already known some of the values of data appear one more time at the same date, location and measure. Therefore, we use average value to correct this situation. Here we also got tricks, right click on one of the AVG(Value) on Dual Axis. The rows would combine together, and we untick one of each headers. Also, drag Anomaly into color of the second layer of AVG(Value)and assigned two colors to identify outliers. Next, change to Circle.

500px

Create measure_filter calculation. Darg Measure, Sample Date, and Measure_filter into Filters. We have already chosen measures on the previous histogram and click on show filter on measure_filter. Click on the right triangle to have expand option, not only click on Selected Worksheet that we mentioned before but click on Only Relevant Values. It could help us to filter out the measures we are only concerned.

500px500px

Add reference line: average value and expected band. The expected band we defined no more or less than 3 times of Standard Deviation.

500px500px