Difference between revisions of "ChenNannan-Data preparation"

From Visual Analytics and Applications
Latest revision as of 21:13, 8 July 2018

ISSS608 Assign ChenNannan-MC2



Data Quality Issues

No missing values.

Cdn1.png

At least 2.5% of the entries in the value variable are 0, which is meaningless.

Cdn2.png
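
A check like this one can be sketched in a few lines. The snippet below is a minimal illustration in plain Python, not the tool used on this page; the record layout (location, sample date, measure, value) follows the fields named here, while the site names, measure names and numbers are hypothetical.

```python
# Hypothetical long-format records: (location, sample_date, measure, value).
records = [
    ("SiteA", "2005-01-10", "MeasureX", 0.0),
    ("SiteA", "2005-01-10", "MeasureY", 1.2),
    ("SiteB", "2005-02-03", "MeasureX", 0.4),
    ("SiteB", "2005-02-03", "MeasureY", 0.0),
]

def zero_share(rows):
    """Return the fraction of rows whose value is exactly 0."""
    if not rows:
        return 0.0
    zeros = sum(1 for *_, value in rows if value == 0)
    return zeros / len(rows)

share = zero_share(records)
print(f"{share:.1%} of values are 0")
```

Whether such zeros should be dropped or kept is a judgment call that depends on what each measure represents.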

The years 1998 and 1999 are imported incorrectly. The time series ranges from 1998 to 2016.

Cdn3.png

Records with the same location, sample date and measure have different values.

Cdn4.png
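
One way to surface such conflicts is to group the values by the (location, sample date, measure) key and keep only keys with more than one distinct value. This is a sketch in plain Python with hypothetical sample records, not the procedure shown in the screenshot.

```python
from collections import defaultdict

# Hypothetical records: (location, sample_date, measure, value).
records = [
    ("SiteA", "2005-01-10", "MeasureX", 1.0),
    ("SiteA", "2005-01-10", "MeasureX", 3.0),  # same key, different value
    ("SiteA", "2005-01-10", "MeasureY", 2.0),
]

def conflicting_keys(rows):
    """Map each (location, sample_date, measure) key that has more than
    one distinct value to the sorted list of those values."""
    values_by_key = defaultdict(set)
    for location, sample_date, measure, value in rows:
        values_by_key[(location, sample_date, measure)].add(value)
    return {k: sorted(v) for k, v in values_by_key.items() if len(v) > 1}

conflicts = conflicting_keys(records)
print(conflicts)
```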

Data Preparation

Recode the sample date.

Cdn5.png
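
The page does not say how the 1998 and 1999 dates were mis-imported. The sketch below assumes, purely for illustration, that two-digit years were read into the wrong century (for example 2098 instead of 1998), and shifts any date outside the 1998 to 2016 range back by 100 years. The function name and the shift rule are hypothetical.

```python
from datetime import date

# Valid range of the time series, as stated on this page.
FIRST_YEAR, LAST_YEAR = 1998, 2016

def recode_sample_date(d):
    """Return d unchanged if its year is in range; otherwise try moving it
    back one century (an assumed import error) and fail if that does not help."""
    if FIRST_YEAR <= d.year <= LAST_YEAR:
        return d
    shifted_year = d.year - 100
    if FIRST_YEAR <= shifted_year <= LAST_YEAR:
        return d.replace(year=shifted_year)
    raise ValueError(f"cannot recode sample date {d.isoformat()}")

fixed = recode_sample_date(date(2098, 1, 10))
print(fixed.isoformat())
```

If the real import error is different, only the shift rule inside the function would need to change.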

Use the summary function to resolve duplicate records by taking their mean.

Cdn6.png
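
The same idea, collapsing every (location, sample date, measure) group to its mean, can be sketched in plain Python. This is an illustration with hypothetical records, not the summary function used in the screenshot.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (location, sample_date, measure, value).
records = [
    ("SiteA", "2005-01-10", "MeasureX", 1.0),
    ("SiteA", "2005-01-10", "MeasureX", 3.0),  # duplicate key
    ("SiteA", "2005-01-10", "MeasureY", 2.0),
]

def mean_by_key(rows):
    """Collapse rows sharing (location, sample_date, measure) to one row
    whose value is the mean of the group."""
    groups = defaultdict(list)
    for location, sample_date, measure, value in rows:
        groups[(location, sample_date, measure)].append(value)
    return [(*key, mean(values)) for key, values in groups.items()]

deduplicated = mean_by_key(records)
print(deduplicated)
```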

'Dcast' the data from long to wide format.

Cdn7.png
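
The name 'dcast' suggests the long-to-wide reshape found in R packages such as reshape2, though the page does not name the library. The sketch below imitates that reshape in plain Python: one entry per (location, sample date), with one field per measure. It assumes duplicates were already averaged, so a later row with the same key would simply overwrite an earlier one. The data are hypothetical.

```python
# Hypothetical de-duplicated records: (location, sample_date, measure, value).
records = [
    ("SiteA", "2005-01-10", "MeasureX", 2.0),
    ("SiteA", "2005-01-10", "MeasureY", 200.0),
    ("SiteB", "2005-02-03", "MeasureX", 4.0),
]

def cast_wide(rows):
    """Reshape long rows into {(location, sample_date): {measure: value}}."""
    wide = {}
    for location, sample_date, measure, value in rows:
        wide.setdefault((location, sample_date), {})[measure] = value
    return wide

wide = cast_wide(records)
print(wide)
```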

Standardize the values within each kind of measure, because the measures use different units.

Cdn8.png
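
The page does not state which standardization was applied. A common choice is the z-score, (value - mean) / standard deviation, computed separately for each measure; the sketch below assumes that choice and uses the sample standard deviation. The records are hypothetical and deliberately use very different scales for the two measures.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical records: (location, sample_date, measure, value).
records = [
    ("SiteA", "2005-01-10", "MeasureX", 1.0),
    ("SiteB", "2005-01-10", "MeasureX", 3.0),
    ("SiteA", "2005-01-10", "MeasureY", 100.0),
    ("SiteB", "2005-01-10", "MeasureY", 300.0),
]

def standardize_by_measure(rows):
    """Replace each value by its z-score within its own measure.
    Measures with fewer than two values or zero spread map to 0.0."""
    values = defaultdict(list)
    for _, _, measure, value in rows:
        values[measure].append(value)
    stats = {
        m: (mean(v), stdev(v) if len(v) > 1 else 0.0)
        for m, v in values.items()
    }
    scaled = []
    for location, sample_date, measure, value in rows:
        center, spread = stats[measure]
        z = (value - center) / spread if spread > 0 else 0.0
        scaled.append((location, sample_date, measure, z))
    return scaled

scaled = standardize_by_measure(records)
for row in scaled:
    print(row)
```

After scaling, both measures sit on the same unitless scale, so they can be compared or plotted together.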

'Melt' the data back to long format.

Cdn9.png
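
'Melt' is the reverse reshape, from wide back to long, with one row per (location, sample date, measure). A plain-Python sketch of that step, using a hypothetical wide structure keyed by (location, sample date), could look like this:

```python
# Hypothetical wide data: {(location, sample_date): {measure: value}}.
wide = {
    ("SiteA", "2005-01-10"): {"MeasureX": -0.5, "MeasureY": 0.5},
    ("SiteB", "2005-02-03"): {"MeasureX": 0.5},
}

def melt_long(wide_rows):
    """Flatten the wide structure into (location, sample_date, measure, value) rows."""
    rows = []
    for (location, sample_date), measures in wide_rows.items():
        for measure, value in measures.items():
            rows.append((location, sample_date, measure, value))
    return rows

long_rows = melt_long(wide)
print(long_rows)
```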