Difference between revisions of "ChenNannan-Data preparation"
Nnchen.2017 (talk | contribs) (Created page with "<div style=background:#2B3856 border:#A3BFB1> 250px <font size = 5; color="#FFFFFF"> ISSS608 Assign ChenNannan-MC2</font> </div> <!--MAIN HEADER --> {|...") |
Nnchen.2017 (talk | contribs) |
||
| Line 25: | Line 25: | ||
<br/> | <br/> | ||
| − | == Data Quality Issues == | + | <div style="text-align:center; padding-top:25px;"> |
| − | + | <font size = 5>Data Quality Issues</font> | |
| − | [[Image:Cdn1.png|500px]] | + | </div> |
| − | + | <div style="text-align:center; padding-top:25px;"> | |
| − | [[Image:Cdn2.png|150px]] | + | <font size = 3>No missing value.</font> |
| − | + | </div> | |
| − | [[Image:Cdn3.png|150px]] | + | [[Image:Cdn1.png|500px|center]] |
| − | + | <div style="text-align:center; padding-top:25px;"> | |
| − | [[Image:Cdn4.png| | + | <font size = 3>At least 2.5% of 0 value in value variables which is meaningless.</font> |
| − | == Data Preparation == | + | </div> |
| − | + | [[Image:Cdn2.png|150px|center]] | |
| − | [[Image:Cdn5.png| | + | <div style="text-align:center; padding-top:25px;"> |
| − | + | <font size = 3>Year 1998 and 1999 are imported wrong. The time series range is from 1998 to 2016.</font> | |
| − | [[Image:Cdn6.png| | + | </div> |
| − | + | [[Image:Cdn3.png|150px|center]] | |
| − | [[Image:Cdn7.png|200px]] | + | <div style="text-align:center; padding-top:25px;"> |
| − | + | <font size = 3>Same location, sample date and measure have different value record.</font> | |
| − | [[Image:Cdn8.png| | + | </div> |
| − | + | [[Image:Cdn4.png|500px|center]] | |
| − | [[Image:Cdn9.png|200px]] | + | <div style="text-align:center; padding-top:25px;"> |
| + | <font size = 5>Data Preparation</font> | ||
| + | </div> | ||
| + | <div style="text-align:center; padding-top:25px;"> | ||
| + | <font size = 3>Recode the sample date.</font> | ||
| + | </div> | ||
| + | [[Image:Cdn5.png|500px|center]] | ||
| + | <div style="text-align:center; padding-top:25px;"> | ||
| + | <font size = 3>Use the summary function to avoid duplication record by mean.</font> | ||
| + | </div> | ||
| + | [[Image:Cdn6.png|500px|center]] | ||
| + | <div style="text-align:center; padding-top:25px;"> | ||
| + | <font size = 3>'Dcast' the data</font> | ||
| + | </div> | ||
| + | [[Image:Cdn7.png|200px|center]] | ||
| + | <div style="text-align:center; padding-top:25px;"> | ||
| + | <font size = 3>Standardize the value by each kinds of measure because different units.</font> | ||
| + | </div> | ||
| + | [[Image:Cdn8.png|300px|center]] | ||
| + | <div style="text-align:center; padding-top:25px;"> | ||
| + | <font size = 3>'Melt' the data</font> | ||
| + | [[Image:Cdn9.png|200px|center]] | ||
| + | </div> | ||
Latest revision as of 21:13, 8 July 2018
Data Quality Issues
No missing values.
At least 2.5% of the entries in the value variable are 0, which is meaningless for these measures.
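A minimal sketch of how the share of zero entries could be checked. The function name `zero_share` and the sample readings are hypothetical; the actual data set and the tool used on this page are not reproduced here.

```python
def zero_share(values):
    """Return the fraction of entries that are exactly zero."""
    if not values:
        return 0.0
    return sum(1 for v in values if v == 0) / len(values)


# Hypothetical readings, for illustration only.
readings = [0.0, 1.2, 3.4, 0.0, 2.2, 5.1, 0.7, 4.4]
share = zero_share(readings)
print(f"{share:.1%} of the readings are zero")
```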
The years 1998 and 1999 were imported incorrectly. The time series should range from 1998 to 2016.
Records with the same location, sample date and measure have different values.
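One way to surface such conflicts is to group the records by location, sample date and measure, then keep only the keys that carry more than one distinct value. The sketch below assumes each record is a `(location, sample_date, measure, value)` tuple; the tuple layout, the function name `conflicting_keys` and the sample rows are illustrative assumptions, not the page's actual data.

```python
from collections import defaultdict


def conflicting_keys(records):
    """Return keys (location, sample_date, measure) that have more than
    one distinct value, mapped to the sorted list of those values."""
    seen = defaultdict(set)
    for location, sample_date, measure, value in records:
        seen[(location, sample_date, measure)].add(value)
    return {key: sorted(vals) for key, vals in seen.items() if len(vals) > 1}


# Hypothetical records, for illustration only.
records = [
    ("Site A", "2005-03-01", "Iron", 1.0),
    ("Site A", "2005-03-01", "Iron", 3.0),
    ("Site A", "2005-03-01", "Zinc", 0.5),
    ("Site B", "2005-03-01", "Iron", 2.0),
]
conflicts = conflicting_keys(records)
print(conflicts)
```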
Data Preparation
Recode the sample date.
Use the summary function to collapse duplicate records into a single record holding their mean.
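The aggregation step can be sketched as a group-by-mean over the key (location, sample date, measure). This is a plain Python stand-in for whatever summary function was used on this page; the record layout and the name `mean_by_key` are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def mean_by_key(records):
    """Collapse records that share (location, sample_date, measure)
    into one entry holding the mean of their values."""
    groups = defaultdict(list)
    for location, sample_date, measure, value in records:
        groups[(location, sample_date, measure)].append(value)
    return {key: mean(vals) for key, vals in groups.items()}


# Hypothetical records: the first two rows duplicate the same key.
records = [
    ("Site A", "2005-03-01", "Iron", 1.0),
    ("Site A", "2005-03-01", "Iron", 3.0),
    ("Site B", "2005-03-01", "Iron", 2.0),
]
summary = mean_by_key(records)
print(summary)
```

After this step each key appears exactly once, which the later reshaping step needs in order to place a single value in each cell.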
'Dcast' the data
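Casting reshapes the data from long form (one row per measure) to wide form (one column per measure). The sketch below illustrates the idea in plain Python rather than with the `dcast` function itself; the input layout and the name `cast_wide` are assumptions for the example.

```python
def cast_wide(long_rows):
    """Reshape {(location, sample_date, measure): value} into
    {(location, sample_date): {measure: value}}."""
    wide = {}
    for (location, sample_date, measure), value in long_rows.items():
        wide.setdefault((location, sample_date), {})[measure] = value
    return wide


# Hypothetical de-duplicated long-form data.
long_rows = {
    ("Site A", "2005-03-01", "Iron"): 2.0,
    ("Site A", "2005-03-01", "Zinc"): 0.5,
    ("Site B", "2005-03-01", "Iron"): 4.0,
}
wide = cast_wide(long_rows)
print(wide)
```

A measure that was not sampled for a given location and date is simply absent from that row's dictionary, which corresponds to a missing cell in the wide table.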
Standardize the values within each kind of measure, because the measures are recorded in different units.
'Melt' the data.
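The page does not state which standardization was applied. The sketch below assumes a z-score per measure (subtract the measure's mean, divide by its sample standard deviation), then reshapes the wide table back to long form, which is what a 'melt' does. The function names `standardize_columns` and `melt`, the wide-table layout and the sample values are all illustrative assumptions.

```python
from statistics import mean, stdev


def standardize_columns(wide):
    """Z-score each measure across all rows of a wide table shaped as
    {(location, sample_date): {measure: value}}."""
    columns = {}
    for row in wide.values():
        for measure, value in row.items():
            columns.setdefault(measure, []).append(value)
    stats = {}
    for measure, vals in columns.items():
        # The sample standard deviation needs at least two values.
        sd = stdev(vals) if len(vals) > 1 else 0.0
        stats[measure] = (mean(vals), sd)
    scaled = {}
    for key, row in wide.items():
        scaled[key] = {
            # A constant measure has no spread, so map it to 0.0.
            m: (v - stats[m][0]) / stats[m][1] if stats[m][1] else 0.0
            for m, v in row.items()
        }
    return scaled


def melt(wide):
    """Reshape a wide table back to (location, sample_date, measure, value) rows."""
    return [
        (location, sample_date, measure, value)
        for (location, sample_date), row in wide.items()
        for measure, value in row.items()
    ]


# Hypothetical wide table: Zinc is recorded on a scale ten times larger than Iron.
wide = {
    ("Site A", "2005-03-01"): {"Iron": 1.0, "Zinc": 10.0},
    ("Site B", "2005-03-01"): {"Iron": 3.0, "Zinc": 30.0},
}
scaled = standardize_columns(wide)
long_rows = melt(scaled)
print(long_rows)
```

After scaling, the two measures become directly comparable even though their raw values differ by an order of magnitude, which is the reason given above for standardizing.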