Difference between revisions of "Kabak: Report Data Preparation"
Line 88 → Line 88:
  |-
  |
− * Stack data to consolidate data table in to 2 columns (Postal Code, Housing Type)
− * Remove rows with missing data
+ * Data cleaning: Household electricity consumption data
+ **Stack data to consolidate the data table into 2 columns (Postal Code, Housing Type)
+ **Remove rows with missing data
  ||
  [[File: Kabakdatacleaning1.png|400px|center]]
  |-
  |
− * Concatenate all 12 months data into one consolidated data table
+ * Data cleaning: Household electricity consumption data
+ **Concatenate all 12 months of data into one consolidated data table
  **By the end of this phase of data cleaning, we have a total of 177,053 rows
  ||
Line 100 → Line 102:
  |-
  |
− * Merging Private Housing Data with Public Housing Data
+ * Data cleaning: Household electricity consumption data
+ **Merging Private Housing Data with Public Housing Data
  **Final consolidated data consists of 241,766 rows
  ||
Line 106 → Line 109:
  |-
  |
− * Geocoding
+ * Geocoding: Postal codes with missing latitudes and longitudes via https://developers.google.com/maps/documentation/geocoding/intro
  **Public housing data: 223 postal codes with missing coordinates
  **Private housing data: 338 postal codes with missing coordinates
Line 113 → Line 116:
  |-
  |
− * Data cleaning Age, Gender, Ethnicity
+ * Data cleaning: Age, Gender, Ethnicity
  **Delete rows that are empty & blank so as to merge the tables into one data sheet
  ||
  [[File: Kabakdatacleaning4.png|400px|center]]
+ |-
+ |
+ * Data cleaning: Age, Gender, Ethnicity
+ **Consolidate data through stacking
+ ||
+ [[File: Kabakdatacleaning5.png|400px|center]]
  |}
  <br/>
Latest revision as of 16:58, 22 November 2016
Initial Dataset
DATASET | DESCRIPTION | DATA USED |
---|---|---|
Average Monthly Household Electricity Consumption. Link (1H): https://www.ema.gov.sg/cmsmedia/Publications_and_Statistics/Statistics/23RSU.xls; Link (2H): https://www.ema.gov.sg/cmsmedia/Publications_and_Statistics/Statistics/25RSU.xls | | |
Average Monthly Household Electricity Consumption by Postal Code (Private Apartments), 2015. Link: https://www.ema.gov.sg/cmsmedia/Publications_and_Statistics/Statistics/2RSU.xls | | |
Basic Demographics Characteristics (2015) | | |
Data Cleaning
METHOD | DESCRIPTION |
---|---|
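The first two electricity-consumption steps in the diff above (stacking the table into Postal Code and Housing Type, then removing rows with missing data) can be sketched in plain Python. This is a minimal sketch, not the authors' workflow: it assumes the raw table is wide, with one row per postal code and one column per housing type holding average kWh, and that missing values appear as empty strings or `None`. The function name, column names and sample values are hypothetical.

```python
# Hypothetical missing-value markers; adjust to whatever the source file uses.
MISSING = {"", None}


def stack_consumption(rows, id_col="Postal Code"):
    """Stack wide rows into (postal_code, housing_type, kwh) records.

    Each non-key column is treated as a housing type; cells holding a
    missing marker are dropped instead of being carried into the output.
    """
    stacked = []
    for row in rows:
        postal_code = row[id_col]
        for housing_type, value in row.items():
            if housing_type == id_col:
                continue  # the key column is not a housing type
            if value in MISSING:
                continue  # remove rows with missing data
            stacked.append((postal_code, housing_type, float(value)))
    return stacked


# Made-up sample input: one row per postal code, one column per housing type.
wide = [
    {"Postal Code": "560123", "3-room": "310.5", "4-room": "402.1", "5-room": ""},
    {"Postal Code": "560124", "3-room": None, "4-room": "395.0", "5-room": "480.2"},
]
for record in stack_consumption(wide):
    print(record)
```

Keeping the consumption value alongside the two key columns is a design assumption; if the original sheet only needed the (Postal Code, Housing Type) pairs, the third tuple element can simply be dropped.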
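Concatenating the 12 monthly tables amounts to appending their rows into one list. The sketch below assumes each month has already been stacked into (postal code, housing type, kWh) records; the month labels, the added month column and the sample values are illustrative assumptions. A row-count check like the final assertion is one way to confirm a total such as the 177,053 rows reported above.

```python
def concat_months(monthly_tables):
    """Concatenate per-month record lists into one consolidated table.

    Each record is prefixed with its month label so that rows from
    different months remain distinguishable after concatenation.
    """
    return [
        (month, *record)
        for month, records in monthly_tables.items()
        for record in records
    ]


# Illustrative input only: two months instead of twelve, made-up values.
monthly = {
    "2015-01": [("560123", "4-room", 402.1)],
    "2015-02": [("560123", "4-room", 398.7), ("560124", "5-room", 480.2)],
}
combined = concat_months(monthly)

# The consolidated table should hold exactly as many rows as all months together.
assert len(combined) == sum(len(records) for records in monthly.values())
print(len(combined), combined[0])
```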
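The row count rises from 177,053 to 241,766 after merging private with public housing data, which suggests the private-housing rows were appended to the public-housing rows rather than joined on a key. Assuming that reading, a row-wise merge can be sketched as below; the `merge_sectors` name, the added "Public"/"Private" label and the sample rows are hypothetical.

```python
def merge_sectors(public_rows, private_rows):
    """Append private-housing rows to public-housing rows.

    A sector label is prepended so the origin of each row survives the
    merge. Rows must all have the same number of fields, otherwise the
    two tables are not aligned and a ValueError is raised.
    """
    widths = {len(row) for row in public_rows} | {len(row) for row in private_rows}
    if len(widths) > 1:
        raise ValueError(f"rows have inconsistent widths: {sorted(widths)}")
    merged = [("Public", *row) for row in public_rows]
    merged.extend(("Private", *row) for row in private_rows)
    return merged


# Made-up rows in the (month, postal code, housing type, kWh) layout.
public = [("2015-01", "560123", "4-room", 402.1)]
private = [
    ("2015-01", "238801", "Condominium", 512.3),
    ("2015-01", "238802", "Apartment", 455.0),
]
merged = merge_sectors(public, private)
print(len(merged))
```

Checking row widths up front catches misaligned columns before they silently produce a malformed consolidated table.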
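For the geocoding step, the Google Geocoding API linked above returns JSON whose first result carries `geometry.location.lat` and `geometry.location.lng`, with `status` set to `"OK"` on success. The sketch assumes lookups by postal code through the API's component filter (`postal_code:<code>|country:SG`) and an API key of your own. The HTTP call is injected as `fetch_json` so the example runs offline with a stub; all function names here are hypothetical.

```python
from urllib.parse import urlencode

# JSON endpoint of the Google Geocoding API.
GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(postal_code, api_key):
    """Build a request URL restricting the lookup to a Singapore postal code."""
    params = {
        "components": f"postal_code:{postal_code}|country:SG",
        "key": api_key,
    }
    return f"{GEOCODE_ENDPOINT}?{urlencode(params)}"


def parse_location(response):
    """Return (lat, lng) from a parsed Geocoding API response, or None if no match."""
    if response.get("status") != "OK" or not response.get("results"):
        return None
    location = response["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]


def fill_missing_coordinates(rows, fetch_json, api_key):
    """Geocode rows lacking lat/lng in place; return postal codes still unresolved.

    fetch_json takes a URL and returns the parsed JSON dict (a real run might
    wrap urllib.request.urlopen and json.load).
    """
    unresolved = []
    for row in rows:
        if row.get("lat") is not None and row.get("lng") is not None:
            continue  # coordinates already present
        coords = parse_location(fetch_json(build_geocode_url(row["postal_code"], api_key)))
        if coords is None:
            unresolved.append(row["postal_code"])
        else:
            row["lat"], row["lng"] = coords
    return unresolved


def fake_fetch(url):
    """Offline stub standing in for the HTTP request (made-up coordinates)."""
    if "560123" in url:
        return {"status": "OK",
                "results": [{"geometry": {"location": {"lat": 1.35, "lng": 103.85}}}]}
    return {"status": "ZERO_RESULTS", "results": []}


rows = [
    {"postal_code": "560123", "lat": None, "lng": None},
    {"postal_code": "000000", "lat": None, "lng": None},
]
print(fill_missing_coordinates(rows, fake_fetch, "YOUR_API_KEY"))
```

Returning the unresolved postal codes, rather than failing, leaves room to handle leftover cases separately after the bulk lookup.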
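The Age, Gender and Ethnicity step deletes empty and blank rows so that the tables can be merged into one data sheet. A minimal sketch, assuming each table is a list of rows and that all tables share the same header row; the function names, header names, area labels and counts below are made up for illustration.

```python
def is_blank(row):
    """True if a row has no cells, or every cell is None or whitespace-only."""
    return all(cell is None or str(cell).strip() == "" for cell in row)


def combine_sheets(sheets):
    """Drop blank rows from each sheet and stack the sheets into one table.

    Assumes every sheet starts with the same header row; only one copy of
    the header is kept in the combined table.
    """
    header = sheets[0][0]
    combined = [header]
    for sheet in sheets:
        if sheet[0] != header:
            raise ValueError(f"header mismatch: {sheet[0]!r}")
        combined.extend(row for row in sheet[1:] if not is_blank(row))
    return combined


# Made-up example sheets; "Area A" and the counts are placeholders.
age = [
    ["Area", "Category", "Count"],
    ["Area A", "Age 0-4", "120"],
    ["", "", ""],
    [None, "  ", None],
]
gender = [
    ["Area", "Category", "Count"],
    [],
    ["Area A", "Female", "640"],
]
table = combine_sheets([age, gender])
for row in table:
    print(row)
```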