Difference between revisions of "Kabak: Research Paper Data Preparation"
Jump to navigation
Jump to search
(11 intermediate revisions by the same user not shown) | |||
Line 10: | Line 10: | ||
| style="vertical-align:top;width:16%;" | <div style="padding: 3px; font-weight: bold; text-align:center; line-height: wrap_content; font-size:16px; border-bottom:1px solid #000000; border-top:1px solid #000000; font-family:Trebuchet MS"> [[Kabak: Application | <font color="#35383c"><b>APPLICATION</b>]] | | style="vertical-align:top;width:16%;" | <div style="padding: 3px; font-weight: bold; text-align:center; line-height: wrap_content; font-size:16px; border-bottom:1px solid #000000; border-top:1px solid #000000; font-family:Trebuchet MS"> [[Kabak: Application | <font color="#35383c"><b>APPLICATION</b>]] | ||
− | | style="vertical-align:top;width:16%;" | <div style="padding: 3px; font-weight: bold; text-align:center; line-height: wrap_content; font-size:16px; border-bottom:1px solid #000000; border-top:1px solid #000000; font-family:Trebuchet MS; background-color:#35383c;"> [[Kabak: | + | | style="vertical-align:top;width:16%;" | <div style="padding: 3px; font-weight: bold; text-align:center; line-height: wrap_content; font-size:16px; border-bottom:1px solid #000000; border-top:1px solid #000000; font-family:Trebuchet MS; background-color:#35383c;"> [[Kabak: Report | <font color="#FFFFFF"><b>REPORT</b>]] |
+ | |||
+ | | style="vertical-align:top;width:16%;" | <div style="padding: 3px; font-weight: bold; text-align:center; line-height: wrap_content; font-size:16px; border-bottom:1px solid #000000; border-top:1px solid #000000; font-family:Trebuchet MS"> [[Project_Groups | <font color="#35383c"><b>OTHER PROJECT GROUPS</b>]] | ||
|} | |} | ||
Line 18: | Line 20: | ||
{| style="background-color:#ffffff ; margin: 3px 11px 3px 11px;" width="80%"| | {| style="background-color:#ffffff ; margin: 3px 11px 3px 11px;" width="80%"| | ||
| style="font-family:Trebuchet MS; font-size:11px; text-align: center; border:solid 1px #35383c; background-color: #FFFFFF" width="200px" | | | style="font-family:Trebuchet MS; font-size:11px; text-align: center; border:solid 1px #35383c; background-color: #FFFFFF" width="200px" | | ||
− | [[Kabak: | + | [[Kabak: Report|<font color="#35383c"><strong>OVERVIEW</strong></font>]] |
| style="font-family:Trebuchet MS; font-size:11px; text-align: center; border:solid 1px #35383c; background-color: #35383c" width="200px" | | | style="font-family:Trebuchet MS; font-size:11px; text-align: center; border:solid 1px #35383c; background-color: #35383c" width="200px" | | ||
− | [[Kabak: | + | [[Kabak: Report Data Preparation|<font color="#FFFFFF"><strong>DATA PREPARATION</strong></font>]] |
| style="font-family:Trebuchet MS; font-size:11px; text-align: center; border:solid 1px #35383c; background-color: #FFFFFF" width="200px" | | | style="font-family:Trebuchet MS; font-size:11px; text-align: center; border:solid 1px #35383c; background-color: #FFFFFF" width="200px" | | ||
− | [[Kabak: | + | [[Kabak: Report Analysis|<font color="#35383c"><strong>ANALYSIS</strong></font>]] |
− | |||
− | |||
− | |||
|} | |} | ||
</center> | </center> | ||
Line 80: | Line 79: | ||
*T8 Ethnicity | *T8 Ethnicity | ||
**378 rows of raw data | **378 rows of raw data | ||
+ | |} | ||
+ | <br/> | ||
+ | =Data Cleaning= | ||
+ | {| class="wikitable" style="background-color:#FFFFFF;" width="100%" | ||
+ | |- | ||
+ | ! style="font-weight: bold;background: #35383c;color:#FFFFFF;width: 50%;font-family:Trebuchet MS" | METHOD | ||
+ | ! style="font-weight: bold;background: #35383c;color:#FFFFFF;font-family:Trebuchet MS" | DESCRIPTION | ||
+ | |- | ||
+ | | | ||
+ | * Stack data to consolidate data table in to 2 columns (Postal Code, Housing Type) | ||
+ | * Remove rows with missing data | ||
+ | || | ||
+ | [[File: Kabakdatacleaning1.png|400px|center]] | ||
+ | |- | ||
+ | | | ||
+ | * Concatenate all 12 months data into one consolidated data table | ||
+ | **By the end of this phase of data cleaning, we have a total of 177,053 rows | ||
+ | || | ||
+ | [[File: Kabakdatacleaning2.png|400px|center]]|- | ||
+ | |- | ||
+ | | | ||
+ | * Merging Private Housing Data with Public Housing Data | ||
+ | **Final consolidated data consist of 241,766 rows | ||
+ | || | ||
+ | [[File: Kabakdatacleaning3.png|400px|center]] | ||
+ | |- | ||
+ | | | ||
+ | * Geocoding of postal codes with missing latitudes and longitude via https://developers.google.com/maps/documentation/geocoding/intro | ||
+ | **Public housing data: 223 missing data | ||
+ | **Private housing data: 338 missing data | ||
+ | || | ||
+ | [[File: GEOCODING.PNG|400px|center]] | ||
|} | |} | ||
<br/> | <br/> |
Latest revision as of 15:46, 22 November 2016
Initial Dataset
DATASET | DESCRIPTION | DATA USED |
---|---|---|
Average Monthly Household Electricity Consumption Link (1H): https://www.ema.gov.sg/cmsmedia/Publications_and_Statistics/Statistics/23RSU.xls Link (2H): https://www.ema.gov.sg/cmsmedia/Publications_and_Statistics/Statistics/25RSU.xls |
|
|
Average Monthly Household Electricity Consumption by Postal Code (Private Apartments), 2015 Link: https://www.ema.gov.sg/cmsmedia/Publications_and_Statistics/Statistics/2RSU.xls |
|
|
Basic Demographics Characteristics (2015) |
|
|
Data Cleaning
METHOD | DESCRIPTION |
---|---|
|
|
|
|- |
|
|
|