Difference between revisions of "Kabak: Research Paper Data Preparation"

From Visual Analytics for Business Intelligence
Jump to navigation Jump to search
 
(11 intermediate revisions by the same user not shown)
Line 10: Line 10:
 
| style="vertical-align:top;width:16%;" | <div style="padding: 3px; font-weight: bold; text-align:center; line-height: wrap_content; font-size:16px; border-bottom:1px solid #000000; border-top:1px solid #000000; font-family:Trebuchet MS"> [[Kabak: Application | <font color="#35383c"><b>APPLICATION</b>]]
 
| style="vertical-align:top;width:16%;" | <div style="padding: 3px; font-weight: bold; text-align:center; line-height: wrap_content; font-size:16px; border-bottom:1px solid #000000; border-top:1px solid #000000; font-family:Trebuchet MS"> [[Kabak: Application | <font color="#35383c"><b>APPLICATION</b>]]
  
| style="vertical-align:top;width:16%;" | <div style="padding: 3px; font-weight: bold; text-align:center; line-height: wrap_content; font-size:16px; border-bottom:1px solid #000000; border-top:1px solid #000000; font-family:Trebuchet MS; background-color:#35383c;"> [[Kabak: Research Paper | <font color="#FFFFFF"><b>RESEARCH PAPER</b>]]
+
| style="vertical-align:top;width:16%;" | <div style="padding: 3px; font-weight: bold; text-align:center; line-height: wrap_content; font-size:16px; border-bottom:1px solid #000000; border-top:1px solid #000000; font-family:Trebuchet MS; background-color:#35383c;"> [[Kabak: Report | <font color="#FFFFFF"><b>REPORT</b>]]
 +
 
 +
| style="vertical-align:top;width:16%;" | <div style="padding: 3px; font-weight: bold; text-align:center; line-height: wrap_content; font-size:16px; border-bottom:1px solid #000000; border-top:1px solid #000000; font-family:Trebuchet MS"> [[Project_Groups | <font color="#35383c"><b>OTHER PROJECT GROUPS</b>]]
 
|}
 
|}
  
Line 18: Line 20:
 
{| style="background-color:#ffffff ; margin: 3px 11px 3px 11px;" width="80%"|
 
{| style="background-color:#ffffff ; margin: 3px 11px 3px 11px;" width="80%"|
 
| style="font-family:Trebuchet MS; font-size:11px; text-align: center; border:solid 1px #35383c; background-color: #FFFFFF" width="200px" |  
 
| style="font-family:Trebuchet MS; font-size:11px; text-align: center; border:solid 1px #35383c; background-color: #FFFFFF" width="200px" |  
[[Kabak: Research Paper Overview|<font color="#35383c"><strong>OVERVIEW</strong></font>]]
+
[[Kabak: Report|<font color="#35383c"><strong>OVERVIEW</strong></font>]]
  
 
| style="font-family:Trebuchet MS; font-size:11px; text-align: center; border:solid 1px #35383c; background-color: #35383c" width="200px" |   
 
| style="font-family:Trebuchet MS; font-size:11px; text-align: center; border:solid 1px #35383c; background-color: #35383c" width="200px" |   
[[Kabak: Research Paper Data Preparation|<font color="#FFFFFF"><strong>DATA PREPARATION</strong></font>]]
+
[[Kabak: Report Data Preparation|<font color="#FFFFFF"><strong>DATA PREPARATION</strong></font>]]
  
 
| style="font-family:Trebuchet MS; font-size:11px; text-align: center; border:solid 1px #35383c; background-color: #FFFFFF" width="200px" |   
 
| style="font-family:Trebuchet MS; font-size:11px; text-align: center; border:solid 1px #35383c; background-color: #FFFFFF" width="200px" |   
[[Kabak: Research Paper Analysis|<font color="#35383c"><strong>ANALYSIS</strong></font>]]
+
[[Kabak: Report Analysis|<font color="#35383c"><strong>ANALYSIS</strong></font>]]
 
 
| style="font-family:Trebuchet MS; font-size:11px; text-align: center; border:solid 1px #35383c; background-color: #FFFFFF" width="200px" | 
 
[[Kabak: Research Paper Project Management|<font color="#35383c"><strong>PROJECT MANAGEMENT</strong></font>]]
 
 
|}
 
|}
 
</center>
 
</center>
Line 80: Line 79:
 
*T8 Ethnicity  
 
*T8 Ethnicity  
 
**378 rows of raw data
 
**378 rows of raw data
 +
|}
 +
<br/>
 +
=Data Cleaning=
 +
{| class="wikitable" style="background-color:#FFFFFF;" width="100%"
 +
|-
 +
! style="font-weight: bold;background: #35383c;color:#FFFFFF;width: 50%;font-family:Trebuchet MS" | METHOD
 +
! style="font-weight: bold;background: #35383c;color:#FFFFFF;font-family:Trebuchet MS" | DESCRIPTION
 +
|-
 +
|
 +
* Stack data to consolidate data table in to 2 columns (Postal Code, Housing Type)
 +
* Remove rows with missing data
 +
||
 +
[[File: Kabakdatacleaning1.png|400px|center]]
 +
|-
 +
|
 +
* Concatenate all 12 months data into one consolidated data table
 +
**By the end of this phase of data cleaning, we have a total of 177,053 rows
 +
||
 +
[[File: Kabakdatacleaning2.png|400px|center]]|-
 +
|-
 +
|
 +
* Merging Private Housing Data with Public Housing Data
 +
**Final consolidated data consist of 241,766 rows
 +
||
 +
[[File: Kabakdatacleaning3.png|400px|center]]
 +
|-
 +
|
 +
* Geocoding of postal codes with missing latitudes and longitude via https://developers.google.com/maps/documentation/geocoding/intro
 +
**Public housing data: 223 missing data
 +
**Private housing data: 338 missing data
 +
||
 +
[[File: GEOCODING.PNG|400px|center]]
 
|}
 
|}
 
<br/>
 
<br/>

Latest revision as of 15:46, 22 November 2016


OVERVIEW

DATA PREPARATION

ANALYSIS


Initial Dataset

DATASET DESCRIPTION DATA USED

Average Monthly Household Electricity Consumption
by Postal Code (Public Housing) & Dwelling Type, 1H & 2H 2015

Link (1H): https://www.ema.gov.sg/cmsmedia/Publications_and_Statistics/Statistics/23RSU.xls

Link (2H): https://www.ema.gov.sg/cmsmedia/Publications_and_Statistics/Statistics/25RSU.xls

  • Average monthly household electricity consumption (kwh)
    • By month
    • By postal code
    • By public housing type
  • Total Average household electricity consumption (kwh)
    • By postal code
    • By public housing type
  • 9379 rows of raw data X 12 sheets = 112,548 rows of raw data

Average Monthly Household Electricity Consumption by Postal Code (Private Apartments), 2015

Link: https://www.ema.gov.sg/cmsmedia/Publications_and_Statistics/Statistics/2RSU.xls

  • Average monthly household electricity consumption (kwh)
    • By month
    • By postal code
  • Total Average household electricity consumption (kwh)
    • By postal code
  • 9911 rows of raw data

Basic Demographics Characteristics (2015)

Link: http://www.singstat.gov.sg/docs/default-source/default-document-library/publications/publications_and_papers/GHS/ghs2015/excel/t7-9.xls

  • Resident Population by Planning Area/Subzone
    • By age group
    • By sex
    • By ethnicity
    • By type of dwelling
  • T7 Age group
    • 378 rows of raw data
  • T8 Ethnicity
    • 378 rows of raw data


Data Cleaning

METHOD DESCRIPTION
  • Stack data to consolidate data table in to 2 columns (Postal Code, Housing Type)
  • Remove rows with missing data
Kabakdatacleaning1.png
  • Concatenate all 12 months data into one consolidated data table
    • By the end of this phase of data cleaning, we have a total of 177,053 rows
Kabakdatacleaning2.png
|-
  • Merging Private Housing Data with Public Housing Data
    • Final consolidated data consist of 241,766 rows
Kabakdatacleaning3.png
GEOCODING.PNG