Difference between revisions of "Kabak: Report Data Preparation"

From Visual Analytics for Business Intelligence
Jump to navigation Jump to search
 
(One intermediate revision by the same user not shown)
Line 88: Line 88:
 
|-
 
|-
 
|  
 
|  
* Stack data to consolidate data table in to 2 columns (Postal Code, Housing Type)  
+
* Data cleaning: Household electricity consumption data
* Remove rows with missing data
+
**Stack data to consolidate data table in to 2 columns (Postal Code, Housing Type)  
 +
**Remove rows with missing data
 
  ||  
 
  ||  
 
[[File: Kabakdatacleaning1.png|400px|center]]
 
[[File: Kabakdatacleaning1.png|400px|center]]
 
|-
 
|-
 
|
 
|
* Concatenate all 12 months data into one consolidated data table
+
* Data cleaning: Household electricity consumption data
 +
**Concatenate all 12 months data into one consolidated data table
 
**By the end of this phase of data cleaning, we have a total of 177,053 rows
 
**By the end of this phase of data cleaning, we have a total of 177,053 rows
 
||
 
||
Line 100: Line 102:
 
|-
 
|-
 
|
 
|
* Merging Private Housing Data with Public Housing Data
+
* Data cleaning: Household electricity consumption data
 +
**Merging Private Housing Data with Public Housing Data
 
**Final consolidated data consist of 241,766 rows  
 
**Final consolidated data consist of 241,766 rows  
 
||
 
||
Line 106: Line 109:
 
|-
 
|-
 
|
 
|
* Geocoding of postal codes with missing latitudes and longitude via https://developers.google.com/maps/documentation/geocoding/intro
+
* Geocoding: Postal codes with missing latitudes and longitude via https://developers.google.com/maps/documentation/geocoding/intro
 
**Public housing data: 223 missing data
 
**Public housing data: 223 missing data
 
**Private housing data: 338 missing data
 
**Private housing data: 338 missing data
 
||
 
||
 
[[File: GEOCODING.PNG|400px|center]]
 
[[File: GEOCODING.PNG|400px|center]]
 +
|-
 +
|
 +
* Data cleaning: Age, Gender, Ethnicity
 +
**Delete rows that are empty & blank so at to merge the tables into one data sheet
 +
||
 +
[[File: Kabakdatacleaning4.png|400px|center]]
 +
|-
 +
|
 +
* Data cleaning: Age, Gender, Ethnicity
 +
** Consolidate data through Stacking
 +
||
 +
[[File: Kabakdatacleaning5.png|400px|center]]
 
|}
 
|}
 
<br/>
 
<br/>

Latest revision as of 16:58, 22 November 2016


OVERVIEW

DATA PREPARATION

ANALYSIS


Initial Dataset

DATASET DESCRIPTION DATA USED

Average Monthly Household Electricity Consumption
by Postal Code (Public Housing) & Dwelling Type, 1H & 2H 2015

Link (1H): https://www.ema.gov.sg/cmsmedia/Publications_and_Statistics/Statistics/23RSU.xls

Link (2H): https://www.ema.gov.sg/cmsmedia/Publications_and_Statistics/Statistics/25RSU.xls

  • Average monthly household electricity consumption (kwh)
    • By month
    • By postal code
    • By public housing type
  • Total Average household electricity consumption (kwh)
    • By postal code
    • By public housing type
  • 9379 rows of raw data X 12 sheets = 112,548 rows of raw data

Average Monthly Household Electricity Consumption by Postal Code (Private Apartments), 2015

Link: https://www.ema.gov.sg/cmsmedia/Publications_and_Statistics/Statistics/2RSU.xls

  • Average monthly household electricity consumption (kwh)
    • By month
    • By postal code
  • Total Average household electricity consumption (kwh)
    • By postal code
  • 9911 rows of raw data

Basic Demographics Characteristics (2015)

Link: http://www.singstat.gov.sg/docs/default-source/default-document-library/publications/publications_and_papers/GHS/ghs2015/excel/t7-9.xls

  • Resident Population by Planning Area/Subzone
    • By age group
    • By sex
    • By ethnicity
    • By type of dwelling
  • T7 Age group
    • 378 rows of raw data
  • T8 Ethnicity
    • 378 rows of raw data


Data Cleaning

METHOD DESCRIPTION
  • Data cleaning: Household electricity consumption data
    • Stack data to consolidate data table in to 2 columns (Postal Code, Housing Type)
    • Remove rows with missing data
Kabakdatacleaning1.png
  • Data cleaning: Household electricity consumption data
    • Concatenate all 12 months data into one consolidated data table
    • By the end of this phase of data cleaning, we have a total of 177,053 rows
Kabakdatacleaning2.png
  • Data cleaning: Household electricity consumption data
    • Merging Private Housing Data with Public Housing Data
    • Final consolidated data consist of 241,766 rows
Kabakdatacleaning3.png
GEOCODING.PNG
  • Data cleaning: Age, Gender, Ethnicity
    • Delete rows that are empty & blank so at to merge the tables into one data sheet
Kabakdatacleaning4.png
  • Data cleaning: Age, Gender, Ethnicity
    • Consolidate data through Stacking
Kabakdatacleaning5.png