Kabak: Report Data Preparation

From Visual Analytics for Business Intelligence
Jump to navigation Jump to search


OVERVIEW

DATA PREPARATION

ANALYSIS


Initial Dataset

DATASET DESCRIPTION DATA USED

Average Monthly Household Electricity Consumption
by Postal Code (Public Housing) & Dwelling Type, 1H & 2H 2015

Link (1H): https://www.ema.gov.sg/cmsmedia/Publications_and_Statistics/Statistics/23RSU.xls

Link (2H): https://www.ema.gov.sg/cmsmedia/Publications_and_Statistics/Statistics/25RSU.xls

  • Average monthly household electricity consumption (kwh)
    • By month
    • By postal code
    • By public housing type
  • Total Average household electricity consumption (kwh)
    • By postal code
    • By public housing type
  • 9379 rows of raw data X 12 sheets = 112,548 rows of raw data

Average Monthly Household Electricity Consumption by Postal Code (Private Apartments), 2015

Link: https://www.ema.gov.sg/cmsmedia/Publications_and_Statistics/Statistics/2RSU.xls

  • Average monthly household electricity consumption (kwh)
    • By month
    • By postal code
  • Total Average household electricity consumption (kwh)
    • By postal code
  • 9911 rows of raw data

Basic Demographics Characteristics (2015)

Link: http://www.singstat.gov.sg/docs/default-source/default-document-library/publications/publications_and_papers/GHS/ghs2015/excel/t7-9.xls

  • Resident Population by Planning Area/Subzone
    • By age group
    • By sex
    • By ethnicity
    • By type of dwelling
  • T7 Age group
    • 378 rows of raw data
  • T8 Ethnicity
    • 378 rows of raw data


Data Cleaning

METHOD DESCRIPTION
  • Data cleaning: Household electricity consumption data
    • Stack data to consolidate data table in to 2 columns (Postal Code, Housing Type)
    • Remove rows with missing data
Kabakdatacleaning1.png
  • Data cleaning: Household electricity consumption data
    • Concatenate all 12 months data into one consolidated data table
    • By the end of this phase of data cleaning, we have a total of 177,053 rows
Kabakdatacleaning2.png
  • Data cleaning: Household electricity consumption data
    • Merging Private Housing Data with Public Housing Data
    • Final consolidated data consist of 241,766 rows
Kabakdatacleaning3.png
GEOCODING.PNG
  • Data cleaning: Age, Gender, Ethnicity
    • Delete rows that are empty & blank so at to merge the tables into one data sheet
Kabakdatacleaning4.png
  • Data cleaning: Age, Gender, Ethnicity
    • Consolidate data through Stacking
Kabakdatacleaning5.png