1718t1is428T15

From Visual Analytics for Business Intelligence
Revision as of 19:33, 12 October 2017 by Jiajun.ng.2014



Project Motivation

Experts have warned that global power demand is set to double by 2030 despite regulatory controls. High power consumption can already be observed locally: according to the Energy Market Authority (EMA), Singapore's power consumption rose from 1965 to 2013 [1].

As Singapore is land-scarce and lacks significant renewable energy options such as hydropower, wave power, or sufficient land for mass solar energy production, energy has been a top concern in the urban nation [2]. It is thus important to promote energy-saving concepts to the public and to deploy energy-saving solutions island-wide. However, the usual analysis tools do not offer the spatial perspective needed to guide where such solutions should be deployed. Information about the energy consumption levels of residents in Singapore is often not conveyed adequately in geographical terms. While EMA and Singstat provide annual data and reports on energy usage in Singapore, these lack the geographical positioning of the data points.


Project Objective

Our team aims to create a web application (Enerlyst) using R that leverages energy datasets provided by EMA to perform geospatial analysis and identify energy usage clusters. Further analysis can then be performed to identify root causes of high or low energy consumption in these clusters and to devise ways to achieve energy conservation as a nation. Project Enerlyst aims to provide a spatial perspective by utilising the following approaches:

  • Choropleth Map
  • Local Moran's I
  • Local Indicators of Spatial Association (LISA)


Technology

System Architecture


R Library

  • shiny
    • Web Application Framework for R
  • maptools
    • Tools for Reading and Handling Spatial Objects
  • rgdal
    • Bindings for the Geospatial Data Abstraction Library
  • leaflet
    • Create Interactive Web Maps with the JavaScript 'Leaflet' Library
  • spatialEco
    • Spatial Analysis and Modelling Utilities
  • plyr
    • Tools for Splitting, Applying and Combining Data
  • spdep
    • Spatial Dependence: Weighting Schemes, Statistics and Models
  • GISTools
    • Some further GIS capabilities for R
  • spatstat
    • Spatial Point Pattern Analysis, Model-Fitting, Simulation, Tests
  • classInt
    • Choose Univariate Class Intervals
  • RColorBrewer
    • ColorBrewer Palettes
  • rsconnect
    • Deployment Interface for R Markdown Documents and Shiny Applications
  • openxlsx
    • Read, Write and Edit XLSX Files


Application Features

Uploading and Processing On The Fly

Enerlyst allows EMA housing data to be uploaded and processed on the fly. Users can view the processed data on the Data tab. After uploading, users can select the type of data (public, private or both). Data for different years and months is processed and displayed on the fly when selected. This feature gives the application longevity, as future datasets can also be analysed. The data can be found on the EMA website. [3]

Cleaning up of raw data before uploading

The data to be uploaded should be in the following format:

  • Geocoded, with X and Y coordinates in columns named "X" and "Y" respectively
  • Row 4 (Overall) of the EMA data has to be removed
  • Should follow the naming convention "YYYY_priv" for private housing data and "YYYY_pub" for public housing data
  • Should have the file extension xlsx
  • The two six-month datasets should be merged into one full-year dataset (only applicable to public housing data)

The steps to convert the raw data into a format recognisable by Enerlyst are as follows:

a) Preparing raw data from EMA for 2013 Private Housing

1. Copy the year's data into a new Excel file

2. Save file as "2013_priv.xlsx"

3. Delete Row 4 which contains the overall energy consumption

4. Add two columns in columns O and P, give them headers named "X" and "Y"

5. Geocode the postal codes and put the results into "X" and "Y"

6. Save the file

b) Preparing raw data from EMA for 2013 Public Housing

1. Open the first half of the public data

2. Open a new Excel file

3. Copy each month's data into the new file

4. Repeat steps 1 to 3 for the second half of the public data. At the end, there should be 12 sheets in total, ordered by month from January to December.

In each sheet of the new excel file:

5. Delete Row 4 from each sheet

6. Add two columns in columns O and P, give them headers named "X" and "Y"

7. Geocode the postal codes and put the results into "X" and "Y"

8. Save the file as "2013_pub.xlsx"


Uploading Files to Enerlyst

Once the data files for 2013 to 2015 are ready, we upload them into Enerlyst. The application reads the file’s name, and recognises the year and property type it represents.
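As an illustration of the file name recognition described above, the following is a minimal Python sketch (Enerlyst itself is written in R, and its actual parsing code is not shown on this page). The function name `parse_upload_name` and the regular expression are assumptions based only on the "YYYY_priv" / "YYYY_pub" naming convention.

```python
import re
from pathlib import Path

# Assumed pattern, derived from the "YYYY_priv.xlsx" / "YYYY_pub.xlsx" convention.
FILENAME_PATTERN = re.compile(r"^(\d{4})_(priv|pub)\.xlsx$", re.IGNORECASE)
PROPERTY_TYPES = {"priv": "private", "pub": "public"}


def parse_upload_name(filename):
    """Return (year, property_type) for an uploaded file name (hypothetical helper)."""
    name = Path(filename).name  # ignore any directory part
    match = FILENAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"File name does not follow YYYY_priv/YYYY_pub: {name!r}")
    year, suffix = match.groups()
    return int(year), PROPERTY_TYPES[suffix.lower()]


print(parse_upload_name("2013_priv.xlsx"))
print(parse_upload_name("2014_pub.xlsx"))
```

Rejecting names that do not match keeps wrongly named uploads from being silently assigned to the wrong year or housing type.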

For private housing data, the application converts the sheet into a data frame. For public housing data, the application loops through the 12 sheets (months) of data, aggregating each month’s energy consumption by postal code. In other words, the application finds the total energy consumed by a residential building by totalling the consumption of 1-or-2-room, 3-room, 4-room and 5-room/executive apartments. The aggregate is transposed into a data frame, and columns are renamed to show the month. The data frames for private and public housing are similar, and contain the following columns:

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Postal Code ║ Jan ║ Feb ║ Mar ║ Apr ║ May ║ Jun ║ Jul ║ Aug ║ Sep ║ Oct ║ Nov ║ Dec ║ X ║ Y

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
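The public housing aggregation described above could be sketched roughly as follows. This is a Python illustration rather than the application's R code; the input layout (a dictionary of month name to rows of postal code plus per-dwelling-type consumption) and all sample numbers are assumptions made for the example.

```python
from collections import defaultdict

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def aggregate_public_housing(sheets):
    """Sum consumption over dwelling types for each postal code and month.

    `sheets` maps a month name to a list of (postal_code, [values per
    dwelling type]) rows; this layout is assumed for illustration.
    """
    table = defaultdict(lambda: dict.fromkeys(MONTHS, 0.0))
    for month in MONTHS:
        for postal_code, by_dwelling_type in sheets.get(month, []):
            table[postal_code][month] += sum(by_dwelling_type)
    return dict(table)


# Made-up values for two months and two postal codes.
sheets = {
    "Jan": [("560123", [100.0, 200.0, 300.0, 400.0])],
    "Feb": [("560123", [110.0, 210.0, 310.0, 410.0]),
            ("560456", [50.0, 60.0, 70.0, 80.0])],
}
result = aggregate_public_housing(sheets)
print(result["560123"]["Jan"], result["560123"]["Feb"])
```

Each postal code ends up with one value per month, matching the column layout listed above apart from the X and Y coordinates.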

After the data frame has been constructed, the application cleans up ‘na’ and ‘s’ values, which represent negligible levels of energy consumption and suppressed individual data. These values are replaced with zeroes and treated as housing with no energy consumption. The application then uses the data frame’s X and Y coordinates to convert it into a spatial points data frame, and changes its coordinate reference system to WGS84.
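The replacement of 'na' and 's' markers with zeroes can be illustrated with a short Python sketch. The helper name `clean_consumption` is hypothetical, and the case-insensitive matching and whitespace trimming are assumptions; the coordinate conversion to WGS84 is not covered here.

```python
# Markers described above; matching them case-insensitively is an assumption.
PLACEHOLDER_MARKERS = {"na", "s"}


def clean_consumption(value):
    """Convert one raw cell to a float, mapping 'na' and 's' to 0.0."""
    if isinstance(value, str):
        text = value.strip().lower()
        if text in PLACEHOLDER_MARKERS:
            return 0.0
        return float(text)
    return float(value)


raw_row = ["na", "s", "123.4", 56]
print([clean_consumption(cell) for cell in raw_row])
```

Treating both markers as zero follows the approach described above, although it means suppressed values are indistinguishable from truly negligible ones in later averages.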

To allow users to analyse energy consumption clusters by housing type, Enerlyst then identifies which subzone each residential building belongs to, and computes 1) private housing’s average energy consumption by subzone, 2) public housing’s average energy consumption by subzone, and 3) combined average energy consumption by subzone. To perform the computation, the following details need to be derived from the data frames:

  • Total energy consumption of private housing by subzone
  • Total energy consumption of public housing by subzone
  • Total energy consumption of all housing by subzone
  • Count of private residential building per subzone
  • Count of public residential building per subzone
  • Count of all residential building per subzone
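The totals and counts listed above combine into the three averages as total divided by count. The following Python sketch illustrates this; the record layout and the sample figures are invented for the example, and only the subzone names come from this page.

```python
from collections import defaultdict


def average_by_subzone(buildings):
    """Return private, public and combined average consumption per subzone.

    `buildings` is a list of (subzone, housing_type, consumption) tuples,
    where housing_type is "private" or "public" (assumed layout).
    """
    totals = defaultdict(lambda: {"private": 0.0, "public": 0.0})
    counts = defaultdict(lambda: {"private": 0, "public": 0})
    for subzone, housing_type, consumption in buildings:
        totals[subzone][housing_type] += consumption
        counts[subzone][housing_type] += 1

    averages = {}
    for subzone, total in totals.items():
        count = counts[subzone]
        averages[subzone] = {
            # None marks a subzone with no buildings of that housing type.
            "private": total["private"] / count["private"] if count["private"] else None,
            "public": total["public"] / count["public"] if count["public"] else None,
            "combined": (total["private"] + total["public"])
                        / (count["private"] + count["public"]),
        }
    return averages


# Invented sample records.
buildings = [
    ("Bayshore", "private", 300.0),
    ("Bayshore", "private", 500.0),
    ("Bayshore", "public", 100.0),
    ("Siglap", "public", 200.0),
]
averages = average_by_subzone(buildings)
print(averages["Bayshore"])
```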


Choropleth Map

Using monthly raw data on residential energy consumption from EMA, Enerlyst aggregates the energy consumption by subzone and then finds the average consumption per apartment block in each subzone.

Enerlyst provides an overview of each subzone's average energy consumption using three different classification techniques:

  • Jenks Natural Breaks
  • Equal Interval
  • Quantile

Users are able to select different classifications, colors and number of classes using the selecting panel on the left. Changes will be updated dynamically once the user has finalised the selection.
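To show how the choice of classification changes the class a subzone falls into, here is a small Python sketch of equal-interval and quantile breaks (Jenks natural breaks is omitted for brevity). The function names and sample values are assumptions, and the quantile rule uses linear interpolation, which may differ from the rule the application's R libraries apply.

```python
from bisect import bisect_right


def equal_interval_breaks(values, n_classes):
    """Break points that split the value range into equally wide classes."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    return [lo + i * width for i in range(n_classes + 1)]


def quantile_breaks(values, n_classes):
    """Break points that put roughly the same number of values in each class."""
    ordered = sorted(values)
    n = len(ordered)
    breaks = []
    for i in range(n_classes + 1):
        pos = i * (n - 1) / n_classes           # fractional index into the sorted data
        lower = int(pos)
        upper = min(lower + 1, n - 1)
        frac = pos - lower
        breaks.append(ordered[lower] + frac * (ordered[upper] - ordered[lower]))
    return breaks


def classify(value, breaks):
    """Zero-based class index; the maximum value falls in the last class."""
    return min(bisect_right(breaks, value) - 1, len(breaks) - 2)


# Invented averages with one extreme value.
values = [100, 120, 130, 150, 160, 180, 200, 1000]
equal = equal_interval_breaks(values, 4)
quant = quantile_breaks(values, 4)
print(equal, classify(200, equal))
print(quant, classify(200, quant))
```

With a single extreme value in the data, equal intervals place the value 200 in the lowest class while quantiles place it in the highest, which is why the same data can produce very different maps.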


Local Moran's I

Enerlyst provides local spatial autocorrelation analysis, in which hot and cold clusters of residential energy consumption are identified. The Local Moran's I statistic of spatial association for each subzone i is given as:

$$I_i = \frac{x_i - \bar{X}}{S_i^2} \sum_{j=1,\, j \neq i}^{n} w_{ij}\,(x_j - \bar{X})$$

where $(x_i - \bar{X})$ is the deviation of subzone i's energy consumption from the mean $\bar{X}$ over all subzones, $w_{ij}$ is the spatial weight between subzones i and j, and

$$S_i^2 = \frac{\sum_{j=1,\, j \neq i}^{n} (x_j - \bar{X})^2}{n - 1}$$

with n being the number of subzones in Singapore. Each subzone's neighbours are defined as the subzones with which it shares a border.
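A minimal Python sketch of the statistic follows, assuming the common formulation in which the deviation of subzone i from the overall mean is divided by a variance term computed over the other subzones and multiplied by the weighted sum of its neighbours' deviations. Row-standardised weights (each of a subzone's k neighbours gets weight 1/k) are an assumption, as are the function name and the toy data; the application itself relies on R libraries.

```python
def local_morans_i(values, neighbours):
    """Local Moran's I per area, using row-standardised contiguity weights.

    `values[i]` is the attribute of area i and `neighbours[i]` lists the
    indices of the areas sharing a border with area i.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    result = []
    for i in range(n):
        # Variance term computed over all areas except i.
        s2 = sum(dev[j] ** 2 for j in range(n) if j != i) / (n - 1)
        nbrs = neighbours[i]
        weight = 1.0 / len(nbrs) if nbrs else 0.0
        weighted_sum = sum(weight * dev[j] for j in nbrs)
        result.append(dev[i] / s2 * weighted_sum)
    return result


# Toy example: four areas in a row, two low values next to two high values.
values = [1.0, 1.0, 5.0, 5.0]
neighbours = [[1], [0, 2], [1, 3], [2]]
local_i = local_morans_i(values, neighbours)
print(local_i)
```

Positive values indicate an area surrounded by similar values (high-high or low-low), while negative values indicate an area that differs from its neighbours.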

There is also a scatterplot between X and the "spatial lag" of X, formed by averaging all values of X for the neighboring polygons, where X is a subzone's average apartment block energy consumption. The plot identifies which type of spatial autocorrelation exists.
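The spatial lag used on the vertical axis of the scatterplot is the plain average of the neighbouring values. A small Python sketch with a hypothetical `spatial_lag` helper and toy data:

```python
def spatial_lag(values, neighbours):
    """Average of the neighbouring values for each area (None if no neighbours)."""
    lags = []
    for nbrs in neighbours:
        if nbrs:
            lags.append(sum(values[j] for j in nbrs) / len(nbrs))
        else:
            lags.append(None)
    return lags


# Toy example: four areas in a row.
values = [1.0, 1.0, 5.0, 5.0]
neighbours = [[1], [0, 2], [1, 3], [2]]
lags = spatial_lag(values, neighbours)

# Each (value, lag) pair is one point on the Moran scatterplot.
points = list(zip(values, lags))
print(points)
```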


LISA

Extending from Local Moran's I, Enerlyst uses LISA to show which subzones have a statistically significant relationship with their neighbours, and the type of that relationship. The quadrants in the plot can be interpreted in the following manner:

  • Top-left quadrant = low-high cluster
  • Top-right quadrant = high-high cluster
  • Bottom-left quadrant = low-low cluster
  • Bottom-right quadrant = high-low cluster
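The quadrant interpretation above can be expressed as a small Python function. This is only a sketch: the choice of centre lines (for example the mean of the values and the mean of the lags) is left to the caller, ties are counted as "high" by assumption, and the significance filtering that a LISA map applies is not included.

```python
def lisa_quadrant(value, lag, value_centre, lag_centre):
    """Name the scatterplot quadrant of a (value, spatial lag) point."""
    high_value = value >= value_centre   # right half of the plot
    high_lag = lag >= lag_centre         # top half of the plot
    if high_value and high_lag:
        return "high-high"    # top-right
    if not high_value and high_lag:
        return "low-high"     # top-left
    if not high_value and not high_lag:
        return "low-low"      # bottom-left
    return "high-low"         # bottom-right


# Toy points (value, lag) with both centre lines set to 3.0.
for value, lag in [(5.0, 5.0), (1.0, 4.0), (1.0, 1.0), (5.0, 2.0)]:
    print(value, lag, lisa_quadrant(value, lag, 3.0, 3.0))
```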


Case Study Analysis

EMA publishes energy statistics on an annual basis to provide readers with a comprehensive understanding of the Singapore energy landscape through detailed coverage of various energy-related topics. As project Enerlyst focuses on analysing households' energy consumption, only private and public household data will be used. This study is based on the EMA datasets from 2013 to 2015. The 2013 data will be prepared manually, whereas the 2014 and 2015 data will be uploaded to the application and processed on the fly.


Choropleth Map

Private Housing



Higher energy consumption can be found in the central region.


The Sungei Road subzone has the highest average energy consumption, at approximately 2163 kWh.

Public Housing



The north-east region has a cluster of subzones with higher energy consumption.


The Lower Seletar subzone has the highest average energy consumption, at approximately 1024 kWh.

Choropleth maps may seem to be a decent indicator of spatial clustering at a glance. When spatial polygons are the same color as their neighboring polygons, it may appear to signify a clustering of features around the attribute of interest. This, however, is misleading, as the choice of classification method and the number of classes specified can result in very different-looking choropleth maps. The map creator gets to paint the picture by controlling these variables, and thus the objectivity of the analysis is questionable at best.

Jenks Natural Breaks



Equal Interval



For instance, consider the choropleth maps of energy consumption for four months (March, June, September and December) of 2013. A classification using Jenks Natural Breaks shows the Paterson and Dunearn subzones in the central region as belonging to the highest energy consumption class. With an Equal Interval classification, however, Paterson and Dunearn no longer appear in the highest class. Hence, a choropleth map can be misleading despite the attractiveness of the data representation. An analysis such as spatial autocorrelation can be used to provide concrete evidence of spatial clustering.

Local Moran's I

Private Housing



There are clusters of subzones with similar levels of energy consumption in the west, central and east regions.

Public Housing



The clusters of subzones with similar levels of energy consumption are in the west, north-east and east regions.


A Moran scatterplot is available to complement the Local Moran's I. It provides an easy way to categorize the nature of spatial autocorrelation into four classifications, namely high-high, high-low, low-low and low-high. The scatterplot compares the value of the selected variable (x-axis) with its own spatially lagged value (y-axis). This lagged value is derived from the average of the values of the same variable for its neighbors.

LISA

Private Housing



The LISA map for the private housing dataset in December 2015 shows that, in the west region, the Saujana, Jelebu, Dairy Farm and Bangkit subzones have significantly higher energy consumption than the private housing mean, and that their neighbouring subzones are highly similar. In the east region, the Bayshore subzone is identified as having higher energy consumption, and neighbouring subzones such as Siglap share similar traits.

Public Housing



The LISA map corroborates the Local Moran's I results: the Keat Hong and Hougang East subzones share higher electricity consumption with their neighbouring subzones.


With such information, energy-saving solutions can be implemented in the identified subzones to further reduce energy consumption.


Timeline

Week No(s). ║ Task ║ Status
3 ║ Form team ║ Completed
4-5 ║ Discuss and choose a project topic ║ Completed
6-9 ║ Research chosen project topic and data collection ║ Completed
10-11 ║ Create project repository and web application planning ║ Completed
12 ║ Create project wiki ║ Completed
11-15 ║ Develop application ║ Completed
14-15 ║ Research report ║ Completed
16 ║ Finalize application ║ Completed
16 ║ Finalize Poster and Research Paper ║ Completed
16 ║ Prepare for Townhall Presentation ║ Completed
16 ║ Townhall Poster Presentation ║ Completed
16 ║ Final Project Submission ║ Completed


Future Work

  • Allowing analysts to upload industrial energy usage data
  • Performing cluster analysis using point data
  • Including Geary's C analysis on top of Local Moran's I


Reference