1718t1is428T15
PROJECT PROPOSAL
Project Motivation
Experts have warned that global power demand is set to double by 2030 despite controls imposed by authorities. High power consumption can already be observed locally. According to the Energy Market Authority (EMA), Singapore's power consumption increased from 1965 to 2013 [1].
As Singapore is land-scarce and lacks significant renewable energy options such as hydropower, wave power, or sufficient land for mass solar energy production, energy has been a top concern in the urban nation [2]. It is thus important to promote energy saving concepts to the public and to deploy energy saving solutions island-wide. However, the usual analysis tools do not provide the spatial perspective needed to guide where such solutions should be deployed. Information about the energy consumption levels of residents in Singapore is often not conveyed adequately in geographical terms. While EMA and Singstat provide annual data and reports on energy usage in Singapore, these lack geographical positioning of the data points.
Project Objective
Our team aims to create a web application (Enerlyst) using R that leverages energy datasets provided by EMA to perform geospatial analysis and identify energy usage clusters. Further analysis can then be performed to identify root causes of high or low energy consumption in these clusters and to devise ways to achieve energy conservation as a nation. Project Enerlyst aims to provide a spatial perspective by utilising the following approaches:
- Choropleth Map
- Local Moran's I
- Local Indicators of Spatial Association (LISA)
Technology
System Architecture
R Library
- shiny
- Web Application Framework for R
- maptools
- Tools for Reading and Handling Spatial Objects
- rgdal
- Bindings for the Geospatial Data Abstraction Library
- leaflet
- Create Interactive Web Maps with the JavaScript 'Leaflet' Library
- spatialEco
- Functions for Kriging and Point Pattern Analysis
- plyr
- Tools for Splitting, Applying and Combining Data
- spdep
- Spatial Dependence: Weighting Schemes, Statistics and Models
- GISTools
- Some further GIS capabilities for R
- spatstat
- Spatial Point Pattern Analysis, Model-Fitting, Simulation, Tests
- classInt
- Choose Univariate Class Intervals
- RColorBrewer
- ColorBrewer Palettes
- rsconnect
- Deployment Interface for R Markdown Documents and Shiny Applications
- openxlsx
- Read, Write and Edit XLSX Files
Application Features
Uploading and Processing On The Fly
Enerlyst allows users to upload EMA housing data and processes it on the fly. Users can view the processed data on the Data tab. After uploading, users can select the type of data (public, private or both). Data for the selected year and month are processed and displayed on the fly. This feature gives the application longevity, as future datasets can also be analysed. The data can be found on the EMA website [3].
Cleaning up of raw data before uploading
The data to be uploaded should be in the following format:
- Geocoded, with the X and Y coordinates in columns named "X" and "Y" respectively
- Row 4 (Overall) of the EMA data has to be removed
- Should follow a naming convention of "YYYY_priv" for private housing data and "YYYY_pub" for public housing data
- Should have a file extension of xlsx
- The two six-month datasets should be merged into a single one-year file (only applicable to public housing data)
The steps to convert the raw data into a format that Enerlyst recognises are as follows:
a) Preparing raw data from EMA for 2013 Private Housing
1. Copy the year's data into a new Excel file
2. Save file as "2013_priv.xlsx"
3. Delete Row 4 which contains the overall energy consumption
4. Add two columns in columns O and P, give them headers named "X" and "Y"
5. Geocode the postal codes and put the results into "X" and "Y"
6. Save the file
b) Preparing raw data from EMA for 2013 Public Housing
1. Open the first half of the public data
2. Open a new Excel file
3. Copy each month's data into the new file
4. Repeat steps 1 to 3 for the second half of the public data. At the end, there should be 12 sheets in total, ordered by month from January to December.
In each sheet of the new Excel file:
5. Delete Row 4 from each sheet
6. Add two columns in columns O and P, give them headers named "X" and "Y"
7. Geocode the postal codes and put the results into "X" and "Y"
8. Save the file as "2013_pub.xlsx"
Uploading Files to Enerlyst
Once the data files for 2013 to 2015 are ready, we upload them into Enerlyst. The application reads each file's name and recognises the year and property type it represents.
For private housing data, the application converts the sheet into a data frame. For public housing data, the application loops through the 12 sheets (months) of data, aggregating each month's energy consumption by postal code. In other words, the application finds the total energy consumed by a residential building by totalling the consumption of 1-or-2-room, 3-room, 4-room and 5-room/executive apartments. The aggregate is transposed into a data frame, and the columns are renamed to show the month. The data frames for private and public housing are similar, and contain the following columns:
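As an illustration of the naming convention, the following is a minimal Python sketch of how a filename could be mapped to a year and property type. Enerlyst itself is written in R, so this is not the application's code; the function name `parse_upload_name` and the labels "private" and "public" are assumptions made for this example.

```python
import re
from pathlib import Path

# Hypothetical labels for the two filename suffixes described above.
PROPERTY_TYPES = {"priv": "private", "pub": "public"}

# Four-digit year, underscore, then "priv" or "pub".
_NAME_PATTERN = re.compile(r"^(\d{4})_(priv|pub)$")


def parse_upload_name(filename):
    """Return (year, property_type) for names such as '2013_priv.xlsx'."""
    path = Path(filename)
    if path.suffix.lower() != ".xlsx":
        raise ValueError(f"expected an .xlsx file, got {filename!r}")
    match = _NAME_PATTERN.match(path.stem)
    if match is None:
        raise ValueError(f"expected 'YYYY_priv' or 'YYYY_pub', got {path.stem!r}")
    return int(match.group(1)), PROPERTY_TYPES[match.group(2)]
```

Rejecting names that do not match, rather than guessing, keeps a mislabelled file from being silently treated as the wrong housing type.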
Postal Code | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec | X | Y |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
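The public housing aggregation can be sketched as follows. This is an illustrative Python version, not the application's R code. It assumes that each monthly sheet has already been read into a list of row dictionaries, that values are already numeric, and that the dwelling types appear as columns; the column names in `DWELLING_COLUMNS` are hypothetical.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Hypothetical column names for the four dwelling types in each sheet.
DWELLING_COLUMNS = ["1-2 room", "3 room", "4 room", "5 room/exec"]


def build_public_frame(sheets):
    """Combine monthly sheets (in Jan..Dec order) into one record per postal code.

    Each sheet is a list of dicts holding "Postal Code", the dwelling-type
    columns, and the geocoded "X" and "Y" values.
    """
    frame = {}
    for month, rows in zip(MONTHS, sheets):
        for row in rows:
            code = row["Postal Code"]
            if code not in frame:
                # Column order mirrors the table above.
                frame[code] = {"Postal Code": code,
                               **{m: 0.0 for m in MONTHS},
                               "X": row["X"], "Y": row["Y"]}
            # Building total = sum over all dwelling types for that month.
            frame[code][month] += sum(row[col] for col in DWELLING_COLUMNS)
    return list(frame.values())
```

A building that is missing from a given month's sheet keeps a total of zero for that month in this sketch.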
After the data frame has been constructed, the application cleans up 'na' and 's' values, which represent negligible levels of energy consumption and suppressed individual data respectively. These values are replaced with zeroes and treated as housing with no energy consumption. The application then uses the data frame's X and Y coordinates to convert it into a spatial points data frame, and changes its coordinate reference system to WGS84.
To allow users to analyse energy consumption clusters by housing type, Enerlyst then identifies which subzone each residential building belongs to, and computes 1) private housing's average energy consumption by subzone, 2) public housing's average energy consumption by subzone, and 3) combined average energy consumption by subzone. To perform the computation, the following details need to be derived from the data frames:
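The clean-up rule can be illustrated with a short Python sketch (again, not the application's R code). The helper name `clean_value` is an assumption, as is the choice to match the markers case-insensitively and ignore surrounding spaces.

```python
# Markers that the text above says are replaced with zero.
PLACEHOLDERS = {"na", "s"}


def clean_value(raw):
    """Return 0.0 for 'na'/'s' markers, otherwise the value as a float."""
    if isinstance(raw, str):
        text = raw.strip().lower()
        if text in PLACEHOLDERS:
            return 0.0
        return float(text)
    return float(raw)


def clean_record(record, columns):
    """Return a copy of record with the given columns cleaned."""
    cleaned = dict(record)
    for column in columns:
        cleaned[column] = clean_value(record[column])
    return cleaned
```

Note that replacing suppressed values with zero lowers the averages computed later, so subzones with many 's' entries may appear to consume less than they actually do.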
- Total energy consumption of private housing by subzone
- Total energy consumption of public housing by subzone
- Total energy consumption of all housing by subzone
- Count of private residential buildings per subzone
- Count of public residential buildings per subzone
- Count of all residential buildings per subzone
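The six quantities above combine into the three averages as total divided by count. A minimal Python sketch follows, assuming each building has already been assigned to a subzone; the tuple layout and the function name `subzone_averages` are hypothetical, and the sketch is not the application's R code.

```python
from collections import defaultdict


def subzone_averages(buildings):
    """Compute average consumption per building for each subzone.

    buildings: iterable of (subzone, housing_type, consumption) tuples,
    where housing_type is "private" or "public".
    """
    totals = defaultdict(lambda: {"private": 0.0, "public": 0.0})
    counts = defaultdict(lambda: {"private": 0, "public": 0})
    for subzone, housing_type, consumption in buildings:
        totals[subzone][housing_type] += consumption
        counts[subzone][housing_type] += 1

    def mean(total, count):
        # None marks a subzone with no buildings of that type.
        return total / count if count else None

    result = {}
    for subzone, t in totals.items():
        c = counts[subzone]
        result[subzone] = {
            "private": mean(t["private"], c["private"]),
            "public": mean(t["public"], c["public"]),
            "combined": mean(t["private"] + t["public"],
                             c["private"] + c["public"]),
        }
    return result
```

The combined average is computed from the combined total and count, not as the mean of the two per-type averages, so it is weighted by the number of buildings of each type.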
Choropleth Map
Using monthly raw data on residential energy consumption from EMA, Enerlyst aggregates the energy consumption by subzone and then finds the average consumption per apartment block in each subzone.
Enerlyst provides an overview of each subzone's average energy consumption using three different classification techniques:
- Jenks Natural Breaks
- Equal Interval
- Quantile
Users can select the classification method, colours and number of classes using the selection panel on the left. The map updates dynamically once the user has finalised the selection.
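To show how the choice of classification changes class membership, here is a minimal Python sketch of two of the three methods, equal interval and quantile (Jenks Natural Breaks is omitted for brevity). It is an illustration only and not the application's R code; the function names are assumptions, and classes are treated as closed on the left.

```python
import bisect
import statistics


def equal_interval_breaks(values, k):
    """Return k + 1 break points that split [min, max] into equal widths."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(k)] + [hi]


def quantile_breaks(values, k):
    """Return k + 1 break points with roughly equal counts per class."""
    cuts = statistics.quantiles(values, n=k, method="inclusive")
    return [min(values)] + cuts + [max(values)]


def assign_class(value, breaks):
    """Return the 0-based class index of value (a break belongs to the class above it)."""
    return bisect.bisect_right(breaks[1:-1], value)
```

For the values 1, 2, 3, 4 and 100 with four classes, the value 4 falls in the lowest class under equal interval but in the highest class under quantile, because a single outlier stretches the equal-width bins.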
Local Moran's I
Enerlyst provides local autocorrelation analysis, in which hot and cold clusters are identified in terms of residential energy consumption. The Local Moran's I statistic of spatial association for each subzone is given as:
$$I_i = \frac{x_i - \bar{X}}{S_i^2} \sum_{j=1,\, j \neq i}^{n} w_{ij} (x_j - \bar{X})$$
where $(x_i - \bar{X})$ is the deviation of subzone $i$'s energy consumption from the mean $\bar{X}$, $w_{ij}$ is the spatial weight between subzones $i$ and $j$, and
$$S_i^2 = \frac{\sum_{j=1,\, j \neq i}^{n} (x_j - \bar{X})^2}{n - 1}$$
with $n$ being the number of subzones in Singapore. Each subzone's neighbours are defined as the subzones with which it shares a border.
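A minimal Python sketch of the statistic is given below for illustration; it is not the application's R code. It uses the common form in which each subzone's deviation from the overall mean is scaled by a variance term and multiplied by the weighted sum of its neighbours' deviations. Row-standardised weights (each neighbour of a subzone weighted equally, summing to 1) are an assumption, since the text only states that neighbours share a border.

```python
def local_morans_i(values, neighbours):
    """Return the Local Moran's I value for each area.

    values: list of attribute values, one per area.
    neighbours: dict mapping each area index to a list of the indices
    of the areas that share a border with it.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    result = []
    for i in range(n):
        # Variance term computed over all areas except i.
        s2 = sum(dev[j] ** 2 for j in range(n) if j != i) / (n - 1)
        nbrs = neighbours.get(i, [])
        # Assumed row-standardised weights: 1 / (number of neighbours).
        weight = 1.0 / len(nbrs) if nbrs else 0.0
        lag = sum(weight * dev[j] for j in nbrs)
        result.append(dev[i] / s2 * lag)
    return result
```

A positive value indicates that a subzone and its neighbours deviate from the mean in the same direction (a cluster), while a negative value indicates an outlier. The sketch does not guard against the degenerate case in which the variance term is zero.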
There is also a scatterplot between X and the "spatial lag" of X, formed by averaging all values of X for the neighboring polygons, where X is a subzone's average apartment block energy consumption. The plot identifies which type of spatial autocorrelation exists.
LISA
Extending from Local Moran's I, Enerlyst uses LISA to identify the subzones that have a statistically significant relationship with their neighbours, and to show the type of relationship. The quadrants in the plot can be interpreted in the following manner:
- Top-left quadrant = low-high cluster
- Top-right quadrant = high-high cluster
- Bottom-left quadrant = low-low cluster
- Bottom-right quadrant = high-low cluster
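The quadrant assignment can be sketched in Python as follows (for illustration only, not the application's R code). The sketch compares each subzone's value and the average of its neighbours' values (its spatial lag) against the overall mean. It covers the labelling only and omits the significance test that LISA applies; treating a value exactly equal to the mean as "low" is an arbitrary choice made for this example.

```python
def moran_quadrants(values, neighbours):
    """Label each area as high-high, low-high, low-low or high-low.

    values: list of attribute values, one per area.
    neighbours: dict mapping each area index to a list of neighbour indices.
    Areas without neighbours are labelled None.
    """
    mean = sum(values) / len(values)
    labels = []
    for i, value in enumerate(values):
        nbrs = neighbours.get(i, [])
        if not nbrs:
            labels.append(None)
            continue
        # Spatial lag: average value of the neighbouring areas.
        lag = sum(values[j] for j in nbrs) / len(nbrs)
        own = "high" if value > mean else "low"
        around = "high" if lag > mean else "low"
        labels.append(f"{own}-{around}")
    return labels
```

The first word of each label describes the subzone itself (the x-axis of the scatterplot) and the second describes its neighbours (the y-axis), which matches the quadrant list above.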
Case Study Analysis
EMA publishes energy statistics on an annual basis to provide readers with a comprehensive understanding of the Singapore energy landscape through detailed coverage of various energy-related topics. As project Enerlyst focuses on analysing households' energy consumption, only private and public household data will be used. This study is based on the EMA datasets from 2013 to 2015. The 2013 data will be prepared manually, whereas the 2014 and 2015 data will be uploaded to the application and processed on the fly.
Choropleth Map
Higher energy consumption can be found in the central region.
Sungei Road subzone has the highest average energy consumption, at approximately 2163 kWh.
The north-east region has a cluster of subzones with higher energy consumption.
Lower Seletar subzone has the highest average energy consumption, at approximately 1024 kWh.
Choropleth maps may seem to be a decent indicator of spatial clustering at a glance. When a polygon is the same colour as its neighbouring polygons, it may appear to signify a clustering of features around the attribute of interest. This, however, is misleading, as the choice of classification method and the number of classes specified can result in very different-looking choropleth maps. The map creator gets to paint the picture by controlling these variables, and thus the objectivity of the analysis is questionable at best.
For instance, if we look at the choropleth maps of energy consumption for four months of 2013 (March, June, September and December), a classification using Jenks Natural Breaks shows the Paterson and Dunearn subzones in the central region in the highest energy consumption class. However, under an Equal Interval classification, Paterson and Dunearn are no longer in the highest class. Hence, a choropleth map can be misleading despite the attractiveness of the data representation. An analysis such as spatial autocorrelation can be used to provide concrete evidence of spatial clustering.
Local Moran's I
There are clusters of subzones with similar levels of energy consumption in the west, central and east regions.
The clusters of subzones with similar levels of energy consumption are in the west, north-east and east regions.
A Moran scatterplot is available to complement the Local Moran's I. It provides an easy way to categorise the nature of spatial autocorrelation into four classes: high-high, high-low, low-low and low-high. The scatterplot compares the value of the selected variable (x-axis) with its own spatially lagged value (y-axis). This lagged value is the average of the values of the same variable among the subzone's neighbours.
LISA
The LISA for the private housing dataset in December 2015 shows that, in the west region, the Saujana, Jelabu, Dairy Farm and Bangkit subzones have significantly higher energy consumption than the mean for private housing, and that their neighbouring subzones are highly similar. In the east region, Bayshore subzone is identified as having higher energy consumption, and neighbouring subzones such as Siglap share similar traits.
The LISA results are consistent with the Local Moran's I findings, as the Keat Hong and Hougang East subzones share higher electricity consumption with their neighbouring subzones.
With such information, energy saving solutions can be implemented in the identified subzones to further reduce energy consumption.
Timeline
Week No(s). | Task | Status |
---|---|---|
3 | Form team | Completed |
4-5 | Discuss and choose a project topic | Completed |
6-9 | Research chosen project topic and data collection | Completed |
10-11 | Create project repository and web application planning | Completed |
12 | Create project wiki | Completed |
11-15 | Develop application | Completed |
14-15 | Research report | Completed |
16 | Finalize application | Completed |
16 | Finalize Poster and Research Paper | Completed |
16 | Prepare for Townhall Presentation | Completed |
16 | Townhall Poster Presentation | Completed |
16 | Final Project Submission | Completed |
Future Work
- Allowing analysts to upload industrial energy usage data
- Performing cluster analysis using point data
- Including Geary's C analysis on top of Local Moran's I