1718t1is428T15

From Visual Analytics for Business Intelligence

Revision as of 19:33, 12 October 2017 by Jiajun.ng.2014

Contents

1 Project Motivation
2 Project Objective
3 Technology
- 3.1 System Architecture
- 3.2 R Library
4 Application Features
5 Case Study Analysis
6 Timeline
7 Future Work
8 Reference

Project Motivation

Experts have warned that global power demand is set to double by 2030 despite controls imposed by the authorities. High power consumption can already be observed locally: according to the Energy Market Authority (EMA), Singapore's power consumption increased from 1965 to 2013^[1].

As Singapore is land-scarce and lacks significant renewable energy options such as hydropower, wave power, or sufficient land for large-scale solar energy production, energy has been a top concern in the urban nation^[2]. It is thus important to promote energy-saving concepts to the public and to deploy energy-saving solutions island-wide. However, conventional analysis tools do not offer the spatial perspective needed to guide where such solutions should be deployed. Information about the energy consumption levels of residents in Singapore is often not conveyed adequately in geographical terms. While EMA and Singstat provide annual data and reports on energy usage in Singapore, they lack the geographical positioning of the data points.

Project Objective

Our team aims to create a web application (Enerlyst) using R that leverages energy datasets provided by EMA to perform geospatial analysis and identify energy usage clusters. Further analysis can then be performed to identify root causes of high or low energy consumption in these clusters and to devise ways to achieve energy conservation as a nation. Project Enerlyst aims to provide a spatial perspective using the following approaches:

Choropleth Map

Local Moran's I

Local Indicators of Spatial Association (LISA)

Technology

System Architecture

R Library

shiny
- Web Application Framework for R
maptools
- Tools for Reading and Handling Spatial Objects
rgdal
- Bindings for the Geospatial Data Abstraction Library
leaflet
- Create Interactive Web Maps with the JavaScript 'Leaflet' Library
spatialEco
- Functions for Kriging and Point Pattern Analysis
plyr
- Tools for Splitting, Applying and Combining Data
spdep
- Spatial Dependence: Weighting Schemes, Statistics and Models
GISTools
- Some further GIS capabilities for R
spatstat
- Spatial Point Pattern Analysis, Model-Fitting, Simulation, Tests
classInt
- Choose Univariate Class Intervals
RColorBrewer
- ColorBrewer Palettes
rsconnect
- Deployment Interface for R Markdown Documents and Shiny Applications
openxlsx
- Read, Write and Edit XLSX Files

Application Features

Uploading and Processing On The Fly

Enerlyst allows users to upload EMA housing data and processes it on the fly. Users can view the processed data on the Data tab. After uploading, users can select the type of housing data (public, private or both). Data for the selected year and month is processed and displayed on the fly. This feature gives the application longevity, as future datasets can be analysed in the same way. The data can be found on the EMA website.^[3]

Cleaning up of raw data before uploading

The data to be uploaded should be in the following format:

Geocoded, with X and Y coordinates in columns named "X" and "Y" respectively
Row 4 (Overall) of the EMA data removed
File named "YYYY_priv" for private housing data or "YYYY_pub" for public housing data
File extension of xlsx
The two six-month datasets merged into one year of data (only applicable to public housing data)

The steps to convert the raw data into a format that Enerlyst recognises are as follows:

a) Preparing raw data from EMA for 2013 Private Housing

1. Copy out year data into a new excel file

2. Save file as "2013_priv.xlsx"

3. Delete Row 4 which contains the overall energy consumption

4. Add two columns in columns O and P, give them headers named "X" and "Y"

5. Geocode the postal codes and put the results into "X" and "Y"

6. Save the file

b) Preparing raw data from EMA for 2013 Public Housing

1. Open up first half of the public data

2. Open up a new excel file

3. Copy out each month's data into the new file

4. Repeat steps 1 to 3 for the second half of the public data. At the end, there should be 12 sheets in total, ordered by month from January to December.

In each sheet of the new excel file:

5. Delete Row 4 from each sheet

6. Add two columns in columns O and P, give them headers named "X" and "Y"

7. Geocode the postal codes and put the results into "X" and "Y"

8. Save the file as "2013_pub.xlsx"

Uploading Files to Enerlyst

Once the data files for 2013 to 2015 are ready, we upload them into Enerlyst. The application reads the file’s name, and recognises the year and property type it represents.
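The file name check described above can be sketched as follows. This is an illustrative Python sketch rather than Enerlyst's own R code; the helper name `parse_upload_name` and its return shape are assumptions made for the example.

```python
import re
from pathlib import Path

# Hypothetical pattern for the "YYYY_priv" / "YYYY_pub" naming convention.
FILENAME_PATTERN = re.compile(r"^(\d{4})_(priv|pub)$")


def parse_upload_name(filename):
    """Return (year, housing_type) for names such as '2013_priv.xlsx'."""
    path = Path(filename)
    if path.suffix.lower() != ".xlsx":
        raise ValueError(f"expected an .xlsx file, got {filename!r}")
    match = FILENAME_PATTERN.match(path.stem)
    if match is None:
        raise ValueError(f"expected 'YYYY_priv' or 'YYYY_pub', got {path.stem!r}")
    year, suffix = match.groups()
    housing_type = "private" if suffix == "priv" else "public"
    return int(year), housing_type


print(parse_upload_name("2013_priv.xlsx"))  # (2013, 'private')
print(parse_upload_name("2014_pub.xlsx"))   # (2014, 'public')
```

Rejecting names that do not match the convention gives an early, readable error instead of a failure later in processing.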

For private housing data, the application converts the sheet into a data frame. For public housing data, the application loops through the 12 sheets (months) of data, aggregating each month’s energy consumption by postal code. In other words, the application finds the total energy consumed by a residential building by summing the consumption of its 1-or-2-room, 3-room, 4-room and 5-room/executive apartments. The aggregate is transposed into a data frame, and the columns are renamed to show the month. The data frames for private and public housing are similar, and contain the following columns:

Postal Code | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec | X | Y
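The monthly aggregation for public housing can be illustrated with a short Python sketch (Enerlyst itself is written in R). The input shape, a mapping from month to rows of postal code and per-dwelling-type consumption, is an assumption for the example, as are the function name and the sample figures.

```python
from collections import defaultdict

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def aggregate_by_postal_code(sheets):
    """Sum consumption over dwelling types, per postal code and month.

    `sheets` maps a month name to a list of
    (postal_code, {dwelling_type: consumption}) rows (hypothetical layout).
    """
    totals = defaultdict(lambda: dict.fromkeys(MONTHS, 0.0))
    for month, rows in sheets.items():
        for postal_code, by_dwelling_type in rows:
            totals[postal_code][month] += sum(by_dwelling_type.values())
    return dict(totals)


# Made-up figures for a single building over two months.
sheets = {
    "Jan": [("560123", {"3-room": 100.0, "4-room": 150.0})],
    "Feb": [("560123", {"3-room": 90.0, "4-room": 140.0})],
}
table = aggregate_by_postal_code(sheets)
print(table["560123"]["Jan"], table["560123"]["Feb"])  # 250.0 230.0
```

Each postal code ends up with one row and one column per month, matching the layout listed above (without the X and Y columns).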

After the data frame has been constructed, the application cleans up ‘na’ and ‘s’ values, which represent negligible levels of energy consumption and suppressed individual data. These values are replaced with zeroes and treated as housing with no energy consumption. The application then uses the data frame’s X and Y coordinates to convert it into a spatial points data frame, and changes its coordinate reference system to WGS84.
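The placeholder clean-up can be sketched in Python as below. This is an illustration only, not the application's R code; the helper name `clean_value` is hypothetical, and the coordinate conversion step is left out.

```python
# Placeholder strings that the text says are replaced with zero.
PLACEHOLDERS = {"na", "s"}


def clean_value(raw):
    """Convert one consumption cell to a float, mapping placeholders to 0.0."""
    if isinstance(raw, str):
        text = raw.strip().lower()
        if text in PLACEHOLDERS:
            return 0.0
        return float(text)
    return float(raw)


row = ["123.4", "s", "na", 50]
print([clean_value(cell) for cell in row])  # [123.4, 0.0, 0.0, 50.0]
```

Because suppressed values are counted as zero, a subzone with many suppressed entries will show a lower average than it really has, which is worth keeping in mind when reading the maps.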

To allow users to analyse energy consumption clusters by housing type, Enerlyst then identifies which subzone each residential building belongs to, and computes 1) private housing’s average energy consumption by subzone, 2) public housing’s average energy consumption by subzone, and 3) combined average energy consumption by subzone. To perform the computation, the following details need to be derived from the data frames:

Total energy consumption of private housing by subzone
Total energy consumption of public housing by subzone
Total energy consumption of all housing by subzone
Count of private residential buildings per subzone
Count of public residential buildings per subzone
Count of all residential buildings per subzone
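The totals and counts listed above combine into per-subzone averages as in the following Python sketch. It assumes each building has already been assigned to a subzone; the tuple layout, function name, subzone label and figures are all made up for illustration.

```python
from collections import defaultdict


def average_by_subzone(buildings):
    """Average consumption per building, keyed by (subzone, housing_type).

    `buildings` is an iterable of (subzone, housing_type, consumption).
    An extra "all" key per subzone holds the combined average.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for subzone, housing_type, consumption in buildings:
        for key in ((subzone, housing_type), (subzone, "all")):
            totals[key] += consumption
            counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


buildings = [
    ("Subzone A", "public", 300.0),
    ("Subzone A", "public", 500.0),
    ("Subzone A", "private", 1000.0),
]
averages = average_by_subzone(buildings)
print(averages[("Subzone A", "public")])   # 400.0
print(averages[("Subzone A", "private")])  # 1000.0
print(averages[("Subzone A", "all")])      # 600.0
```

The combined average is total consumption divided by total building count, not the mean of the two housing-type averages.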

Choropleth Map

Using monthly raw data on residential energy consumption from EMA, Enerlyst aggregates the energy consumption by subzone and then finds the average consumption per apartment block in each subzone.

Enerlyst provides an overview of each subzone's average energy consumption using three different classification techniques:

Jenks Natural Breaks
Equal Interval
Quantile

Users can select the classification method, color scheme and number of classes using the selection panel on the left. The map updates dynamically once the user has finalised the selection.
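To show how the classification choice changes a choropleth map, here is a minimal Python sketch of equal interval and quantile breaks. It is not the application's R code; the function names are hypothetical, the quantile method (linear interpolation) is an assumption, and Jenks natural breaks is omitted because it needs an optimisation step.

```python
from bisect import bisect_right


def equal_interval_breaks(values, n_classes):
    """Break points that split the value range into equal-width classes."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    return [lo + i * width for i in range(n_classes + 1)]


def quantile_breaks(values, n_classes):
    """Break points that put roughly equal numbers of values in each class."""
    ordered = sorted(values)
    last = len(ordered) - 1
    breaks = []
    for i in range(n_classes + 1):
        position = i * last / n_classes
        lower = int(position)
        upper = min(lower + 1, last)
        fraction = position - lower
        breaks.append(ordered[lower] + fraction * (ordered[upper] - ordered[lower]))
    return breaks


def classify(value, breaks):
    """Return the 1-based class of a value; a value on a break goes to the upper class."""
    return bisect_right(breaks[1:-1], value) + 1


values = [100, 200, 300, 400, 2000]  # made-up averages with one extreme value
print(equal_interval_breaks(values, 4))  # [100.0, 575.0, 1050.0, 1525.0, 2000.0]
print(quantile_breaks(values, 4))        # [100.0, 200.0, 300.0, 400.0, 2000.0]
print(classify(400, equal_interval_breaks(values, 4)))  # 1
print(classify(400, quantile_breaks(values, 4)))        # 4
```

With one extreme value in the data, the same subzone (400) lands in the lowest class under equal intervals but the highest under quantiles, so the two maps would be coloured very differently.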

Local Moran's I

Enerlyst provides local autocorrelation analysis, in which hot and cold clusters are identified in terms of residential energy consumption. The Local Moran's I statistic of spatial association for each subzone $i$ is given, in its commonly used form, as:

$$I_i = \frac{x_i - \bar{X}}{S_i^2} \sum_{j=1,\, j \neq i}^{n} w_{ij}\,(x_j - \bar{X})$$

where $(x_i - \bar{X})$ is the deviation of subzone $i$'s energy consumption from the mean consumption $\bar{X}$, $w_{ij}$ is the spatial weight between subzones $i$ and $j$, and

$$S_i^2 = \frac{\sum_{j=1,\, j \neq i}^{n} (x_j - \bar{X})^2}{n - 1}$$

with $n$ being the number of subzones in Singapore. Each subzone's neighbours are defined as the subzones with which it shares a border.
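As a worked illustration, the Python sketch below computes a Local Moran's I value for each area from a list of values and a list of border-sharing neighbours. It assumes the common form I_i = ((x_i - X-bar) / S_i^2) * sum over j of w_ij (x_j - X-bar), with S_i^2 = sum over j != i of (x_j - X-bar)^2 / (n - 1), and row-standardised weights (each neighbour gets weight 1 / number of neighbours). The function name and the toy data are hypothetical, and the application's R libraries may scale the statistic differently.

```python
def local_morans_i(values, neighbours):
    """Local Moran's I per area.

    values:     one number per area.
    neighbours: neighbours[i] lists the indices of areas bordering area i.
    """
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    result = []
    for i, nbrs in enumerate(neighbours):
        # S_i^2: variance of all other areas around the overall mean.
        s2 = sum(d * d for j, d in enumerate(deviations) if j != i) / (n - 1)
        if not nbrs or s2 == 0:
            # Assumption: report 0.0 where the statistic is undefined
            # (no neighbours, or no variation in the other areas).
            result.append(0.0)
            continue
        weight = 1.0 / len(nbrs)  # row-standardised contiguity weight
        lag = sum(weight * deviations[j] for j in nbrs)
        result.append(deviations[i] / s2 * lag)
    return result


# Four areas in a row: 0-1-2-3, two high values next to two low values.
values = [10, 10, 2, 2]
neighbours = [[1], [0, 2], [1, 3], [2]]
print(local_morans_i(values, neighbours))
```

In this toy case the two end areas score 1.0 because each resembles its only neighbour, while the two middle areas score zero because their high and low neighbours cancel out.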

There is also a scatterplot between X and the "spatial lag" of X, formed by averaging all values of X for the neighboring polygons, where X is a subzone's average apartment block energy consumption. The plot identifies which type of spatial autocorrelation exists.

LISA

Extending from Local Moran's I, Enerlyst uses LISA to show whether each subzone has a statistically significant relationship with its neighbors, and the type of that relationship. The quadrants in the plot can be interpreted in the following manner:

Top-left quadrant = low-high cluster
Top-right quadrant = high-high cluster
Bottom-left quadrant = low-low cluster
Bottom-right quadrant = high-low cluster
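The quadrant labels above can be derived by comparing each area's value, and the average of its neighbours' values (the spatial lag), against the overall mean. The Python sketch below illustrates this under the assumption of row-standardised weights; the function name and data are hypothetical, and the significance test that LISA applies before mapping a cluster is not included.

```python
def moran_quadrants(values, neighbours):
    """Label each area as '<own>-<neighbours>', e.g. 'high-low'."""
    mean = sum(values) / len(values)
    labels = []
    for i, nbrs in enumerate(neighbours):
        if not nbrs:
            labels.append("no neighbours")
            continue
        lag = sum(values[j] for j in nbrs) / len(nbrs)  # spatial lag
        own = "high" if values[i] >= mean else "low"
        around = "high" if lag >= mean else "low"
        labels.append(f"{own}-{around}")
    return labels


# Five areas in a row: 0-1-2-3-4 (made-up values, mean 5.6).
values = [9, 8, 1, 9, 1]
neighbours = [[1], [0, 2], [1, 3], [2, 4], [3]]
print(moran_quadrants(values, neighbours))
# ['high-high', 'high-low', 'low-high', 'high-low', 'low-high']
```

High-high and low-low areas resemble their surroundings (clusters), while high-low and low-high areas stand out from them (outliers).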

Case Study Analysis

EMA publishes energy statistics on an annual basis to provide readers with a comprehensive understanding of the Singapore energy landscape through detailed coverage of various energy-related topics. As project Enerlyst focuses on analysing household energy consumption, only private and public household data is used. This study is based on EMA datasets from 2013 to 2015. The 2013 data is prepared manually, whereas the 2014 and 2015 data is uploaded to the application and processed on the fly.

Choropleth Map

Private Housing

Higher energy consumption can be found in the central region.

The Sungei Road subzone has the highest average energy consumption, at approximately 2163 kWh.

Public Housing

The north-east region has a cluster of subzones with higher energy consumption.

The Lower Seletar subzone has the highest average energy consumption, at approximately 1024 kWh.

Choropleth maps may seem to be a decent indicator of spatial clustering at a glance. When a spatial polygon is the same color as its neighboring polygons, it may appear to signify a clustering of features around the attribute of interest. This, however, can be misleading, as the choice of classification method and number of classes can result in very different-looking choropleth maps. The map creator gets to paint the picture by controlling these variables, and thus the objectivity of the analysis is questionable.

Jenks Natural Breaks

Equal Interval

For instance, in the choropleth maps of energy consumption for four months of 2013 (March, June, September and December), a classification using Jenks Natural Breaks places the Paterson and Dunearn subzones in the central region in the highest energy consumption class. Under an Equal Interval classification, however, Paterson and Dunearn no longer fall in the highest class. Hence, a choropleth map can be misleading despite its visual appeal. An analysis such as spatial autocorrelation can provide firmer evidence of spatial clustering.

Local Moran's I

Private Housing

There are clusters of subzones in the west, central and east regions that have similar levels of energy consumption.

Public Housing

The clusters of subzones with similar levels of energy consumption are in the west, north-east and east regions.

A Moran scatterplot is available to complement the Local Moran's I. It provides an easy way to categorise the nature of spatial autocorrelation into four classes, namely high-high, high-low, low-low and low-high. The scatterplot compares the value of the selected variable (x-axis) with its spatially lagged value (y-axis). The lagged value is the average of the same variable across the subzone's neighbors.

LISA

Private Housing

The LISA map for the private housing dataset in December 2015 shows that, in the west region, the Saujana, Jelebu, Dairy Farm and Bangkit subzones have significantly higher energy consumption than the mean for private housing, and that their neighbouring subzones are similarly high. In the east region, the Bayshore subzone is identified as having higher energy consumption, and neighbouring subzones such as Siglap share similar traits.

Public Housing

The LISA map is consistent with the Local Moran's I results: the Keat Hong and Hougang East subzones share higher electricity consumption with their neighbouring subzones.

With such information, energy-saving solutions can be targeted at the identified subzones to further reduce energy consumption.

Timeline

Week No(s).	Task	Status
3	Form team	Completed
4-5	Discuss and choose a project topic	Completed
6-9	Research chosen project topic and data collection	Completed
10-11	Create project repository and web application planning	Completed
12	Create project wiki	Completed
11-15	Develop application	Completed
14-15	Research report	Completed
16	Finalize application	Completed
16	Finalize Poster and Research Paper	Completed
16	Prepare for Townhall Presentation	Completed
16	Townhall Poster Presentation	Completed
16	Final Project Submission	Completed

Future Work

Allowing analysts to upload industrial energy usage data
Performing cluster analysis using point data
Including Geary's C analysis on top of Local Moran's I

Reference

