Difference between revisions of "1718t1is428T15"

From Visual Analytics for Business Intelligence
(Created page with "300px|center<br /> <!-- Start Nav Bar --> {| style="background-color:#fff; color:#000000 padding: 5px 0 0 0;" width="100%" cellspacing="0" cellpad...")
 
 
(79 intermediate revisions by 3 users not shown)
Line 3: Line 3:
 
<!-- Start Nav Bar -->
 
 
{| style="background-color:#fff; color:#000000 padding: 5px 0 0 0;" width="100%" cellspacing="0" cellpadding="0" valign="top" border="0" |
 
{| style="background-color:#fff; color:#000000 padding: 5px 0 0 0;" width="100%" cellspacing="0" cellpadding="0" valign="top" border="0" |
| style="font-family:Century Gothic; font-size:100%; background:#778899; text-align:center; border-left: 0px;"  width="25%";|
+
| style="font-family:Century Gothic; font-size:100%; background:#05050f; text-align:center; border-left: 0px;"  width="25%";|
 
<font color="#FFFFFF" size="2"><strong>PROJECT PROPOSAL</strong></font>
 
| style="font-family:Century Gothic; font-size:100%; background:#708090; text-align:center; border-left: 0px" width="25%" |  
+
| style="font-family:Century Gothic; font-size:100%; background:#132039; text-align:center; border-left: 0px" width="25%" |  
[[IS415 Team wiki: 2017T2 On The Fly Project Poster|<font color="#FFFFFF" size="2"><strong>PROJECT POSTER</strong></font>]]
+
[[1718t1is428T15 Poster|<font color="#FFFFFF" size="2"><strong>PROJECT POSTER</strong></font>]]
| style="font-family:Century Gothic; font-size:100%; background:#778899; text-align:center;border-left: 0px" width="25%" |  
+
| style="font-family:Century Gothic; font-size:100%; background:#132039; text-align:center;border-left: 0px" width="25%" |  
[[IS415 Team wiki: 2017T2 On The Fly Project Application|<font color="#FFFFFF" size="2"><strong>PROJECT APPLICATION</strong></font>]]
+
[[1718t1is428T15 Application|<font color="#FFFFFF" size="2"><strong>PROJECT APPLICATION</strong></font>]]
| style="font-family:Century Gothic; font-size:100%; background:#708090; text-align:center; border-left: 0px" width="25%" |  
+
| style="font-family:Century Gothic; font-size:100%; background:#132039; text-align:center; border-left: 0px" width="25%" |  
[[IS415 Team wiki: 2017T2 On The Fly Research Paper|<font color="#FFFFFF" size="2"><strong>RESEARCH PAPER</strong></font>]]
+
[[1718t1is428T15 Research Paper|<font color="#FFFFFF" size="2"><strong>RESEARCH PAPER</strong></font>]]
 
|}
 
|}
 
<br />
 
<br />
 
<!-- End Nav Bar -->
 
  
<!-- START PROJECT MOTIVATION -->
+
<!-- START PROJECT MOTIVATION & OBJECTIVE -->
==<div style="margin-top: 10px;font-family: Helvetica; text-align: left;font-size:20px; border: 5px solid #00000000; border-radius:5px; text-align:center; background-color: #708090; color: white; padding: 2px"><span style="font-size:24px;">P</span>roject <span style="font-size:24px">M</span>otivation</div>==
+
==<div style="margin-top: 10px;font-family: Helvetica; text-align: left;font-size:20px; border: 5px solid #00000000; border-radius:5px; text-align:center; background-color: #132039; color: white; padding: 2px"><span style="font-size:24px;">P</span>roject <span style="font-size:24px">M</span>otivation & <span style="font-size:24px">O</span>bjective</div>==
  
Experts have warned that power demand is set to double by 2030 globally despite authoritative control. High power consumption can already be observed locally. According Energy Market Authority (EMA), Singapore has faced increasing power consumption from 1965 to 2013 <ref>https://www.ema.gov.sg/Publications_Annual_Reports.aspx</ref>.  
+
Experts have warned that power demand is set to double by 2030 globally despite authoritative control. High power consumption can already be observed locally. According to the Energy Market Authority (EMA), Singapore has faced increasing power consumption from 1965 to 2013 <ref>https://www.ema.gov.sg/Publications_Annual_Reports.aspx</ref>.  
[[File:IS415-Group2-OnTheFly-EMA.png|700px|center]]
+
[[File:1718t1is428T15-Motivation.png|700px|center]]
As Singapore is land-scarce and does not have significant renewable energy options such as hydro-power, wave, or sufficient land for mass solar energy production, energy has been a top concern in the urban nation<ref>http://www.eco-business.com/news/tackling-energy-challenges-the-singapore-way/</ref>. It is thus important to promote energy-saving concepts to the public and to deploy energy-saving solutions island-wide. However, the usual analysis tools do not provide the perspective needed to facilitate the deployment of such solutions. Information about the energy consumption levels of residents in Singapore is often not conveyed adequately in geographical terms. While EMA and Singstat provide annual data and reports on energy usage in Singapore, they lack the element of geographical positioning of the data points. 
+
As Singapore is land-scarce and does not have significant renewable energy options such as hydro-power, wave, or sufficient land for mass solar energy production, energy has been a top concern in the urban nation<ref>http://www.eco-business.com/news/tackling-energy-challenges-the-singapore-way/</ref>. It is thus important to promote energy-saving concepts to the public and to deploy energy-saving solutions island-wide. However, the usual analysis tools do not provide the perspective needed to facilitate the deployment of such solutions. Information about the energy consumption levels of residents in Singapore is often not conveyed adequately through data visualisation. While EMA and Singstat provide annual data and reports on energy usage in Singapore, a more powerful visualisation technique is needed to gain insights effectively. Our team aims to create a visualisation that uses the energy datasets provided by EMA to perform spatial analysis and identify energy usage clusters with hexagonal binning.
<!-- END PROJECT MOTIVATION -->
 
  
<!-- START PROJECT OBJECTIVE -->
+
<!-- END PROJECT MOTIVATION & OBJECTIVE -->
  
==<div style="margin-top: 10px;font-family: Helvetica; text-align: left;font-size:20px; border: 5px solid #00000000; border-radius:5px; text-align:center; background-color: #708090; color: white; padding: 2px"><span style="font-size:24px;">P</span>roject <span style="font-size:24px">O</span>bjective</div>==
+
<!-- START DATASET-->
Our team aims to create a web application (Enerlyst) using R that uses the energy datasets provided by EMA to perform geospatial analysis and identify energy usage clusters. Further analysis can then be performed to identify root causes for high or low energy consumption in these clusters and devise ways to achieve energy conservation as a nation. Project Enerlyst aims to provide a spatial perspective by utilising the following approaches: <br /><br />
 
  
*Choropleth Map
+
==<div style="margin-top: 10px;font-family: Helvetica; text-align: left;font-size:20px; border: 5px solid #00000000; border-radius:5px; text-align:center; background-color: #132039; color: white; padding: 2px"><span style="font-size:24px;">D</span>atasets</div>==
  
*Local Moran's I
+
=== Data Source ===
 +
Our analysis will be based on EMA's collection of data on Singapore's residential electricity consumption<ref>https://www.ema.gov.sg/Statistics.aspx</ref>:
 +
* Public housing's monthly average household electricity consumption (kWh) (2013 - 2015)
 +
* Private apartment's monthly average household electricity consumption (kWh) (2013 - 2015)
  
*Local Indicators of Spatial Association (LISA)
+
=== Data Attributes ===
<!-- END PROJECT OBJECTIVE -->
 
  
<!-- START TECHNOLOGY-->
+
==== Public Housing ====
  
==<div style="margin-top: 10px;font-family: Helvetica; text-align: left;font-size:20px; border: 5px solid #00000000; border-radius:5px; text-align:center; background-color: #708090; color: white; padding: 2px"><span style="font-size:24px;">T</span>echnology</div>==
+
The dataset for each year is split into two Excel workbooks, each containing six sheets with one month's data per sheet, as shown below:
  
<!-- START SYSTEM ARCHITECTURE -->
+
[[File:1718t1is428T15-DataSourcePublic.png|700px|center]]
  
===<div style="font-family: Century Gothic; padding: 0px 30px 0px 18px;"><span style="font-size:20px;"> System Architecture</span></div>===
+
The following is a snapshot of Jan 2015's electricity consumption data, and a description of the data attributes collected for each month:
[[File:IS415-OnTheFly-SA.png|400px|center]]
 
<!-- END SYSTEM ARCHITECTURE -->
 
  
<!-- START R LIBRARY -->
+
[[File:1718t1is428T15-DataSourcePublicJan.png|400px|center]]
  
===<div style="font-family: Century Gothic; padding: 0px 30px 0px 18px;"><span style="font-size:20px;"> R Library</span></div>===
+
<center>
*shiny
+
{| class="wikitable"
**Web Application Framework for R
+
|-
*maptools
+
! Attribute
**Tools for Reading and Handling Spatial Objects
+
! Description
*rgdal
+
|-
**Bindings for the Geospatial Data Abstraction Library
+
| Postal Code
*leaflet
+
| Postal code of a public residential building
**Create Interactive Web Maps with the JavaScript 'Leaflet' Library
+
|-
*spatialEco
+
| 1-room / 2-room
**Functions for Kriging and Point Pattern Analysis
+
| Average electricity consumed by 1-room/2-room flats in the building
*plyr
+
|-
**Tools for Splitting, Applying and Combining Data
+
| 3-room
*spdep
+
| Average electricity consumed by 3-room flats in the building
**Spatial Dependence: Weighting Schemes, Statistics and Models
+
|-
*GISTools
+
| 4-room
**Some further GIS capabilities for R
+
| Average electricity consumed by 4-room flats in the building
*spatstat
+
|-
**Spatial Point Pattern Analysis, Model-Fitting, Simulation, Tests
+
| 5-room / Executive
*classInt
+
| Average electricity consumed by 5-room/executive flats in the building
**Choose Univariate Class Intervals
+
|}
*RColorBrewer
+
</center>
**ColorBrewer Palettes
 
*rsconnect
 
**Deployment Interface for R Markdown Documents and Shiny Applications
 
*openxlsx
 
**Read, Write and Edit XLSX Files
 
<!-- END R LIBRARY -->
 
  
<!-- END TECHNOLOGY-->
+
==== Private Housing ====
  
<!-- START APPLICATION FEATURES-->
+
The data for all years is combined in a single Excel workbook, with one sheet per year, as shown below:
  
==<div style="margin-top: 10px;font-family: Helvetica; text-align: left;font-size:20px; border: 5px solid #00000000; border-radius:5px; text-align:center; background-color: #708090; color: white; padding: 2px"><span style="font-size:24px;">A</span>pplication <span style="font-size:24px">F</span>eatures</div>==
+
[[File:1718t1is428T15-DataSourcePrivate.png|400px|center]]
  
<!-- START UPLOADING -->
+
Each year's data contains the following attributes:
  
===<div style="font-family: Century Gothic; padding: 0px 30px 0px 18px;"><span style="font-size:20px;">Uploading and Processing On The Fly</span></div>===
+
<center>
 +
{| class="wikitable"
 +
|-
 +
! Attribute
 +
! Description
 +
|-
 +
| Postal Code
 +
| Postal code of a private residential building
 +
|-
 +
| Jan
 +
| Average electricity consumed by all apartments in the building in Jan
 +
|-
 +
| Feb
 +
| Average electricity consumed by all apartments in the building in Feb
 +
|-
 +
| Mar
 +
| Average electricity consumed by all apartments in the building in Mar
 +
|-
 +
| Apr
 +
| Average electricity consumed by all apartments in the building in Apr
 +
|-
 +
| May
 +
| Average electricity consumed by all apartments in the building in May
 +
|-
 +
| Jun
 +
| Average electricity consumed by all apartments in the building in Jun
 +
|-
 +
| Jul
 +
| Average electricity consumed by all apartments in the building in Jul
 +
|-
 +
| Aug
 +
| Average electricity consumed by all apartments in the building in Aug
 +
|-
 +
| Sep
 +
| Average electricity consumed by all apartments in the building in Sep
 +
|-
 +
| Oct
 +
| Average electricity consumed by all apartments in the building in Oct
 +
|-
 +
| Nov
 +
| Average electricity consumed by all apartments in the building in Nov
 +
|-
 +
| Dec
 +
| Average electricity consumed by all apartments in the building in Dec
 +
|}
 +
</center>
  
<b>Enerlyst</b> allows EMA housing data to be uploaded and processed on the fly. Users can view the processed data on the Data tab. After uploading, users can select the type of data (residential, private or both). Data for the selected year and month is processed and displayed on the fly. This feature ensures application longevity, as future datasets can also be analysed. The data can be found on the EMA website.<ref>https://www.ema.gov.sg/Statistics.aspx</ref><br /><br />
+
<!-- END DATASET-->
  
<b>Cleaning up of raw data before uploading</b>
+
<!-- START RELATED WORKS-->
The data to be uploaded should be in the following format:
 
*Geocoded, consisting of X and Y coordinates with column name as "X" and "Y" respectively
 
*Row 4(Overall) of the EMA data has to be removed
 
*Should follow a naming convention of "YYYY_priv" for private housing data and "YYYY_pub" for public housing data
 
*Should follow a file extension of ''xlsx''
 
*Merging of two 6 months data into a one year data (only applicable for public housing data) <br /><br />
 
  
The steps to convert the data into a format recognised by Enerlyst are as follows:
+
==<div style="margin-top: 10px;font-family: Helvetica; text-align: left;font-size:20px; border: 5px solid #00000000; border-radius:5px; text-align:center; background-color: #132039; color: white; padding: 2px"><span style="font-size:24px;">R</span>elated <span style="font-size:24px">W</span>orks</div>==
  
<u>a) Preparing raw data from EMA for 2013 Private Housing</u>
+
Much of the relevant prior work on residential energy consumption levels in Singapore revolves around the motivations for and barriers to energy efficiency.
  
[[File:IS415-DataPrepPriv1.png|700px|center]]
+
In 2013, the Ministry of the Environment and Water Resources (MEWR) interviewed 2,500 residents on the extent of their energy efficiency practices at home, their level of awareness of energy efficiency, and the barriers to being energy efficient. It found that 41.3% of the respondents would be more encouraged to conserve electricity if the government were to provide monetary incentives or voucher rewards/rebates, and 36.5% are motivated by advertisements on various media platforms. The findings also concluded that residents generally perceived the high cost of energy-efficient appliances and the inconvenience of energy-saving practices as barriers to energy efficiency in households.<ref>https://www.mewr.gov.sg/docs/default-source/default-document-library/grab-our-research/mewr_ee_report.pdf</ref>
  
1. Copy out year data into a new excel file
+
Another study by Energy Efficient Singapore (E2 Singapore) indicated that when residents in other countries are allowed to compare their utility bills against those of their neighbours, they can potentially achieve 4 to 12% energy savings. This approach leverages the power of social norms to provide direct feedback to residents: residents are likely to bring their behavior closer to the norm when they are informed of what the norm is.<ref>http://www.e2singapore.gov.sg/DATA/0/docs/NewsFiles/Find%20out%20how%20much%20your%20neighbours%20spend%20on%20energy_v2.pdf</ref>
  
[[File:IS415-DataPrepPriv2.png|700px|center]]
+
A third study, by Xu and Ang from NUS, analyzes the root causes of high energy consumption using index decomposition analysis (IDA). The IDA model studies changes in energy consumption over time and is often used in major energy-consuming sectors such as the transport industry. To fit the model for use on the residential sector, Xu and Ang applied a hybrid IDA model that divides the residential sector into various subsectors, each with a different key factor driving energy consumption. For instance, energy consumption in a subsector may be driven by floor area (for air cooling and heating). The paper found that environment control and household appliances are the main factors for energy consumption by households, and each of these is greatly affected by population growth and decreases in residents per household.<ref>http://www.e2singapore.gov.sg/DATA/0/docs/1-s2.0-S0306261913006193-main.pdf</ref>
  
2. Save file as "2013_priv.xlsx"
+
By using our proposed work jointly with the first two papers, users can visually identify clusters with high energy usage where efficient energy consumption measures can be implemented. With the last paper, we can trace the root cause for high energy usage.
  
3. Delete Row 4 which contains the overall energy consumption
+
<!-- END RELATED WORKS -->
  
[[File:IS415-DataPrepPriv3.png|700px|center]]
+
<!-- START INSPIRATIONS-->
  
4. Add two columns in columns O and P, give them headers named "X" and "Y"
+
==<div style="margin-top: 10px;font-family: Helvetica; text-align: left;font-size:20px; border: 5px solid #00000000; border-radius:5px; text-align:center; background-color: #132039; color: white; padding: 2px"><span style="font-size:24px;">I</span>nspirations</div>==
 +
[[File:Otf choro.PNG |700px|center]]<br />
 +
The number of public and private address points in Singapore is large, at about twenty thousand records. While this pales in comparison to data sets with tens of millions of records, the real challenge lies in plotting these points over a geographical region as small as Singapore. The limited land space coupled with the large number of data points results in heavy overlap and clutter of address points, making it difficult to aggregate the data and visualize energy consumption effectively.
  
5. Geocode the postal codes and put the results into "X" and "Y"
+
Our team has already experimented with aggregating energy consumption levels onto a choropleth map segmented by planning areas. This approach is effective in providing an overview of energy consumption levels across planning areas in Singapore, and supports analysis of local indicators of spatial correlation in terms of energy usage clustering. However, it is ill-suited to investigating clustering at finer levels of spatial granularity: focusing on smaller areas is impossible because the data is aggregated at the level of planning areas.
 +
[[File:Hexbin inspiration.PNG|center]]<br />
 +
With this in mind, On The Fly is experimenting with hexagonal binning as an alternative technique for visualizing the energy usage density of public and private housing. By grouping address points into hexagons and computing the average energy consumption of the address points in each hexagon, we aim to visualize energy consumption aggregated across smaller areas, giving a more detailed view of energy usage across Singapore.
 +
 +
<!-- END INSPIRATIONS-->
  
6. Save the file
+
<!-- START PROPOSED STORYBOARD-->
  
<u>b) Preparing raw data from EMA for 2013 Public Housing</u>
+
==<div style="margin-top: 10px;font-family: Helvetica; text-align: left;font-size:20px; border: 5px solid #00000000; border-radius:5px; text-align:center; background-color: #132039; color: white; padding: 2px"><span style="font-size:24px;">P</span>roposed <span style="font-size:24px">S</span>toryboard</div>==
  
1. Open up first half of the public data
+
[[File:1718t1is428T15-Storyboard.png|700px|center]]
  
2. Open up a new excel file
+
=== Upload Data ===
  
[[File:IS415-DataPrepPub1.png|700px|center]]
+
An interface will be provided for the user to upload datasets of past and/or future years. This would provide more flexibility for users to analyze a wider range of energy usage data.
  
3. Copy out each month's data into the file
+
=== Hexagon Binning ===
  
[[File:IS415-DataPrepPub2.png|700px|center]]
+
[[File:1718t1is428T15-StoryboardHexBin.png|700px|center]]
 +
<center><small><i>Example of hexbin by [https://bl.ocks.org/mbostock/4330486 Mike Bostock]</i></small></center>
  
4. Repeat steps 1 to 3 for the second half of the public data. At the end, there should be 12 sheets in total, ordered by month from January to December. 
+
There are many levels at which we can analyze the intensity of energy consumption in Singapore, such as the national, regional, or subzone level. But these levels are too coarse and do not provide a comprehensive view. For instance, a large subzone would naturally have a higher energy consumption level since more residents live in it.
  
In each sheet of the new excel file:
+
Thus, to properly analyze the intensity of energy consumption in Singapore, we need to do so at a more granular level. We decided to break Singapore down into groups of postal codes. By aggregating a few postal codes together, we have a higher chance of uncovering new findings.
  
[[File:IS415-DataPrepPub3.png|700px|center]]
+
The best way to visualize this would be to plot hexagon bins (“hexbins”) onto the Singapore map, with each hexbin representing a group of postal codes, and using a gradient colour scheme to represent each group’s energy consumption intensity.
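The hexbin grouping described above can be sketched in a few lines. This is only an illustration under stated assumptions, not our actual implementation: the function names (<code>hex_bin</code>, <code>average_by_hexbin</code>), the pointy-top hexagon layout, and the sample coordinates are hypothetical, and the inputs are assumed to be planar x/y coordinates rather than raw latitude and longitude.

```python
import math
from collections import defaultdict


def _cube_round(q, r):
    """Round fractional axial hex coordinates to the nearest hexagon."""
    s = -q - r
    rq, rr, rs = round(q), round(r), round(s)
    dq, dr, ds = abs(rq - q), abs(rr - r), abs(rs - s)
    # Re-derive the coordinate with the largest rounding error so that
    # the cube constraint q + r + s == 0 still holds.
    if dq > dr and dq > ds:
        rq = -rr - rs
    elif dr > ds:
        rr = -rq - rs
    return rq, rr


def hex_bin(x, y, size):
    """Axial (q, r) index of the pointy-top hexagon containing (x, y).

    `size` is the hexagon's circumradius, in the same units as x and y.
    """
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    return _cube_round(q, r)


def average_by_hexbin(points, size):
    """Average the value of all (x, y, value) points falling in each hexbin."""
    sums = defaultdict(lambda: [0.0, 0])
    for x, y, value in points:
        key = hex_bin(x, y, size)
        sums[key][0] += value
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


# Hypothetical buildings: (x, y, average kWh). Not real EMA data.
buildings = [(0.0, 0.0, 100.0), (0.1, 0.1, 300.0), (10.0, 0.0, 500.0)]
bins = average_by_hexbin(buildings, size=1.0)
```

Each key of <code>bins</code> identifies one hexagon, and its value is the quantity that a gradient colour scale would encode.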
  
5. Delete Row 4 from each sheet
+
=== Line Chart ===
  
[[File:IS415-DataPrepPub4.png|700px|center]]
+
[[File:1718t1is428T15-StoryboardLineChart.png|700px|center]]
 +
<center><small><i>Example of multi-series line chart by [https://bl.ocks.org/mbostock/3884955 Mike Bostock]</i></small></center>
  
6. Add two columns in columns O and P, give them headers named "X" and "Y"
+
We will add a multi-series line chart to allow users to compare monthly energy consumption levels for: 1) the whole of Singapore, 2) a group of postal codes, and 3) a single postal code.
  
7. Geocode the postal codes and put the results into "X" and "Y"
+
By default, the line chart would show only the average monthly consumption of the entire nation. Clicking on a single hexbin would add a second series showing the average monthly consumption of the group of postal codes within that hexbin. The click would also open a second map with a zoomed-in view of the hexbin, displaying the individual postal codes within it. Clicking on any point on this pop-out map would add a third series, representing a single postal code, to the line chart.
  
8. Save the file as "2013_pub.xlsx"
+
<!-- END  PROPOSED STORYBOARD-->
  
<!-- END UPLOADING -->
 
  
<b>Uploading Files to Enerlyst</b>
+
<!-- START DATA PREPARATION-->
  
Once the data files for 2013 to 2015 are ready, we upload them into <b>Enerlyst</b>. The application reads the file’s name, and recognises the year and property type it represents.
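As a rough illustration of how a file name following the "YYYY_priv" / "YYYY_pub" convention could be interpreted, the sketch below parses the year and property type from the name. It is written in Python for brevity and is not Enerlyst's actual code; the function name <code>parse_upload_name</code> and the error message are hypothetical.

```python
import re

# Matches names such as "2013_priv.xlsx" or "2015_pub.xlsx".
FILENAME_PATTERN = re.compile(r"^(\d{4})_(priv|pub)\.xlsx$")


def parse_upload_name(filename):
    """Return (year, property_type) for a file named per the convention."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"Unrecognised file name: {filename!r}")
    year, kind = match.groups()
    return int(year), {"priv": "private", "pub": "public"}[kind]
```

For example, <code>parse_upload_name("2013_pub.xlsx")</code> yields the year 2013 and the type "public", while a name that does not follow the convention is rejected.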
+
==<div style="margin-top: 10px;font-family: Helvetica; text-align: left;font-size:20px; border: 5px solid #00000000; border-radius:5px; text-align:center; background-color: #132039; color: white; padding: 2px"><span style="font-size:24px;">D</span>ata Preparation</div>==
  
For private housing data, the application converts the sheet into a data frame. For public housing data, the application loops through the 12 sheets (months) of data, aggregating each month’s energy consumption by postal code. In other words, the application finds the total energy consumed by a residential building by totalling the consumption of 1-or-2-room, 3-room, 4-room and 5-room/executive apartments. The aggregate is transposed into a data frame, and the columns are renamed to show the month. The data frames for private and public housing are similar, and contain the following columns:
+
=== File Upload Format for Application ===
  
<div style="text-align:center">
+
We transformed the raw public and private housing datasets into two Excel workbooks, which our app is able to read, with the following columns:
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
 
  
Postal Code ║ Jan ║ Feb ║ Mar ║ Apr ║ May ║ Jun ║ Jul ║ Aug ║ Sep ║ Oct ║ Nov ║ Dec ║  X  ║  Y 
+
<center>
 +
<i><b>Public Housing</b></i>
  
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
+
[[File:1718t1is428T15-PublicCompiled.png|400px|center]]
</div>
+
<small><i>Snapshot of public housing dataset</i></small>
  
After the data frame has been constructed, the application cleans up ‘na’ and ’s’ values, which represent negligible levels of energy consumption and suppressed individual data. These values are replaced with zeroes and treated as housing with no energy consumption. The application then uses the data frame’s X and Y coordinates to convert it into a spatial points data frame, and changes its coordinate reference system to WGS84.
+
{| class="wikitable"
 +
|-
 +
! Attribute
 +
! Description
 +
|-
 +
| Postal Code
 +
| Postal code of a public residential building
 +
|-
 +
| oneroom
 +
| Average electricity consumed by 1-room/2-room flats in the building
 +
|-
 +
| threeroom
 +
| Average electricity consumed by 3-room flats in the building
 +
|-
 +
| fourroom
 +
| Average electricity consumed by 4-room flats in the building
 +
|-
 +
| fiveroom
 +
| Average electricity consumed by 5-room/executive flats in the building
 +
|-
 +
| average
 +
| Average electricity consumed by all flats in the building
 +
|-
 +
| year
 +
| Year in which the amount of electricity was consumed and measured
 +
|-
 +
| month
 +
| Month in which the amount of electricity was consumed and measured
 +
|-
 +
| lat
 +
| Latitude of the building
 +
|-
 +
| long
 +
| Longitude of the building
 +
|-
 +
| address
 +
| Address of the building
 +
|}
 +
</center>
  
To allow users to analyse energy consumption clusters by housing types, <b>Enerlyst</b> then identifies which subzones these residential buildings belong to, and computes 1) private housing’s average energy consumption by subzone, 2) public housing’s average energy consumption by subzone, and 3) combined average energy consumption by subzone. To perform the computation, the following details need to be derived from the data frames:
+
<center>
*Total energy consumption of private housing by subzone
+
<i><b>Private Housing</b></i>
*Total energy consumption of public housing by subzone
 
*Total energy consumption of all housing by subzone
 
*Count of private residential buildings per subzone
 
*Count of public residential buildings per subzone
 
*Count of all residential buildings per subzone
 
  
<!-- END ON THE FLY -->
+
[[File:1718t1is428T15-PrivateCompiled.png|400px|center]]
 +
<small><i>Snapshot of private housing dataset</i></small>
  
<!-- START CHOROPLETH -->
+
{| class="wikitable"
 +
|-
 +
! Attribute
 +
! Description
 +
|-
 +
| Postal Code
 +
| Postal code of a private residential building
 +
|-
 +
| average
 +
| Average electricity consumed by all apartments in the building
 +
|-
 +
| year
 +
| Year in which the amount of electricity was consumed and measured
 +
|-
 +
| month
 +
| Month in which the amount of electricity was consumed and measured
 +
|-
 +
| lat
 +
| Latitude of the building
 +
|-
 +
| long
 +
| Longitude of the building
 +
|-
 +
| address
 +
| Address of the building
 +
|}
 +
</center>
  
===<div style="font-family: Century Gothic; padding: 0px 30px 0px 18px;"><span style="font-size:20px;">Choropleth Map</span></div>===
+
=== Measuring Average of Each Public Postal Code ===
[[File:IS415-OnTheFly-ChoroSS.png|700px|center]]
 
  
Using monthly raw data on residential energy consumption from EMA, Enerlyst aggregates the energy consumption by subzone and then finds the average consumption per apartment block in each subzone. 
+
In our proposed storyboard, the color intensity of each hexbin represents the amount of energy consumed by the postal codes within the hexbin. As we moved further into the project, however, we faced a limitation from EMA's datasets.
  
Enerlyst provides an overview of each subzone's average energy consumption using three different classification techniques:
+
For public housing, the data provided by EMA only tells us the average electricity consumption of all apartments that fall under the same dwelling type. For instance, the dataset for July 2015 would tell us that 3-room flats in postal code 824601 used an average of 339 kWh of electricity. The crucial information that we were unable to obtain is how many 3-room flats there are in postal code 824601, or the total electricity consumed by all the 3-room flats. This makes it impossible for us to accurately compute each postal code's weighted-average electricity consumption.
  
*Natural break Jenks
+
The only measure we can use to determine the public housing hexbin color intensity is thus the average of averages. That is, for each postal code, we take the mean of the average energy consumption of 1-room/2-room, 3-room, 4-room and 5-room/executive apartments and treat that as the postal code average. However, we are aware that this can be an inaccurate representation of the actual postal code average, unless the postal code has an equal number of flats of each dwelling type<ref>https://math.stackexchange.com/questions/95909/why-is-an-average-of-an-average-usually-incorrect</ref>.
*Equal Interval
 
*Quantile
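The average-of-averages limitation described for public housing can be illustrated with a small worked example. The flat counts below are hypothetical (EMA does not publish them, which is exactly the limitation), and the function names are ours for illustration only.

```python
def average_of_averages(type_averages):
    """Unweighted mean of the per-dwelling-type averages.

    This is the only measure that can be computed from the EMA data.
    """
    return sum(type_averages.values()) / len(type_averages)


def weighted_average(type_averages, flat_counts):
    """True building average, which needs the number of flats per type."""
    total_kwh = sum(type_averages[t] * flat_counts[t] for t in type_averages)
    total_flats = sum(flat_counts[t] for t in type_averages)
    return total_kwh / total_flats


# Hypothetical building: average kWh per dwelling type, and an assumed
# (unpublished) number of flats of each type.
averages = {"3-room": 300.0, "4-room": 400.0, "5-room": 500.0}
counts = {"3-room": 80, "4-room": 15, "5-room": 5}

simple = average_of_averages(averages)          # 400.0
weighted = weighted_average(averages, counts)   # 325.0

# With an equal number of flats of each type, the two measures agree.
equal_counts = {"3-room": 10, "4-room": 10, "5-room": 10}
balanced = weighted_average(averages, equal_counts)  # 400.0
```

In this made-up building, the average of averages overstates consumption (400 kWh against a true 325 kWh) because most flats are of the lowest-consuming type.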
 
  
Users are able to select different classifications, colors and number of classes using the selecting panel on the left. Changes will be updated dynamically once the user has finalised the selection.
+
=== Suppressed Data ===
<!-- END CHOROPLETH -->
 
  
 +
Datasets for both public and private housing contain many 's' values, which represent readings that are suppressed to avoid disclosure of individual data. Such values, whether left in the datasets or removed, will affect the accuracy of our analysis. If left in the datasets, our computation of a postal code's average would have to treat the 's' values as 0 (since there is no reliable way to estimate them), bringing down the overall average. On the other hand, if postal codes containing 's' values are removed from our datasets, months' or years' worth of data would be missing. In the end, we decided to remove such postal codes, as it is better to treat them as missing values than as 0.
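The trade-off between the two options can be sketched as follows. This is a minimal illustration, not our actual cleaning script: the function names and the sample rows are hypothetical, and the field names simply reuse the column names of our compiled public housing workbook.

```python
def drop_suppressed(records, value_fields, marker="s"):
    """Keep only records with no suppressed ('s') reading in value_fields."""
    kept = []
    for record in records:
        values = [str(record[f]).strip().lower() for f in value_fields]
        if marker not in values:
            kept.append(record)
    return kept


def mean_with_zero_fill(record, value_fields, marker="s"):
    """The rejected alternative: treat 's' as 0, which drags the mean down."""
    values = [
        0.0 if str(record[f]).strip().lower() == marker else float(record[f])
        for f in value_fields
    ]
    return sum(values) / len(values)


# Hypothetical rows; postal codes and readings are made up.
fields = ["threeroom", "fourroom"]
rows = [
    {"postal": "000001", "threeroom": 300, "fourroom": "s"},
    {"postal": "000002", "threeroom": 310, "fourroom": 420},
]

clean_rows = drop_suppressed(rows, fields)           # only "000002" remains
zero_filled = mean_with_zero_fill(rows[0], fields)   # 150.0, clearly too low
```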
  
<!-- START LOCAL MORAN I -->
+
[[File:1718t1is428T15-Suppressed1.png|400px|center]]
 +
[[File:1718t1is428T15-Suppressed2.png|400px|center]]
 +
<center><small><i>Snapshot of suppressed values in dataset</i></small></center>
  
===<div style="font-family: Century Gothic; padding: 0px 30px 0px 18px;"><span style="font-size:20px;">Local Moran's I</span></div>===
+
=== Geocoding Postal Codes ===
[[File:IS415-OnTheFly-LocalMISS.png|700px|center]]
 
Enerlyst provides local autocorrelation analysis, in which hot and cold clusters are identified in terms of residential energy consumption. The Local Moran's I statistic of spatial association for each subzone is given as:
 
  
[[File:LocalMoranIFormula.png.png|400px|center]]
+
To be able to plot the postal codes on a map, we first need to convert the postal codes to longitude and latitude. We did this by creating a geocoder app ([https://github.com/tankunsheng/SgPostalToLatLng LeGeocoder]) that calls OneMap's <code>search</code> API. The geocoder reads an Excel workbook containing the postal codes and returns an updated Excel workbook containing the longitude and latitude for each postal code. 
  
Where (''x<sub>i</sub>'' - ''X-bar'') is the deviation of a subzone's energy consumption with respect to the mean of its neighbours, and ''w<sub>ij</sub>'' is the spatial weight between two subzones, and
+
To make the search more efficient, we first created a separate spreadsheet compiling all the unique public and private postal codes across 2013-2015. Then, with the results from the geocoder, we used <code>vlookup</code> in Excel to obtain the latitude and longitude for the file to be uploaded to our app.
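The lookup step amounts to a join between the consumption records and the table of geocoded unique postal codes. We did this in Excel, so the Python sketch below is only an equivalent illustration; the function name, the <code>postal_code</code> key, and the sample coordinates are hypothetical.

```python
def attach_coordinates(records, lookup):
    """Add lat/long to each record from a postal-code lookup table.

    `lookup` maps postal code -> (lat, long), geocoded once per unique code.
    Returns the matched records and the postal codes with no geocoding result.
    """
    matched, unmatched = [], []
    for record in records:
        coords = lookup.get(record["postal_code"])
        if coords is None:
            unmatched.append(record["postal_code"])
            continue
        lat, long = coords
        matched.append({**record, "lat": lat, "long": long})
    return matched, unmatched


# Hypothetical data: placeholder postal codes and coordinates.
lookup = {"000001": (1.30, 103.80)}
records = [
    {"postal_code": "000001", "month": 1, "average": 350.0},
    {"postal_code": "000001", "month": 2, "average": 340.0},
    {"postal_code": "000002", "month": 1, "average": 410.0},
]
matched, unmatched = attach_coordinates(records, lookup)
```

Geocoding each unique postal code once and joining afterwards avoids repeating the same API call for every month and year in which a building appears. Collecting the unmatched codes separately makes it easy to review postal codes for which no usable result was returned.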
  
[[File:LocalMoranIFormula2.png.png|400px|center]]
+
We found that for certain postal codes, OneMap's API would return us the values for bus stops, rather than residential buildings.  
  
with ''n'' being the number of subzones in Singapore. Each subzone's neighbour is defined as neighbouring subzones with which it shares a border.
+
[[File:1718t1is428T15-OneMapBusStops.png|400px|center]]
 +
<center><small><i>Snapshot of OneMap API returning bus stops instead of residential buildings</i></small></center>
  
There is also a scatterplot between X and the "spatial lag" of X, formed by averaging all values of X for the neighboring polygons, where X is a subzone's average apartment block energy consumption. The plot identifies which type of spatial autocorrelation exists.
+
And for some postal codes, although a residential building can be found through a Google search, the API was unable to return any results.
[[File:IS415-OnTheFly-Scatterplot.png|700px|center]]
 
<!-- END  LOCAL MORAN I -->
 
  
<!-- START LISA -->
+
[[File:1718t1is428T15-OneMapCantFind.png|400px|center]]
 +
<center><small><i>Snapshot of OneMap API being unable to find valid postal codes</i></small></center>
  
===<div font-family: Century Gothic; padding: 0px 30px 0px 18px;"><span style="font-size:20px;">LISA</span></div>===
+
<!-- END DATA PREPARATION-->
[[File:IS415-OnTheFly-LisaSS.png|700px|center]]
 
Extending from Local Moran's I, Enerlyst uses LISA to show each subzone's statistically significant relationship with its neighbors, and show the type of relationship. The quadrants in the plot can be interpreted in the following manner:
 
  
*Top-left quadrant = low-high cluster
 
*Top-right quadrant = high-high cluster
 
*Bottom-left quadrant = low-low cluster
 
*Bottom-right quadrant = high-low cluster
 
<!-- END  LISA -->
 
  
<!-- END APPLICATION FEATURES-->
+
<!-- START ARCHITECTURE DIAGRAM-->
  
<!-- START CASE STUDY ANALYSIS-->
+
==<div style="margin-top: 10px;font-family: Helvetica; text-align: left;font-size:20px; border: 5px solid #00000000; border-radius:5px; text-align:center; background-color: #132039; color: white; padding: 2px"><span style="font-size:24px;">A</span>rchitecture <span style="font-size:24px">D</span>iagram</div>==
  
==<div style="margin-top: 10px;font-family: Helvetica; text-align: left;font-size:20px; border: 5px solid #00000000; border-radius:5px; text-align:center; background-color: #708090; color: white; padding: 2px"><span style="font-size:24px;">C</span>ase <span style="font-size:24px">S</span>tudy <span style="font-size:24px">A</span>nalysis</div>==
+
[[File:1718t1is428T15-MainArchitecture.png|400px|center]]
EMA publishes energy statistics on an annual basis to provide readers with a comprehensive understanding of the Singapore energy landscape through detailed coverage of various energy-related topics. As project Enerlyst focuses on analysing households' energy consumption, only private and public household data will be used. This study will be based on the EMA dataset from 2013 to 2015. 2013 data will be prepared manually, whereas 2014 and 2015 data will be uploaded to the application and processed on the fly.
+
<center><small><i>Architecture diagram of Visual Enerlyst application</i></small></center>
 
<!-- END CASE STUDY ANALYSIS-->
 
  
===<div font-family: Century Gothic; padding: 0px 30px 0px 18px;"><span style="font-size:18px;">Choropleth Map</span></div>===
 
<div align = center><strong>Private Housing</strong></div><br />
 
[[File:IS415-Group2-OnTheFly-choroSSPrivate.png|700px|center]]<br />
 
Higher energy consumption can be found in the central region.
 
[[File:IS415-Group2-OnTheFly-choroSSPrivate2.png|700px|center]]<br />
 
Sungei Road sub zone has the highest average energy consumption of approximately 2163 kWh.
 
<div align = center><strong>Public Housing</strong></div><br />
 
[[File:IS415-Group2-OnTheFly-choroSSPublic.png|700px|center]]<br />
 
North-east region has a cluster of sub zones which has a higher energy consumption.
 
[[File:IS415-Group2-OnTheFly-choroSSPublic2.png|700px|center]]<br />
 
Lower Seletar subzone has the highest average energy consumption of approximately 1024 kWh.
 
  
Choropleth maps may seem to be a decent indicator of spatial clustering at a glance. When spatial polygons are the same color as their neighboring polygons, it may appear to signify a clustering of features around the attribute of interest. This, however, is misleading, as the choice of classification method and number of classes can result in very different-looking choropleth maps. The map creator gets to paint the picture by controlling these variables, and thus the objectivity of the analysis is questionable at best.
+
[[File:1718t1is428T15-GeocoderArchitecture.png|400px|center]]
<div align = center><strong>Jenks Natural Breaks</strong></div><br />
+
<center><small><i>Architecture diagram of LeGeocoder application</i></small></center>
[[File:IS415-Group2-OnTheFly-choroSS1.png|700px|center]]<br />
 
<div align = center><strong>Equal Interval</strong></div><br />
 
[[File:IS415-Group2-OnTheFly-choroSS2.png|700px|center]]<br />
 
  
For instance, in the choropleth map of energy consumption for the 4 months (March, June, September and December) of 2013, a classification using Jenks Natural Breaks places the Paterson and Dunearn subzones in the central region in the highest energy consumption class. Under an Equal Interval classification, however, Paterson and Dunearn no longer fall in the highest class. Hence, a choropleth map can be misleading despite the attractiveness of the data representation. An analysis such as spatial autocorrelation can be used to provide concrete evidence of spatial clustering.
+
<!-- END ARCHITECTURE DIAGRAM-->
  
===<div font-family: Century Gothic; padding: 0px 30px 0px 18px;"><span style="font-size:18px;">Local Moran's I</span></div>===
+
<!-- START TECHNICAL CHALLENGES-->
<div align = center><strong>Private Housing</strong></div><br />
 
[[File:IS415-Group2-OnTheFly-LMIPrivate.png|700px|center]]<br />
 
There is a clustering of subzones in the west, central and east region which share the similarity of almost equivalent energy consumption.
 
<div align = center><strong>Public Housing</strong></div><br />
 
[[File:IS415-Group2-OnTheFly-LMIPublic.png|700px|center]]<br />
 
The clusters of subzones with almost equivalent energy consumption are in the west, north-east and east regions.
 
[[File:IS415-Group2-OnTheFly-LMISP.png|700px|center]]<br />
 
Together with the Local Moran's I, a Moran scatterplot is available to complement the Local Moran's I. It provides an easy way to categorize the nature of spatial autocorrelation into the four classifications which are mainly high-high, high-low, low-low, and low-high. The scatterplot compares the value of the selected variable (x- axis) with its own spatial lagged value (y-axis). This lagged value is derived from the average of the value of the same variable from its neighbors.
 
  
===<div font-family: Century Gothic; padding: 0px 30px 0px 18px;"><span style="font-size:18px;">LISA</span></div> ===
+
==<div style="margin-top: 10px;font-family: Helvetica; text-align: left;font-size:20px; border: 5px solid #00000000; border-radius:5px; text-align:center; background-color: #132039; color: white; padding: 2px"><span style="font-size:24px;">T</span>echnical <span style="font-size:24px">C</span>hallenges</div>==
<div align = center><strong>Private Housing</strong></div><br />
 
[[File:IS415-Group2-OnTheFly-LISAPrivate.png|700px|center]]<br />
 
The LISA for the private housing dataset in December 2015 shows that in the west region, the Saujana, Jelabu, Dairy Farm and Bangkit subzones have significantly higher energy consumption than the mean for private housing, and that their neighbouring subzones are highly similar. In the east region, Bayshore subzone is identified as having higher energy consumption, and neighbouring subzones such as Siglap share similar traits.
 
<div align = center><strong>Public Housing</strong></div><br />
 
[[File:IS415-Group2-OnTheFly-LISAPublic.png|700px|center]]<br />
 
LISA has confirmed that the Local Moran's I is accurate, as the Keat Hong and Hougang East subzones share higher electricity consumption with their neighbouring subzones.
 
  
 
+
{| class="wikitable" style="background-color:#FFFFFF;" width="100%"
With such information, energy-saving solutions can be implemented in the identified subzones to further reduce energy consumption.
 
 
 
<!-- START TIMELINE -->
 
 
 
==<div style="margin-top: 10px;font-family: Helvetica; text-align: left;font-size:20px; border: 5px solid #00000000; border-radius:5px; text-align:center; background-color: #708090; color: white; padding: 2px"><span style="font-size:24px;">T</span>imeline</div>==
 
<div align="center">
 
{| class="wikitable"
 
 
|-
 
|-
! Week No(s). !! Task !! Status
+
! style="font-weight: bold;background: #181c48;color:#fbfcfd;width: 50%;" | Key Technical Challenges
 +
! style="font-weight: bold;background: #181c48;color:#fbfcfd;" | How We Propose To Resolve
 
|-
 
|-
| 3 || Form team || Completed
+
| <center> Unfamiliarity with d3.js libraries </center> ||  
 +
* Independent Learning
 +
* Consult Instructor Prakash
 +
* Peer Learning
 
|-
 
|-
| 4-5 || Discuss and choose a project topic || Completed
+
| <center> Data Cleaning & Transformation </center> ||  
 +
* Work together to clean, transform and analyze the data
 
|-
 
|-
| 6-9 || Research chosen project topic and data collection || Completed
+
| <center> Unfamiliarity in Implementing Interactivity and Animation Tools/Techniques in Visualization App </center> ||  
|-
+
* Develop a Storyboard/Design Flow
| 10-11 || Create project repository and web application planning ||Completed
+
* Assign members to specialize on Interactivity/Animation Techniques
|-
 
| 12 || Create project wiki || Completed
 
|-
 
| 11-15 || Develop application || Completed
 
|-
 
| 14-15 || Research report || Completed
 
|-
 
| 16 || Finalize application || Completed
 
|-
 
| 16 || Finalize Poster and Research Paper || Completed
 
|-
 
| 16 || Prepare for Townhall Presentation || Completed
 
|-
 
| 16 || Townhall Poster Presentation || Completed
 
|-
 
| 16 || Final Project Submission ||  Completed
 
 
|}
 
|}
</div>
 
<!-- END TIMELINE -->
 
  
<!-- START FUTURE WORK -->
+
<!-- END TECHNICAL CHALLENGES-->
 +
 
 +
<!-- START TIMELINE -->
 +
 
 +
==<div style="margin-top: 10px;font-family: Helvetica; text-align: left;font-size:20px; border: 5px solid #00000000; border-radius:5px; text-align:center; background-color: #132039; color: white; padding: 2px"><span style="font-size:24px;">T</span>imeline</div>==
  
==<div style="margin-top: 10px;font-family: Helvetica; text-align: left;font-size:20px; border: 5px solid #00000000; border-radius:5px; text-align:center; background-color: #708090; color: white; padding: 2px"><span style="font-size:24px;">F</span>uture <span style="font-size:24px">W</span>ork</div>==
+
[[File:1718t1is428T15-Timeline.png|700px|center]]
  
*Allowing analyst to upload industrial energy usage data
+
<!-- END  TIMELINE-->
*Performing cluster analysis using point data
+
 
*Including Geary C analysis on top of Local Moran's I
+
<!-- START TECHNICAL TECHNOLOGIES-->
<!-- END FUTURE WORK -->
+
 
 +
==<div style="margin-top: 10px;font-family: Helvetica; text-align: left;font-size:20px; border: 5px solid #00000000; border-radius:5px; text-align:center; background-color: #132039; color: white; padding: 2px"><span style="font-size:24px;">T</span>echnologies/<span style="font-size:24px">T</span>ools</div>==
 +
 
 +
The following are technologies and tools which we used:
 +
* Microsoft Excel (data cleaning)
 +
* d3.js (visualisation)
 +
* Leaflet.js (overlaying of map)
 +
* d3-hexbin.js (overlaying of hexagonal bins, wrapper library of d3 and leaflet)
 +
* Github (version control)
 +
 
 +
<!-- END TECHNOLOGIES-->
  
 
<!-- START REFERENCE -->
 
<!-- START REFERENCE -->
  
==<div style="margin-top: 10px;font-family: Helvetica; text-align: left;font-size:20px; border: 5px solid #00000000; border-radius:5px; text-align:center; background-color: #708090; color: white; padding: 2px"><span style="font-size:24px;">R</span>eference</div>==
+
==<div style="margin-top: 10px;font-family: Helvetica; text-align: left;font-size:20px; border: 5px solid #00000000; border-radius:5px; text-align:center; background-color: #132039; color: white; padding: 2px"><span style="font-size:24px;">R</span>eference</div>==
 
<references />
 
<references />
 
<!-- END REFERENCE -->
 
<!-- END REFERENCE -->
 +
 +
<!-- START COMMENTS -->
 +
 +
==<div style="margin-top: 10px;font-family: Helvetica; text-align: left;font-size:20px; border: 5px solid #00000000; border-radius:5px; text-align:center; background-color: #132039; color: white; padding: 2px"><span style="font-size:24px;">C</span>omments</div>==
 +
<references />
 +
<!-- END COMMENTS -->

Latest revision as of 11:29, 26 November 2017

OnTheFlyLogo.png




Project Motivation & Objective

Experts have warned that power demand is set to double by 2030 globally despite authoritative control. High power consumption can already be observed locally. According to the Energy Market Authority (EMA), Singapore has faced increasing power consumption from 1965 to 2013 [1].

1718t1is428T15-Motivation.png

As Singapore is land-scarce and lacks significant renewable energy options such as hydropower or wave power, as well as sufficient land for mass solar energy production, energy has been a top concern in the urban nation[2]. It is thus important to promote energy-saving concepts to the public and to deploy energy-saving solutions island-wide. However, the usual analysis tools do not provide the perspective needed to facilitate such deployment. Information about the energy consumption levels of residents in Singapore is often not conveyed adequately in data visualisations. While EMA and Singstat provide annual data and reports on energy usage in Singapore, a powerful visualisation technique is needed to gain insights effectively. Our team aims to create a visualisation that leverages energy datasets provided by EMA to perform spatial analysis and identify energy usage clusters with hexagonal binning.


Datasets

Data Source

Our analysis will be based on EMA's collection of data on Singapore's residential electricity consumption[3]:

  • Public housing's monthly average household electricity consumption (kWh) (2013 - 2015)
  • Private apartments' monthly average household electricity consumption (kWh) (2013 - 2015)

Data Attributes

Public Housing

The dataset for each year is split into two excel workbooks, each containing six sheets representing each month's data as shown below:

1718t1is428T15-DataSourcePublic.png

The following is a snapshot of Jan 2015's electricity consumption data, and a description of the data attributes collected for each month:

1718t1is428T15-DataSourcePublicJan.png
{| class="wikitable"
! Attribute !! Description
|-
| Postal Code || Postal code of a public residential building
|-
| 1-room / 2-room || Average electricity consumed by 1-room/2-room flats in the building
|-
| 3-room || Average electricity consumed by 3-room flats in the building
|-
| 4-room || Average electricity consumed by 4-room flats in the building
|-
| 5-room / Executive || Average electricity consumed by 5-room/executive flats in the building
|}

Private Housing

The datasets for multiple years are combined in one Excel workbook, with each sheet representing one year's data, as shown below:

1718t1is428T15-DataSourcePrivate.png

Each year's data contains the following attributes:

{| class="wikitable"
! Attribute !! Description
|-
| Postal Code || Postal code of a private residential building
|-
| Jan || Average electricity consumed by all apartments in the building in Jan
|-
| Feb || Average electricity consumed by all apartments in the building in Feb
|-
| Mar || Average electricity consumed by all apartments in the building in Mar
|-
| Apr || Average electricity consumed by all apartments in the building in Apr
|-
| May || Average electricity consumed by all apartments in the building in May
|-
| Jun || Average electricity consumed by all apartments in the building in Jun
|-
| Jul || Average electricity consumed by all apartments in the building in Jul
|-
| Aug || Average electricity consumed by all apartments in the building in Aug
|-
| Sep || Average electricity consumed by all apartments in the building in Sep
|-
| Oct || Average electricity consumed by all apartments in the building in Oct
|-
| Nov || Average electricity consumed by all apartments in the building in Nov
|-
| Dec || Average electricity consumed by all apartments in the building in Dec
|}


Related Works

Much of the relevant prior work on residential energy consumption levels in Singapore revolves around the motivations for and barriers to energy efficiency.

In 2013, the Ministry of the Environment and Water Resources (MEWR) interviewed 2,500 residents on their extent of energy efficiency practice at home, level of awareness of energy efficiency, and barriers towards being energy efficient. It found that 41.3% of the respondents are more encouraged to conserve electricity if the government were to provide monetary incentives or voucher rewards/rebates, and 36.5% are motivated by advertisements on various media platforms. The findings also concluded that residents generally perceived the high cost of energy-efficient appliances and inconvenience of energy-saving practices as barriers to energy efficiency in households.[4]

Another study by Energy Efficient Singapore (E2 Singapore) indicated that when residents in other countries are allowed to compare their utility bills against those of their neighbours, they can potentially achieve 4% to 12% energy savings. This is because the comparison leverages the power of social norms to provide direct feedback to the residents: residents are likely to bring their behavior closer to the norm when they are informed of what the norm is.[5]

A third by Xu and Ang from NUS analyzes the root cause of high energy consumption using the index decomposition analysis (IDA). The IDA model studies changes in energy consumption over time and is often used in major energy consuming sectors such as the transport industry. To fit the model for use on the residential sector, Xu and Ang applied a hybrid IDA model that divides the residential sector into various subsectors, each with a different key factor driving energy consumption. For instance, energy consumption in a subsector may be driven by floor area (for air cooling and heating). The paper found that environment control and household appliances are the main factors for energy consumption by households, and each of these is greatly affected by population growth and decreases in residents per household.[6]

By using our proposed work jointly with the first two papers, users can visually identify clusters with high energy usage where efficient energy consumption measures can be implemented. With the last paper, we can trace the root cause for high energy usage.


Inspirations

Otf choro.PNG


The number of public and private address points in Singapore is large, at about twenty thousand records. While this may pale in comparison to data sets that run to tens of millions of records, the real challenge lies in plotting these points over a geographical region as small as Singapore. The limited land space coupled with the large number of data points would result in heavy overlapping and cluttering of address points, making data aggregation and visualization of energy consumption extremely difficult and ineffective.

Our team has already experimented with aggregating energy consumption levels onto a choropleth map segmented by planning areas. This approach is effective in providing an overview of energy consumption levels across planning areas in Singapore, and it supports analysis of local indicators of spatial correlation in terms of energy usage clustering. However, it is ill-suited to investigating clustering at finer levels of spatial granularity: focusing on smaller areas is impossible because the data is aggregated at the level of planning areas.

Hexbin inspiration.PNG


With this in mind, On The Fly is experimenting with an alternative technique of hexagonal binning for visualizing energy usage density of public and private housings. By aggregating the number of address points into hexagons and computing the average energy consumption of address points in these hexagons, we aim to visualize energy consumption levels of address points aggregated across smaller areas in hex bins to generate a more detailed view of energy usages across geographical land space.


Proposed Storyboard

1718t1is428T15-Storyboard.png

Upload Data

An interface will be provided for the user to upload datasets of past and/or future years. This would provide more flexibility for users to analyze a wider range of energy usage data.

Hexagon Binning

1718t1is428T15-StoryboardHexBin.png
Example of hexbin by Mike Bostock

There are many levels at which we can analyze the intensity of energy consumption in Singapore, such as the national, regional, or subzone level. But these levels are too coarse and do not provide a comprehensive view. For instance, a large subzone would tend to have a higher total energy consumption simply because more residents live in it.

Thus, to properly analyze the intensity of energy consumption in Singapore, we need to do so at a more granular level. We decided to break Singapore down into groups of postal codes. By aggregating a few postal codes together, we have a higher chance of uncovering new findings.

The best way to visualize this would be to plot hexagon bins (“hexbins”) onto the Singapore map, with each hexbin representing a group of postal codes, and using a gradient colour scheme to represent each group’s energy consumption intensity.
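The grouping step described above can be sketched in Python. This is only an illustration of the idea, not the d3-hexbin code used in the app: it assumes the points have already been projected to planar x/y coordinates, uses pointy-top hexagons, and the function names and the `hex_size` parameter are hypothetical.

```python
import math
from collections import defaultdict

def point_to_hex(x, y, hex_size):
    """Return the axial (q, r) index of the pointy-top hexagon containing (x, y)."""
    # Fractional axial coordinates of the point.
    q = (math.sqrt(3) / 3 * x - y / 3) / hex_size
    r = (2 / 3 * y) / hex_size
    # Cube rounding: round all three cube coordinates, then recompute the one
    # with the largest rounding error so that q + r + s stays 0.
    s = -q - r
    rq, rr, rs = round(q), round(r), round(s)
    dq, dr, ds = abs(rq - q), abs(rr - r), abs(rs - s)
    if dq > dr and dq > ds:
        rq = -rr - rs
    elif dr > ds:
        rr = -rq - rs
    return rq, rr

def hexbin_average(points, hex_size):
    """points: iterable of (x, y, value). Returns {(q, r): mean value in that hexagon}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, y, value in points:
        key = point_to_hex(x, y, hex_size)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

Each hexagon's mean can then be mapped to a colour on the gradient scale. A smaller `hex_size` gives a finer view at the cost of fewer postal codes per bin.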

Line Chart

1718t1is428T15-StoryboardLineChart.png
Example of multi-series line chart by Mike Bostock

We will add in a multi-series line chart to allow users to compare the monthly energy consumption levels by: 1) entire Singapore, 2) a group of postal codes, and 3) each postal code.

The default line chart would show only the average monthly consumption of the entire nation. Upon clicking on a single hexbin, the line chart would populate another series to show the average monthly consumption by the group of postal codes within that hexbin. The clicking would also trigger the pop-out of a second map which features a zoomed in view of the hexbin, displaying the separate postal codes within the hexbin. Clicking on any points on the pop-out map would result in a third series, representing a single postal code, to be displayed on the line chart.
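The three series could be derived from the same records with one helper, as in the rough Python sketch below. The field names follow the upload format described under Data Preparation, while the function name and the idea of passing a set of postal codes are assumptions made for illustration.

```python
from collections import defaultdict

def monthly_series(records, postal_codes=None):
    """Average the "average" field per month, optionally for a subset of postal codes."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        if postal_codes is not None and rec["Postal Code"] not in postal_codes:
            continue
        sums[rec["month"]] += rec["average"]
        counts[rec["month"]] += 1
    return {month: sums[month] / counts[month] for month in sorted(sums)}
```

Calling `monthly_series(records)` would give the national series, passing the set of postal codes inside a clicked hexbin would give the second series, and passing a single postal code would give the third.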



Data Preparation

File Upload Format for Application

We transformed the raw public and private housing datasets into two excel workbooks, which our app is able to read, with the following columns:

Public Housing

1718t1is428T15-PublicCompiled.png

Snapshot of public housing dataset

{| class="wikitable"
! Attribute !! Description
|-
| Postal Code || Postal code of a public residential building
|-
| oneroom || Average electricity consumed by 1-room flats in the building
|-
| threeroom || Average electricity consumed by 3-room flats in the building
|-
| fourroom || Average electricity consumed by 4-room flats in the building
|-
| fiveroom || Average electricity consumed by 5-room/executive flats in the building
|-
| average || Average electricity consumed by all flats in the building
|-
| year || Year in which the amount of electricity was consumed and measured
|-
| month || Month in which the amount of electricity was consumed and measured
|-
| lat || Latitude of the building
|-
| long || Longitude of the building
|-
| address || Address of the building
|}

Private Housing

1718t1is428T15-PrivateCompiled.png

Snapshot of private housing dataset

{| class="wikitable"
! Attribute !! Description
|-
| Postal Code || Postal code of a private residential building
|-
| average || Average electricity consumed by all flats in the building
|-
| year || Year in which the amount of electricity was consumed and measured
|-
| month || Month in which the amount of electricity was consumed and measured
|-
| lat || Latitude of the building
|-
| long || Longitude of the building
|-
| address || Address of the building
|}
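To illustrate the reshaping involved for private housing, the Python sketch below turns one raw row per postal code (with a column per month) into one row per postal code per month. We did this step in Excel; the function name, the dict-based row format, and storing the month as a number are assumptions for the example, and the lat, long and address columns are left out.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def private_wide_to_long(rows, year):
    """Turn one row per postal code (month columns) into one row per month."""
    long_rows = []
    for row in rows:
        for month_number, month_name in enumerate(MONTHS, start=1):
            if month_name not in row:
                continue  # month column absent from this sheet
            long_rows.append({
                "Postal Code": row["Postal Code"],
                "average": row[month_name],
                "year": year,
                "month": month_number,
            })
    return long_rows
```

Postal codes are kept as strings so that leading zeros are not lost.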

Measuring Average of Each Public Postal Code

In our proposed storyboard, the color intensity of each hexbin represents the amount of energy consumed by the postal codes within the hexbin. As we moved further into the project, however, we faced a limitation from EMA's datasets.

For public housing, the data provided by EMA only gives the average electricity consumption of all apartments of the same dwelling type in a building. For instance, the dataset for July 2015 tells us that 3-room flats in postal code 824601 used an average of 339 kWh of electricity. The crucial information we were unable to obtain is how many 3-room flats there are in postal code 824601, or the total electricity consumed by all of its 3-room flats. This makes it impossible for us to accurately compute each postal code's weighted-average electricity consumption.

The only measure we can use to determine the public housing hexbin color intensity is thus the average of averages. That is, for each postal code, we take the mean of the average energy consumption figures for 1-room/2-room, 3-room, 4-room and 5-room/executive apartments and treat that as the postal code average. However, we are aware that this can be a very inaccurate representation of the actual postal code average unless the postal code has an equal number of flats of each dwelling type[7].
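A small Python example shows how far the average of averages can drift from a weighted average. Only the 339 kWh figure for 3-room flats comes from the dataset; the other averages and all flat counts are invented, since EMA does not publish the counts.

```python
def average_of_averages(type_averages):
    """Unweighted mean of the per-dwelling-type averages (the measure we use)."""
    values = list(type_averages.values())
    return sum(values) / len(values)

def weighted_average(type_averages, flat_counts):
    """Mean weighted by the number of flats; needs counts that EMA does not publish."""
    total_flats = sum(flat_counts[t] for t in type_averages)
    total_kwh = sum(type_averages[t] * flat_counts[t] for t in type_averages)
    return total_kwh / total_flats

# Hypothetical building: only the 3-room average (339 kWh) is taken from the text.
averages = {"3-room": 339.0, "4-room": 450.0, "5-room": 540.0}
counts = {"3-room": 80, "4-room": 10, "5-room": 10}  # invented flat counts

unweighted = average_of_averages(averages)   # 443.0 kWh
weighted = weighted_average(averages, counts)  # about 370.2 kWh
```

In this made-up building most flats are 3-room, so the unweighted figure overstates the typical household's consumption by roughly 73 kWh.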

Suppressed Data

Datasets for both public and private housing contain many 's' values, which represent readings that are suppressed to avoid disclosure of individual data. Such values, whether left in the datasets or removed, will affect the accuracy of our analysis. If left in the datasets, our computation of a postal code's average would treat the 's' values as 0 (since there is no reliable way to estimate them) and bring down the overall average. On the other hand, if postal codes containing 's' values are removed from our datasets, months' or years' worth of data would be missing. In the end, we decided to remove such postal codes, as it is better to treat them as missing values than as 0.

1718t1is428T15-Suppressed1.png
1718t1is428T15-Suppressed2.png
Snapshot of suppressed values in dataset
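The trade-off between the two options can be seen in a short Python sketch. The readings are invented and the function names are hypothetical; the point is only to contrast counting an 's' as 0 with treating the postal code as missing.

```python
SUPPRESSED = "s"

def mean_treating_suppressed_as_zero(readings):
    """Count every 's' as 0, which drags the average down."""
    values = [0.0 if r == SUPPRESSED else float(r) for r in readings]
    return sum(values) / len(values)

def mean_or_none(readings):
    """Return None (missing) if any reading is suppressed, else the plain mean."""
    if any(r == SUPPRESSED for r in readings):
        return None
    return sum(float(r) for r in readings) / len(readings)

readings = [320.0, "s", 400.0]  # invented example readings
```

With these readings the first function reports 240.0, well below either real value, while the second reports the postal code as missing so it can be dropped.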

Geocoding Postal Codes

To be able to plot the postal codes on a map, we first need to convert the postal codes to longitude and latitude. We did this by creating a geocoder app (LeGeocoder) that calls upon OneMap's search API. The geocoder is able to read an excel workbook containing the postal codes, and returns an updated excel workbook containing the longitude and latitude for each postal code.

To make the search more efficient, we first created a separate spreadsheet that compiles all the unique public and private postal codes across 2013-2015. Then, with the results from the geocoder, we performed a vlookup in Excel to obtain the latitude and longitude for the file to be uploaded onto our app.
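The lookup step is equivalent to an exact-match join on postal code. A minimal Python sketch of that join is given below; the function name and data shapes are assumptions for illustration, and the actual work was done with vlookup in Excel.

```python
def attach_coordinates(records, geocoded):
    """Add lat/long to each record by postal code, like an exact-match vlookup.

    records:  list of dicts with a "Postal Code" key.
    geocoded: dict mapping postal code -> (lat, long), built once from the
              unique postal codes so that each code is geocoded a single time.
    Returns (matched_records, unmatched_postal_codes).
    """
    matched, unmatched = [], set()
    for record in records:
        coords = geocoded.get(record["Postal Code"])
        if coords is None:
            unmatched.add(record["Postal Code"])
            continue
        lat, lng = coords
        matched.append({**record, "lat": lat, "long": lng})
    return matched, sorted(unmatched)
```

Returning the unmatched postal codes separately makes it easy to see which buildings still need to be located by hand.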

We found that for certain postal codes, OneMap's API would return us the values for bus stops, rather than residential buildings.

1718t1is428T15-OneMapBusStops.png
Snapshot of OneMap API returning bus stops instead of residential buildings

And for some postal codes, although a residential building can be found through a Google search, the API was unable to return any results.

1718t1is428T15-OneMapCantFind.png
Snapshot of OneMap API being unable to find valid postal codes
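Both problems can be handled when reading the geocoder's response, as in the hedged Python sketch below. It assumes a response shaped like a dict with a `results` list whose entries carry `BUILDING`, `LATITUDE` and `LONGITUDE` fields; these names are modelled on what we saw from OneMap's search API and should be checked against its current documentation. The "BUS STOP" text match is a heuristic of ours, not an API feature.

```python
def pick_residential_result(response):
    """Return (lat, long) of the first non-bus-stop result, or None if none is usable."""
    for result in response.get("results", []):
        if "BUS STOP" in result.get("BUILDING", "").upper():
            continue  # skip bus stops returned for the searched postal code
        return float(result["LATITUDE"]), float(result["LONGITUDE"])
    return None  # nothing usable: record the postal code for manual lookup
```

A `None` return covers both the case where only bus stops come back and the case where the API finds nothing at all.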



Architecture Diagram

1718t1is428T15-MainArchitecture.png
Architecture diagram of Visual Enerlyst application


1718t1is428T15-GeocoderArchitecture.png
Architecture diagram of LeGeocoder application


Technical Challenges

{| class="wikitable"
! Key Technical Challenges !! How We Propose To Resolve
|-
| Unfamiliarity with d3.js libraries ||
* Independent Learning
* Consult Instructor Prakash
* Peer Learning
|-
| Data Cleaning & Transformation ||
* Work together to clean, transform and analyze the data
|-
| Unfamiliarity in Implementing Interactivity and Animation Tools/Techniques in Visualization App ||
* Develop a Storyboard/Design Flow
* Assign members to specialize on Interactivity/Animation Techniques
|}


Timeline

1718t1is428T15-Timeline.png


Technologies/Tools

The following are technologies and tools which we used:

  • Microsoft Excel (data cleaning)
  • d3.js (visualisation)
  • Leaflet.js (overlaying of map)
  • d3-hexbin.js (overlaying of hexagonal bins, wrapper library of d3 and leaflet)
  • Github (version control)


Reference


Comments