1718t1is428T15

From Visual Analytics for Business Intelligence
Revision as of 16:19, 23 November 2017 by Meiying.wan.2014 (talk | contribs)
OnTheFlyLogo.png



Project Motivation & Objective

Experts have warned that global power demand is set to double by 2030 despite regulatory controls. High power consumption can already be observed locally: according to the Energy Market Authority (EMA), Singapore's power consumption has risen from 1965 to 2013 [1].

1718t1is428T15-Motivation.png

As Singapore is land-scarce and lacks significant renewable energy options such as hydropower, wave power, or sufficient land for large-scale solar energy production, energy has been a top concern in the urban nation[2]. It is thus important to promote energy-saving concepts to the public and to deploy energy-saving solutions island-wide. However, the usual analysis tools do not offer the different perspective needed to facilitate such deployment. Information about the energy consumption levels of residents in Singapore is often not conveyed adequately in data visualisations. While EMA and Singstat provide annual data and reports on energy usage in Singapore, a more powerful visualisation technique is needed to gain insights effectively. Our team aims to create a visualisation that leverages the energy datasets provided by EMA to perform spatial analysis and identify energy usage clusters with hexagonal binning.


Datasets

Data Source

Our analysis will be based on EMA's collection of data on Singapore's residential electricity consumption[3]:

  • Public housing's monthly average household electricity consumption (kWh) (2013 - 2015)
  • Private apartments' monthly average household electricity consumption (kWh) (2013 - 2015)

Data Attributes

Public Housing

The dataset for each year is split into two Excel workbooks, each containing six sheets with one sheet per month, as shown below:

1718t1is428T15-DataSourcePublic.png

The following is a snapshot of Jan 2015's electricity consumption data, and a description of the data attributes collected for each month:

1718t1is428T15-DataSourcePublicJan.png
Attribute            Description
Postal Code          Postal code of a public residential building
1-room / 2-room      Average electricity consumed by 1-room/2-room flats in the building
3-room               Average electricity consumed by 3-room flats in the building
4-room               Average electricity consumed by 4-room flats in the building
5-room / Executive   Average electricity consumed by 5-room/executive flats in the building

Private Housing

The data for all years is held in a single Excel workbook, with one sheet per year, as shown below:

1718t1is428T15-DataSourcePrivate.png

Each year's data contains the following attributes:

Attribute     Description
Postal Code   Postal code of a private residential building
Jan           Average electricity consumed by all apartments in the building in Jan
Feb           Average electricity consumed by all apartments in the building in Feb
Mar           Average electricity consumed by all apartments in the building in Mar
Apr           Average electricity consumed by all apartments in the building in Apr
May           Average electricity consumed by all apartments in the building in May
Jun           Average electricity consumed by all apartments in the building in Jun
Jul           Average electricity consumed by all apartments in the building in Jul
Aug           Average electricity consumed by all apartments in the building in Aug
Sep           Average electricity consumed by all apartments in the building in Sep
Oct           Average electricity consumed by all apartments in the building in Oct
Nov           Average electricity consumed by all apartments in the building in Nov
Dec           Average electricity consumed by all apartments in the building in Dec


Related Works

Much of the relevant prior work on residential energy consumption levels in Singapore revolves around the motivations for and barriers to energy efficiency.

In 2013, the Ministry of the Environment and Water Resources (MEWR) interviewed 2,500 residents on the extent of their energy efficiency practices at home, their level of awareness of energy efficiency, and the barriers they face in being energy efficient. It found that 41.3% of the respondents would be more encouraged to conserve electricity if the government were to provide monetary incentives or voucher rewards/rebates, and 36.5% are motivated by advertisements on various media platforms. The findings also showed that residents generally perceived the high cost of energy-efficient appliances and the inconvenience of energy-saving practices as barriers to energy efficiency in households.[4]

Another study, by Energy Efficient Singapore (E2 Singapore), indicated that when residents in other countries are allowed to compare their utility bills against those of their neighbours, they can potentially achieve 4 to 12% energy savings. This is because the comparison leverages the power of social norms to provide direct feedback to the residents: residents are likely to bring their behaviour closer to the norm when they are informed of what the norm is.[5]

A third study, by Xu and Ang from NUS, analyzes the root causes of high energy consumption using index decomposition analysis (IDA). The IDA model studies changes in energy consumption over time and is often used in major energy-consuming sectors such as the transport industry. To fit the model for use on the residential sector, Xu and Ang applied a hybrid IDA model that divides the residential sector into various subsectors, each with a different key factor driving energy consumption. For instance, energy consumption in a subsector may be driven by floor area (for air cooling and heating). The paper found that environment control and household appliances are the main factors for energy consumption by households, and each of these is greatly affected by population growth and decreases in residents per household.[6]

By using our proposed work jointly with the first two papers, users can visually identify clusters with high energy usage where efficient energy consumption measures can be implemented. With the last paper, we can trace the root cause for high energy usage.


Inspirations

Otf choro.PNG


The number of public and private address points in Singapore is large, at about twenty thousand records. While this may pale in comparison to data sets that amount to tens of millions of records, the real challenge lies in plotting these points over a geographical region as small as Singapore. The limited land space coupled with the large number of data points results in heavy overlap and clutter among address points, making it difficult to aggregate the data and visualize energy consumption effectively.

Our team has already experimented with aggregating energy consumption levels onto a choropleth map segmented by planning areas. This approach is effective in providing an overview of energy consumption levels across planning areas in Singapore, and it supports analysis of local indications of spatial correlation in energy usage clustering. However, it is ill-suited to investigating clustering at finer levels of spatial granularity: focusing on smaller areas is impossible because the data is aggregated at the level of planning areas.

Hexbin inspiration.PNG


With this in mind, On The Fly is experimenting with an alternative technique, hexagonal binning, for visualizing the energy usage density of public and private housing. By grouping address points into hexagons and computing the average energy consumption of the address points in each hexagon, we aim to visualize energy consumption levels aggregated across smaller areas and so generate a more detailed view of energy usage across Singapore.
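The binning step described above can be sketched as follows. This is only an illustrative sketch, not the project's code: it assumes the address points have already been projected to planar x/y coordinates, it uses axial hexagon coordinates with cube rounding rather than the d3-hexbin implementation, and the function names and sample values are hypothetical.

```python
import math
from collections import defaultdict


def hex_round(q, r):
    """Round fractional axial coordinates to the nearest hexagon (cube rounding)."""
    x, z = q, r
    y = -x - z
    rx, ry, rz = round(x), round(y), round(z)
    dx, dy, dz = abs(rx - x), abs(ry - y), abs(rz - z)
    # Recompute the component with the largest rounding error so x + y + z == 0.
    if dx > dy and dx > dz:
        rx = -ry - rz
    elif dy > dz:
        ry = -rx - rz
    else:
        rz = -rx - ry
    return rx, rz


def point_to_hex(x, y, size):
    """Map a planar point to the axial (q, r) index of a pointy-top hexagon."""
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    return hex_round(q, r)


def hexbin_average(points, size):
    """points: iterable of (x, y, value). Returns {(q, r): (count, mean value)}."""
    sums = defaultdict(lambda: [0, 0.0])
    for x, y, value in points:
        cell = point_to_hex(x, y, size)
        sums[cell][0] += 1
        sums[cell][1] += value
    return {cell: (n, total / n) for cell, (n, total) in sums.items()}


# Hypothetical planar coordinates and consumption values, for illustration only.
sample_points = [
    (0.0, 0.0, 300.0),
    (0.1, 0.1, 500.0),
    (math.sqrt(3), 0.0, 200.0),
]
bins = hexbin_average(sample_points, size=1.0)
```

Each key in `bins` identifies one hexagon, and each value holds the number of address points in it together with their mean consumption, which is the quantity the hexbin colour would encode.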


Proposed Storyboard

1718t1is428T15-Storyboard.png

Upload Data

An interface will be provided for the user to upload datasets of past and/or future years. This would provide more flexibility for users to analyze a wider range of energy usage data.

Hexagon Binning

1718t1is428T15-StoryboardHexBin.png
Example of hexbin by Mike Bostock

There are many levels at which we can analyze the intensity of energy consumption in Singapore, such as the national level, regional level, or subzone level. But these levels are too coarse and do not provide a comprehensive view. For instance, a large subzone would naturally show a higher energy consumption level simply because more residents live in it.

Thus, to properly analyze the intensity of energy consumption in Singapore, we need to do so at a more granular level; we decided to break Singapore down into groups of postal codes. By aggregating a few postal codes together, we have a higher chance of uncovering new findings.

The best way to visualize this would be to plot hexagon bins (“hexbins”) onto the Singapore map, with each hexbin representing a group of postal codes, and using a gradient colour scheme to represent each group’s energy consumption intensity.
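As a rough illustration of the gradient colour scheme, the sketch below linearly interpolates between two colours according to where a hexbin's average falls between the minimum and maximum. The two end colours, the function names, and the choice of a linear scale are assumptions made for this example.

```python
def lerp_colour(low_rgb, high_rgb, t):
    """Linearly interpolate between two RGB tuples; t is clamped to [0, 1]."""
    t = max(0.0, min(1.0, t))
    return tuple(round(a + (b - a) * t) for a, b in zip(low_rgb, high_rgb))


def colour_for(value, vmin, vmax, low_rgb=(255, 255, 204), high_rgb=(189, 0, 38)):
    """Return a hex colour string for a hexbin's average consumption."""
    # Guard against a degenerate range where every hexbin has the same value.
    t = 0.0 if vmax == vmin else (value - vmin) / (vmax - vmin)
    return "#%02x%02x%02x" % lerp_colour(low_rgb, high_rgb, t)


# Hypothetical range of hexbin averages in kWh.
lightest = colour_for(200.0, vmin=200.0, vmax=600.0)
darkest = colour_for(600.0, vmin=200.0, vmax=600.0)
```

A quantile or threshold scale could be substituted if a few very high hexbins would otherwise wash out the rest of the map.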

Line Chart

1718t1is428T15-StoryboardLineChart.png
Example of multi-series line chart by Mike Bostock

We will add a multi-series line chart to allow users to compare monthly energy consumption levels for: 1) the whole of Singapore, 2) a group of postal codes, and 3) an individual postal code.

By default, the line chart shows only the average monthly consumption of the entire nation. Clicking on a single hexbin adds a second series showing the average monthly consumption of the group of postal codes within that hexbin. The click also triggers a pop-out of a second map featuring a zoomed-in view of the hexbin, displaying the separate postal codes within it. Clicking on any point on the pop-out map adds a third series, representing that single postal code, to the line chart.
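The three series could be derived from the same records roughly as shown below. This is a sketch under assumptions: each record is taken to be a dictionary with "Postal Code", "month" and "average" keys, and the function names and series labels are hypothetical.

```python
from collections import defaultdict


def monthly_mean(records):
    """Return {month: mean of 'average'} over the given records."""
    totals = defaultdict(lambda: [0.0, 0])
    for rec in records:
        totals[rec["month"]][0] += rec["average"]
        totals[rec["month"]][1] += 1
    return {month: total / n for month, (total, n) in sorted(totals.items())}


def build_series(records, hexbin_codes=None, postal_code=None):
    """Build the line chart series: nation, selected hexbin, selected postal code."""
    series = {"Singapore": monthly_mean(records)}
    if hexbin_codes is not None:
        series["Selected hexbin"] = monthly_mean(
            r for r in records if r["Postal Code"] in hexbin_codes
        )
    if postal_code is not None:
        series["Postal code " + postal_code] = monthly_mean(
            r for r in records if r["Postal Code"] == postal_code
        )
    return series


# Hypothetical records for two buildings over two months.
sample_records = [
    {"Postal Code": "A", "month": 1, "average": 300.0},
    {"Postal Code": "B", "month": 1, "average": 500.0},
    {"Postal Code": "A", "month": 2, "average": 320.0},
    {"Postal Code": "B", "month": 2, "average": 480.0},
]
chart = build_series(sample_records, hexbin_codes={"A", "B"}, postal_code="A")
```

Calling `build_series(sample_records)` with no selection returns only the national series, which corresponds to the default state of the chart.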



Data Preparation

File Upload Format for Application

We transformed the raw public and private housing datasets into two Excel workbooks that our app is able to read, with the following columns:

Public Housing

1718t1is428T15-PublicCompiled.png
Attribute     Description
Postal Code   Postal code of a public residential building
oneroom       Average electricity consumed by 1-room/2-room flats in the building
threeroom     Average electricity consumed by 3-room flats in the building
fourroom      Average electricity consumed by 4-room flats in the building
fiveroom      Average electricity consumed by 5-room/executive flats in the building
average       Average electricity consumed by all flats in the building
year          Year in which the amount of electricity was consumed and measured
month         Month in which the amount of electricity was consumed and measured
lat           Latitude of the building
long          Longitude of the building
address       Address of the building

Private Housing

1718t1is428T15-PrivateCompiled.png
Attribute     Description
Postal Code   Postal code of a private residential building
average       Average electricity consumed by all flats in the building
year          Year in which the amount of electricity was consumed and measured
month         Month in which the amount of electricity was consumed and measured
lat           Latitude of the building
long          Longitude of the building
address       Address of the building
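For the private housing data, going from one column per month to one row per month is a wide-to-long reshape. The sketch below shows one way this could be done; it assumes each raw row is a dictionary keyed by "Postal Code" and the month names, and that the month is stored as a number. The function name and sample values are hypothetical.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def wide_to_long(rows, year):
    """Turn one row per building (Jan..Dec columns) into one row per building-month."""
    long_rows = []
    for row in rows:
        for month_number, name in enumerate(MONTHS, start=1):
            if name in row:
                long_rows.append({
                    "Postal Code": row["Postal Code"],
                    "average": row[name],
                    "year": year,
                    "month": month_number,
                })
    return long_rows


# Hypothetical raw row, truncated to two months for brevity.
raw_rows = [{"Postal Code": "018956", "Jan": 450.0, "Feb": 430.0}]
compiled = wide_to_long(raw_rows, year=2015)
```

The lat, long and address columns are not produced here; they would be attached afterwards from the geocoding results.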

Measuring Average of Each Public Postal Code

In our proposed storyboard, the color intensity of each hexbin represents the amount of energy consumed by the postal codes within the hexbin. As we moved further into the project, however, we faced a limitation from EMA's datasets.

For public housing, the data provided by EMA only tells us the average electricity consumption of all apartments that fall under the same dwelling type. For instance, the dataset for July 2015 tells us that 3-room flats in postal code 824601 used an average of 339 kWh of electricity. The crucial information that we were unable to obtain is how many 3-room flats there are in postal code 824601, or the total electricity consumed by all of its 3-room flats. This makes it impossible for us to accurately compute each postal code's weighted-average electricity consumption.

The only measure we can use to determine the hexbin color intensity for public housing is thus the average of averages. That is, for each postal code, we take the average of the energy consumption figures for 1-room/2-room, 3-room, 4-room and 5-room/executive apartments and treat that as the postal code average. However, we are aware that this can be an inaccurate representation of the actual postal code average, unless the postal code has an equal number of flats of each dwelling type[7].
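The gap between the two measures can be illustrated with a small sketch. The 339 kWh figure comes from the example above; the 5-room figure and the flat counts are hypothetical, since EMA does not publish counts. Skipping dwelling types with no reading is also an assumption of this sketch.

```python
def average_of_averages(room_averages):
    """Unweighted mean of the per-dwelling-type averages, skipping missing types."""
    values = [v for v in room_averages if v is not None]
    return sum(values) / len(values) if values else None


def weighted_average(room_averages, flat_counts):
    """Mean weighted by number of flats; needs counts that EMA does not provide."""
    total_flats = sum(flat_counts)
    weighted_sum = sum(avg * n for avg, n in zip(room_averages, flat_counts))
    return weighted_sum / total_flats


# 3-room average from the text; 5-room average and counts are hypothetical.
unweighted = average_of_averages([None, 339.0, None, 500.0])
weighted = weighted_average([339.0, 500.0], flat_counts=[90, 10])
```

With 90 three-room flats and only 10 five-room flats, the weighted figure sits close to the 3-room average, while the average of averages lands midway between the two, which is the distortion described above.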

Suppressed Data

The datasets for both public and private housing contain many 's' values, which represent readings that are suppressed to avoid disclosure of individual data. Such values, whether left in the datasets or removed, affect the accuracy of our analysis. If they are left in, our computation of a postal code's average would treat the 's' values as 0 (since there is no reliable way to estimate them) and bring down the overall average. On the other hand, if postal codes containing 's' values are removed from our datasets, months' or years' worth of data would be missing. In the end, we decided to remove such postal codes, as it is better to treat them as missing values than as 0.

1718t1is428T15-Suppressed1.png
1718t1is428T15-Suppressed2.png
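The removal rule could look like the sketch below. It assumes that a postal code is dropped entirely as soon as any of its readings is suppressed, which is one reading of the decision described above; the function name and sample rows are hypothetical.

```python
SUPPRESSED = "s"


def drop_suppressed_postal_codes(rows, value_fields):
    """Remove every row of any postal code that has at least one 's' reading."""
    flagged = {
        row["Postal Code"]
        for row in rows
        if any(
            str(row.get(field, "")).strip().lower() == SUPPRESSED
            for field in value_fields
        )
    }
    return [row for row in rows if row["Postal Code"] not in flagged]


# Hypothetical rows: postal code "A" has one suppressed reading.
sample_rows = [
    {"Postal Code": "A", "threeroom": 339, "fourroom": "s"},
    {"Postal Code": "A", "threeroom": 340, "fourroom": 400},
    {"Postal Code": "B", "threeroom": 300, "fourroom": 410},
]
kept = drop_suppressed_postal_codes(sample_rows, ["threeroom", "fourroom"])
```

Dropping only the affected month instead of the whole postal code would keep more data, at the cost of uneven coverage across months.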

Geocoding Postal Codes

To be able to plot the postal codes on a map, we first need to convert the postal codes to longitude and latitude. We did this by creating a geocoder that calls OneMap's search API. The geocoder reads an Excel workbook containing the postal codes and returns an updated Excel workbook containing the longitude and latitude of each postal code.

To make the search more efficient, we first created a separate spreadsheet that compiles all the unique public and private postal codes across 2013-2015. Then, with the results from the geocoder, we performed a VLOOKUP in Excel to obtain the latitude and longitude for the file to be uploaded onto our app.

We found that for certain postal codes, OneMap's API returned results for bus stops rather than residential buildings.
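The deduplicate-then-look-up workflow is equivalent to the sketch below, shown here in code rather than Excel. The function names and the sample coordinates are hypothetical. Postal codes are kept as text because Singapore postal codes can begin with 0.

```python
def unique_postal_codes(*datasets):
    """Collect each postal code once, in first-seen order, across all datasets."""
    seen = {}
    for rows in datasets:
        for row in rows:
            seen.setdefault(row["Postal Code"], True)
    return list(seen)


def attach_coordinates(rows, lookup):
    """VLOOKUP equivalent: add lat/long from {postal code: (lat, long)}."""
    enriched = []
    for row in rows:
        lat, long_ = lookup.get(row["Postal Code"], (None, None))
        enriched.append({**row, "lat": lat, "long": long_})
    return enriched


# Hypothetical rows and a hypothetical geocoder result.
public_rows = [{"Postal Code": "824601"}, {"Postal Code": "560123"}]
private_rows = [{"Postal Code": "560123"}, {"Postal Code": "018956"}]
codes = unique_postal_codes(public_rows, private_rows)
geocoded = {"824601": (1.4, 103.9)}
public_with_coords = attach_coordinates(public_rows, geocoded)
```

Postal codes missing from the lookup keep `None` for lat and long, so they can be reviewed by hand instead of being silently plotted at a wrong location.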

1718t1is428T15-OneMapBusStops.png

And for some postal codes, although a residential building can be found through a Google search, the API was unable to return any results.

1718t1is428T15-OneMapCantFind.png
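One possible way to handle both cases is to inspect the search results before accepting a coordinate. The sketch below makes no network call and works on an already-parsed response. The response shape (a "results" list whose entries carry "BUILDING", "ADDRESS", "LATITUDE" and "LONGITUDE") is an assumption that should be checked against the current OneMap documentation, and detecting bus stops by the text "BUS STOP" is a hypothetical heuristic.

```python
def pick_residential_result(response):
    """Return (lat, long, address) of the first non-bus-stop result, else None."""
    for result in response.get("results", []):
        label = (result.get("BUILDING", "") + " " + result.get("ADDRESS", "")).upper()
        if "BUS STOP" in label:
            # Assumed heuristic: skip entries that describe a bus stop.
            continue
        return (
            float(result["LATITUDE"]),
            float(result["LONGITUDE"]),
            result.get("ADDRESS", ""),
        )
    return None  # No usable result: flag the postal code for manual lookup.


# Hypothetical response, using the assumed field names.
sample_response = {
    "results": [
        {"BUILDING": "BUS STOP 12345", "ADDRESS": "BUS STOP 12345",
         "LATITUDE": "1.40", "LONGITUDE": "103.90"},
        {"BUILDING": "NIL", "ADDRESS": "BLK 1 EXAMPLE ROAD",
         "LATITUDE": "1.41", "LONGITUDE": "103.91"},
    ]
}
picked = pick_residential_result(sample_response)
missing = pick_residential_result({"results": []})
```

Postal codes for which the function returns `None` correspond to the second case above and would still need to be resolved manually.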


Technical Challenges

Key Technical Challenges How We Propose To Resolve
Unfamiliarity with d3.js libraries
  • Independent Learning
  • Consult Instructor Prakash
  • Peer Learning
Data Cleaning & Transformation
  • Work together to clean, transform and analyze the data
Unfamiliarity in Implementing Interactivity and Animation Tools/Techniques in Visualization App
  • Develop a Storyboard/Design Flow
  • Assign members to specialize on Interactivity/Animation Techniques


Timeline

1718t1is428T15-Timeline.png


Technologies/Tools

The following are technologies and tools which we used:

  • Microsoft Excel (data cleaning)
  • d3.js (visualisation)
  • Leaflet.js (overlaying of map)
  • d3-hexbin.js (overlaying of hexagonal bins, wrapper library of d3 and leaflet)
  • GitHub (version control)


Reference

