Difference between revisions of "Group06 Elec3city Proposal"

Latest revision as of 19:04, 14 April 2019

Project Motivation

Rising energy consumption is an issue that has plagued the Singapore government for several years. Recently, the government has begun pushing for more efficient energy usage, with most effort expended on the efficiency of energy sources, e.g. using less carbon-intensive fuels. In exploring potential ways to aid this cause, we realised that there has been scant statistical analysis of energy consumption patterns. As such, our team feels that there is a need for an app which allows authorities in Singapore, such as the National Environment Agency, to understand with data-driven evidence the origins of variation in Singapore's energy consumption, so as to allow for more targeted efforts to reduce energy wastage. Another factor that provided fertile ground for the development of this application is the availability of energy consumption data, with granularity right down to the individual postal code.

Project Objective

We have been observing rising trends in energy consumption across all sub-sectors, including households, whose consumption has risen sharply from 6,092.5 GWh in 2005 to 7,295.8 GWh in 2017. As such, the household sub-sector is a good starting point for our analysis.

Energy Consumption Patterns in Singapore

Through the exploration of potential energy consumption patterns among households in Singapore, we will be able to tease out potential drivers of energy consumption. We will look not only at seasonal patterns in consumption (e.g. is there a particular month in a year when energy consumption spikes?) but also at spatial patterns in consumption (e.g. is there a particular region in Singapore where energy consumption is particularly high?).

Temperature VS Energy Consumption

The next issue that we are interested in exploring is the relationship between temperature and energy consumption. Intuitively, we expect that as temperature increases, energy consumption will also increase, because more electrical appliances such as air-conditioners are used to regulate temperature. Thus, we plan to explore how temperature in different regions of Singapore affects the energy consumption in each respective region.

Data

Data	Source	Data Type
Average Monthly Household Electricity Consumption by Postal Code (Public Housing) & Dwelling Type, 2H 2016	Energy Market Authority (ema.gov.sg)	xls
Average Monthly Household Electricity Consumption by Postal Code (Public Housing) & Dwelling Type, 1H 2016	Energy Market Authority (ema.gov.sg)	xls
Average Monthly Household Electricity Consumption by Postal Code (Public Housing) & Dwelling Type, 2H 2015	Energy Market Authority (ema.gov.sg)	xls
Average Monthly Household Electricity Consumption by Postal Code (Public Housing) & Dwelling Type, 1H 2015	Energy Market Authority (ema.gov.sg)	xls
Average Monthly Household Electricity Consumption by Postal Code (Public Housing) & Dwelling Type, 2H 2014	Energy Market Authority (ema.gov.sg)	xls
Average Monthly Household Electricity Consumption by Postal Code (Public Housing) & Dwelling Type, 1H 2014	Energy Market Authority (ema.gov.sg)	xls
Average Monthly Household Electricity Consumption by Postal Code (Public Housing) & Dwelling Type, 2H 2013	Energy Market Authority (ema.gov.sg)	xls
Average Monthly Household Electricity Consumption by Postal Code (Public Housing) & Dwelling Type, 1H 2013	Energy Market Authority (ema.gov.sg)	xls
Average Monthly Household Electricity Consumption by Postal Code (Private Apartments), 2015 and 2016	Energy Market Authority (ema.gov.sg)	xls
Average Monthly Household Electricity Consumption by Postal Code (Private Apartments), 2013 to 2014	Energy Market Authority (ema.gov.sg)	xls
Resident Households by Planning Area and Dwelling Type/Household Size/Monthly Household Income	Department of Statistics Singapore (singstat.gov.sg)	xls
Singapore Residents by Planning Area/Subzone, Age Group and Sex, June 2000 - 2018	Department of Statistics Singapore (singstat.gov.sg)	csv
Singapore Residents by Planning Area/Subzone and Type of Dwelling, June 2000 - 2018	Department of Statistics Singapore (singstat.gov.sg)	csv
Singapore Climate Historical Data - crawled to get temperature and rain data from 2013 to 2016 at daily granularity	Meteorological Service Singapore (weather.gov.sg)	csv

Literature Review

In our due diligence for the project, the team looked at multiple research papers to inform and influence us in the best practices for analyzing geospatial variation in energy use, when it is to be compared against variables such as temperature and housing composition.

1. Appropriate use of Interpolation Methods in GIS - Mitas, L. and Mitasova, H. Spatial Interpolation, Chap. 34 Spatial Interpolation (2005)

Aim of literature: to guide the reader to the appropriate interpolation method for different GIS themes.

Comparison of Digital Elevation Models computed from contours, splines with tension and stream enforcement, and by regularised spline with tension (RST)

Methodology:
1. Inverse Distance Weighted Interpolation (IDW) - adopted
2. Kriging - rejected
3. Regularised spline with tension (RST) - rejected

Learning Points:
1. Inverse Distance Weighted Interpolation (IDW)

Pro: relatively less demanding computationally
Pro: better at reproducing approximations on linear patterns
Con: "produces local extrema at the data points"

2. Kriging

Con: While good at predicting spatial distribution of uncertainty, it is less successful for applications where local geometry and smoothness are the key issues - Critical weakness for our interpolation of temperature data where granularity is at housing block level, thus Kriging is rejected.

3. Regularised spline with tension (RST)

Pro: Allows for smoothing according to parameters such as the tension φ and smoothing weights {wj} which are empirically informed through minimisation of the predictive error estimated by a cross-validation procedure
Pro: Can realistically represent rough gradients in spite of the smoothness condition, if the roughness is sufficiently described by the input data - might be true of temperature when it comes to the Urban Heat Island effect - pockets of high building density can cause a micro-climate of higher temperatures; particularly pertinent in Singapore.
Con: requires a lot of 'guess-timation' and past domain knowledge to fine-tune the tension and smoothing weights.

Areas for improvement:
Our team has selected IDW as the interpolation technique for smoothing of temperature data of the 22 weather stations across Singapore.

2. A Spatial Analysis of the Relationship between Vegetation and Poverty - Dawson T., Sandoval J.S., Sagan V. and Crawford T. (2018)

Aim of literature: investigate poverty and inequities that are associated with vegetation

Geospatial Visualisation of MAXN (regression coefficient for the time variable showing trend in Normalized Difference Vegetation Index) against race poverty geospatial distribution

Local R-Squared values of model in Detroit

Methodology:
1. Pixel level regression - Curve Fit extension in ArcGIS

Run regression trend analysis using raster datasets for temporal analysis

2. Global Ordinary Least Squares (OLS) regression

Capture global geospatial correlation

3. Local Geographically Weighted Regression (GWR)

Capture local geospatial correlation

4. Moran's I for spatial autocorrelation

For local level analysis of spatial autocorrelation

5. Local Indicators of Spatial Association (LISA) map - Contiguity Edges and Corners method

Queen contiguity to show clustering

Learning Points:
1. Pixel level regression - Curve Fit extension in ArcGIS

Helps us see how well the model predicts energy consumption given our variables

2. Global Ordinary Least Squares (OLS) regression

Investigate whether the error terms all have the same variance and a mean of zero. If so, then the least squares method may be the best linear unbiased estimator of the model coefficients.
If residuals are spatially autocorrelated, OLS results are unreliable. GWR models would then be used to remove the spatial autocorrelation of residuals.

3. Local Geographically Weighted Regression (GWR)

Provides local t-values with which to find level of confidence in our local model

4. Moran's I for spatial autocorrelation

We can use this to ascertain if local level analysis is indeed appropriate to understand the relationship between income level and energy consumption, after accounting for other factors like number of household members and number of rooms.

5. Local Indicators of Spatial Association (LISA) map - Contiguity Edges and Corners method

Shows us clustering of energy consumption at local level

Areas for improvement:
1. Pixel level regression - Curve Fit extension in ArcGIS

We do not have access to ArcGIS, so we will use the curveFit function provided in the mixtox v1.3 R package by Xiangwei Zhu

3. Using GIS to target outreach For LADWP (Los Angeles Department of Water and Power) Customer Rebate Programs

Aim of literature: to reduce traditional energy usage and promote sustainable energy production through geographically segmented marketing

Residential Relative Energy Efficiency Index (REEI) 2009-2012 Choropleth

Local Moran's I for REEI - most and least efficient block groups

Methodology:
1. Creation of a REEI (Relative Energy Efficiency Index)

Done by dividing the zonal average consumption growth rate by the consumption change rate for each block group.

2. Global Moran’s I

Determine if spatial autocorrelation is taking place

3. Local Moran's I

See where clustering is taking place

Learning Points:
1. REEI (Relative Energy Efficiency Index)

Team can look into calculating such an index for each HDB parcel
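
The REEI calculation described in the methodology above could be sketched as follows. This is only an illustration under stated assumptions: the zonal average growth rate is taken here as the simple mean of the block-level rates, the consumption figures are invented, and the names `growth_rate` and `reei` are hypothetical rather than taken from the LADWP study.

```python
def growth_rate(start, end):
    """Relative change in consumption between two periods."""
    return (end - start) / start


def reei(zone_blocks):
    """Return {block_id: REEI} for the blocks of one zone.

    zone_blocks maps a block id to (consumption_start, consumption_end).
    REEI = zonal average growth rate / block growth rate, following the
    description in the review. Blocks with a zero rate are skipped because
    the ratio is undefined there.
    """
    rates = {b: growth_rate(s, e) for b, (s, e) in zone_blocks.items()}
    # Assumption: the zonal average is the plain mean of the block rates.
    zonal_avg = sum(rates.values()) / len(rates)
    return {b: zonal_avg / r for b, r in rates.items() if r != 0}


# Hypothetical kWh figures for three blocks in one zone (illustrative only).
blocks = {"A": (400.0, 440.0), "B": (400.0, 420.0), "C": (400.0, 400.0)}
index = reei(blocks)
for block_id, value in sorted(index.items()):
    print(block_id, round(value, 3))
```

With positive growth rates, a block whose consumption grows faster than the zonal average gets an index below 1, and a block that grows more slowly gets an index above 1. Negative or near-zero rates would need separate handling before such an index is mapped.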

Areas for improvement:
1. The temperature data used was too simple - only two zones of temperature.

Our team will use the IDW interpolation method selected in the first review to create a model of geospatial temperature variation that also allows for temporal analysis.

Approach

Data Collection and Preprocessing

We collected 2013-2016 average monthly household electricity consumption by postal code and dwelling type from the Energy Market Authority. The postal codes were matched with longitudes and latitudes using the OneMap API.
Singapore Temperature Historical Data from 2013 to 2016 at daily granularity were crawled from Meteorological Service Singapore.

Methodology

Hot Spot and Cold Spot maps:

The maps show which areas have high and low energy consumption. First, we used an adaptive distance weight matrix to define neighbours. Based on this matrix, we computed Getis-Ord Gi statistics. A hot spot area has a significantly positive Gi statistic, which means location i is associated with relatively high values at the surrounding locations. A cold spot area has a significantly negative Gi statistic, which means location i is associated with relatively low values at the surrounding locations.
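
The hot spot computation above can be illustrated with a small pure-Python sketch. It makes several assumptions that are not stated in the proposal: the adaptive distance weights are taken to be binary k-nearest-neighbour weights, the Gi* variant (which includes the focal location) is used, and the coordinates and consumption values are invented. In practice a spatial statistics package would be used instead.

```python
import math


def knn_weights(points, k):
    """Binary adaptive-distance weights: each point's k nearest neighbours plus itself."""
    n = len(points)
    weights = []
    for i, (xi, yi) in enumerate(points):
        order = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.hypot(points[j][0] - xi, points[j][1] - yi),
        )
        row = [0.0] * n
        row[i] = 1.0  # Gi* includes the focal location itself
        for j in order[:k]:
            row[j] = 1.0
        weights.append(row)
    return weights


def getis_ord_gi_star(values, weights):
    """Return one Gi* z-score per location."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean * mean)
    scores = []
    for row in weights:
        w_sum = sum(row)
        w_sq = sum(w * w for w in row)
        numerator = sum(w * v for w, v in zip(row, values)) - mean * w_sum
        denominator = s * math.sqrt((n * w_sq - w_sum ** 2) / (n - 1))
        scores.append(numerator / denominator)
    return scores


# Hypothetical projected coordinates and consumption values: a high-use
# cluster on the left and a low-use cluster on the right.
points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
values = [10.0, 11.0, 12.0, 2.0, 1.0, 3.0]
scores = getis_ord_gi_star(values, knn_weights(points, k=2))
for point, score in zip(points, scores):
    print(point, round(score, 2))
```

Locations in the high-use cluster receive positive scores and those in the low-use cluster receive negative scores. Whether a score counts as a hot or cold spot then depends on the significance threshold chosen.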

LISA Map

Local Indicator of Spatial Association (LISA) maps help us identify the outliers and clusters of the energy consumption observations.
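
As a rough illustration of how a LISA map separates clusters from outliers, the sketch below computes local Moran's I with row-standardised weights and assigns each location to a quadrant. The neighbour lists and values are hypothetical, the function names are our own, and the permutation test normally used to assess significance is omitted.

```python
def quadrant(z_i, lag):
    """Classify a location by its own deviation and its neighbours' average deviation."""
    # Ties at exactly zero fall into the second label of each pair.
    if z_i > 0:
        return "High-High" if lag > 0 else "High-Low"
    return "Low-Low" if lag < 0 else "Low-High"


def local_morans_i(values, neighbours):
    """Return a list of (local I, quadrant label), one entry per location.

    neighbours[i] lists the indices adjacent to location i, for example
    under Queen contiguity.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(d * d for d in z) / n
    results = []
    for i, nbrs in enumerate(neighbours):
        lag = sum(z[j] for j in nbrs) / len(nbrs)  # row-standardised spatial lag
        results.append((z[i] * lag / m2, quadrant(z[i], lag)))
    return results


# Hypothetical chain of six areas: three high-consumption areas followed
# by three low-consumption areas.
values = [9.0, 8.0, 9.0, 1.0, 2.0, 1.0]
neighbours = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
results = local_morans_i(values, neighbours)
for i, (local_i, label) in enumerate(results):
    print(i, round(local_i, 3), label)
```

High-High and Low-Low locations form the clusters, while High-Low and Low-High locations are the spatial outliers, here found at the boundary between the two groups.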

Spatial Interpolation

Since only 21 meteorological stations in Singapore have complete data, spatial interpolation is adopted to estimate temperature at unknown points from points with known temperature values.
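
A minimal sketch of Inverse Distance Weighted (IDW) interpolation, the method adopted in the literature review, is given below. The station coordinates and temperatures are invented, the power parameter of 2 is a common default rather than a value chosen by the team, and coordinates are assumed to be in a projected system (for example SVY21) so that Euclidean distance is meaningful.

```python
import math


def idw(stations, target, power=2.0):
    """Estimate the value at target from (x, y, value) station tuples."""
    numerator = 0.0
    denominator = 0.0
    for x, y, value in stations:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            return value  # IDW is exact at the data points
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


# Hypothetical stations: (x in km, y in km, mean temperature in deg C).
stations = [(0.0, 0.0, 30.0), (10.0, 0.0, 28.0), (0.0, 10.0, 29.0)]
estimate = idw(stations, (5.0, 0.0))
print(round(estimate, 2))
```

Because the estimate is a weighted average of the station values, it always lies between the lowest and highest observed temperature, which is consistent with the local extrema at data points noted in the literature review.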

Geographically Weighted Regression Local R-squared

The estimated temperature and energy consumption data will be used in the Geographically Weighted Regression (GWR) model. The GWR model will generate local R-squared values, which indicate how well the local regression model fits the observed y values. Very low values indicate that the local model is performing poorly. In our case, this means a weak relationship between temperature and energy consumption in those areas.
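
To make the local R-squared idea concrete, the sketch below fits a one-predictor GWR by weighted least squares at every location. It is an illustration only: the Gaussian kernel, the fixed bandwidth, the single temperature predictor and all data values are assumptions, and the name `gwr_local_r2` is hypothetical. The actual model would be fitted with a dedicated GWR package.

```python
import math


def gwr_local_r2(coords, x, y, bandwidth):
    """Return (local slope, local R-squared) for each location.

    At each location a weighted simple regression of y on x is fitted,
    with Gaussian kernel weights that decay with distance.
    """
    results = []
    for cx, cy in coords:
        w = [
            math.exp(-0.5 * (math.hypot(px - cx, py - cy) / bandwidth) ** 2)
            for px, py in coords
        ]
        w_sum = sum(w)
        x_bar = sum(wi * xi for wi, xi in zip(w, x)) / w_sum
        y_bar = sum(wi * yi for wi, yi in zip(w, y)) / w_sum
        sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx
        intercept = y_bar - slope * x_bar
        rss = sum(
            wi * (yi - (intercept + slope * xi)) ** 2 for wi, xi, yi in zip(w, x, y)
        )
        tss = sum(wi * (yi - y_bar) ** 2 for wi, yi in zip(w, y))
        results.append((slope, 1.0 - rss / tss))
    return results


# Hypothetical data: in the west, consumption tracks temperature closely;
# in the east, it does not.
coords = [(0, 0), (1, 0), (0, 1), (1, 1), (20, 0), (21, 0), (20, 1), (21, 1)]
temp = [27.0, 28.0, 29.0, 30.0, 27.0, 28.0, 29.0, 30.0]
energy = [300.0, 320.0, 340.0, 360.0, 350.0, 300.0, 360.0, 310.0]
fits = gwr_local_r2(coords, temp, energy, bandwidth=2.0)
for coord, (slope, r2) in zip(coords, fits):
    print(coord, round(slope, 2), round(r2, 3))
```

The western locations show a local R-squared close to 1 and the eastern ones a value close to 0, which is the kind of contrast the local R-squared map is meant to reveal.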

Web Application Design

Design Inspiration

The dashboard design is inspired by https://stanleyadion.shinyapps.io/AmazeingCrop

Initial Storyboard

	Design	Description
1.		Project and Dataset Overview
2.		Bivariate choropleth maps showing relationships between energy consumption and other factors. Users can choose the factor they want to compare with energy consumption.
3.		A Box-plot showing distributions of energy consumption by Planning Zone and Dwelling Type
4.		LISA maps showing spatial clustering of energy consumption observations
5.		Overview of Data for GWR model
6.		Transform Data for GWR model. Users can use a histogram to check whether the variable is normally distributed.
7.		Select Variables for GWR model. Users can remove correlated variables with the help of the correlation matrix plot.
8.		Configure a GWR model and view the results

Project Challenges

	Key Challenges	Description	Solution
1.	Temperature Data Collection	We can only download the temperature data from Meteorological Service Singapore for one station and one month at a time. There are more than 60 stations and 4 years of data to be downloaded for this project, which can be very time-consuming.	Discovered a pattern in the data links; used Excel to auto-generate all the required data links; used Internet Download Manager to download from all the data links.
2.	Imperfect Temperature Data	Temperature information is only collected at the designated temperature stations.	Use spatial interpolation techniques to estimate the temperature around the temperature stations.

Project Timeline

Gantt Chart of Team's Timeline - FULL Updated Version
Snapshot of Gantt Chart (as of 3 March 2019)


@@ Line 4: / Line 4: @@
 {| style="background-color:#ffffff ; margin: 3px 10px 3px 10px; width="80%"|
-| style="font-family:Open Sans, Arial, sans-serif; font-size:15px; text-align: center; border-top:solid #ffffff; border-bottom:solid #f5f5f5" width="190px" |
+| style="font-family:Open Sans, Arial, sans-serif; font-size:15px; text-align: center; border-top:solid #ffffff; border-bottom:solid #f5f5f5" width="210px" |
 [[Elec3city|<font color="#3c3c3c"><strong>HOME</strong></font>]]
@@ Line 28: / Line 28: @@
 <div style="text-align: left; direction: ltr; margin-left: 1em;">
-<br/>
+Rising energy consumption is an issue that has plagued the Singapore government for several years. Recently, the government has begun pushing for more efficient energy usage, with most effort expended on the efficiency of energy sources, e.g. using less carbon-intensive fuels. In exploring potential ways to aid this cause, we realised that there has been scant statistical analysis of energy consumption patterns. As such, our team feels that there is a need for an app which allows authorities in Singapore, such as the National Environment Agency, to understand with data-driven evidence the origins of variation in Singapore's energy consumption, so as to allow for more targeted efforts to reduce energy wastage. Another factor that provided fertile ground for the development of this application is the availability of energy consumption data, with granularity right down to the individual postal code.
 </div>
-<br/>
 == Project Objective ==
-<div style="text-align: left; direction: ltr; margin-left: 1em;"><strong>Through our project, we aim to: </strong>
+<div style="text-align: left; direction: ltr; margin-left: 1em;">
-#
+We have been observing rising trends in energy consumption across all sub-sectors, including households, whose consumption has risen sharply from 6,092.5 GWh in 2005 to 7,295.8 GWh in 2017. As such, the household sub-sector is a good starting point for our analysis.
-#
-#
+*Energy Consumption Patterns in Singapore
-#
+Through the exploration of potential energy consumption patterns among households in Singapore, we will be able to tease out potential drivers of energy consumption. We will look not only at seasonal patterns in consumption (e.g. is there a particular month in a year when energy consumption spikes?) but also at spatial patterns in consumption (e.g. is there a particular region in Singapore where energy consumption is particularly high?).
+*Temperature VS Energy Consumption
+The next issue that we are interested in exploring is the relationship between temperature and energy consumption. Intuitively, we expect that as temperature increases, energy consumption will also increase, because more electrical appliances such as air-conditioners are used to regulate temperature. Thus, we plan to explore how temperature in different regions of Singapore affects the energy consumption in each respective region.
 </div>
 <br/>
-== Data Preparation ==
+== Data ==
 {| class="wikitable"
 |-
 ! Data !! Source !! Data Type
 |-
-|[https://www.ema.gov.sg/cmsmedia/Publications_and_Statistics/Statistics/52RSU.xls Average Monthly Household Electricity Consumption by Postal Code (Public Housing) & Dwelling Type, 2H 2016]  ||ema.gov.sg || xls
+|[https://www.ema.gov.sg/cmsmedia/Publications_and_Statistics/Statistics/52RSU.xls Average Monthly Household Electricity Consumption by Postal Code (Public Housing) & Dwelling Type, 2H 2016]  ||Energy Market Authority (ema.gov.sg)|| xls
 |-
-|  ||  ||
+|[https://www.ema.gov.sg/cmsmedia/Publications_and_Statistics/Statistics/51RSU.xls Average Monthly Household Electricity Consumption by Postal Code (Public Housing) & Dwelling Type, 1H 2016]  ||Energy Market Authority (ema.gov.sg)|| xls
 |-
-|  ||  ||
+|[https://www.ema.gov.sg/cmsmedia/Publications_and_Statistics/Statistics/25RSU.xls Average Monthly Household Electricity Consumption by Postal Code (Public Housing) & Dwelling Type, 2H 2015]  ||Energy Market Authority (ema.gov.sg)|| xls
 |-
-|  ||  ||
+|[https://www.ema.gov.sg/cmsmedia/Publications_and_Statistics/Statistics/23RSU.xls Average Monthly Household Electricity Consumption by Postal Code (Public Housing) & Dwelling Type, 1H 2015]  ||Energy Market Authority (ema.gov.sg)|| xls
 |-
-|  ||  ||
+|[https://www.ema.gov.sg/cmsmedia/Publications_and_Statistics/Statistics/MSA21.xls Average Monthly Household Electricity Consumption by Postal Code (Public Housing) & Dwelling Type, 2H 2014]  ||Energy Market Authority (ema.gov.sg)|| xls
 |-
-|  ||  ||
+|[https://www.ema.gov.sg/cmsmedia/Publications_and_Statistics/Statistics/MSA17.xls Average Monthly Household Electricity Consumption by Postal Code (Public Housing) & Dwelling Type, 1H 2014]  ||Energy Market Authority (ema.gov.sg)|| xls
 |-
-|  ||  ||
+|[https://www.ema.gov.sg/cmsmedia/Publications_and_Statistics/Statistics/MSA18.xls Average Monthly Household Electricity Consumption by Postal Code (Public Housing) & Dwelling Type, 2H 2013 ]  ||Energy Market Authority (ema.gov.sg)|| xls
 |-
-|  ||  ||
+|[https://www.ema.gov.sg/cmsmedia/Publications_and_Statistics/Statistics/MSA16.xls Average Monthly Household Electricity Consumption by Postal Code (Public Housing) & Dwelling Type, 1H 2013]  ||Energy Market Authority (ema.gov.sg)|| xls
 |-
-|  ||  ||
+|[https://www.ema.gov.sg/cmsmedia/Publications_and_Statistics/Statistics/2RSU.xls Average Monthly Household Electricity Consumption by Postal Code (Private Apartments), 2015 and 2016]  ||Energy Market Authority (ema.gov.sg)|| xls
 |-
-|  ||  ||
+|[https://www.ema.gov.sg/cmsmedia/Publications_and_Statistics/Statistics/22RSU.xls Average Monthly Household Electricity Consumption by Postal Code (Private Apartments), 2013 to 2014]  ||Energy Market Authority (ema.gov.sg)|| xls
-|}
-<br/>
-=== Data Collection ===
-All the data required for this project is readily available for download from either data.gov.sg or OpenStreetMap except for the accidents and heavy traffic data.
-==== Collecting accidents and heavy traffic data ====
-The accidents and heavy traffic data available from mytransport.sg are real time data which required API calling to retrieve the data. No historical accidents and heavy traffic data is available from mytransport.sg. Thus, in order to collect the data, we had to write a script on PowerShell that calls the API periodically to retrieve the JSON file containing the real-time data. Then, we wrote a script in PowerShell to convert the JSON file to a CSV file for ease of use.
-We spent 5 weeks calling the API regularly to retrieve the real-time data and this gave us 335 accident points and 877 heavy traffic points, a sufficient quantity for analysis.
-The data collected is in the format below:
-{| class="wikitable"
-|-
-! Attributes  !! Example
-|-
-| Type || Accident
-|-
-| Latitude || 1.319629
 |-
-| Longitude || 103.8537
+|[https://www.singstat.gov.sg/-/media/files/publications/ghs/ghs2015/excel/t148-152.xls Resident Households by Planning Area and Dwelling Type/Household Size/Monthly Household Income]  ||Department of Statistics Singapore (singstat.gov.sg)|| xls
 |-
-| Date || 22/2/2018
+|[https://www.singstat.gov.sg/-/media/files/find_data/population/statistical_tables/respopagsex2000to2018.zip Singapore Residents by Planning Area/Subzone, Age Group and Sex, June 2000 - 2018]  ||Department of Statistics Singapore (singstat.gov.sg)|| csv
 |-
-| Time || 10:33:00 PM
+|[https://www.singstat.gov.sg/-/media/files/find_data/population/statistical_tables/respoptod2000to2018.zip Singapore Residents by Planning Area/Subzone and Type of Dwelling, June 2000 - 2018]  ||Department of Statistics Singapore (singstat.gov.sg)|| csv
 |-
-| Description || Accident on CTE (towards SLE) after Moulmein Rd Exit with congestion till Kramat Rd Entrance. Avoid lanes 1 and 2.
+|[http://www.weather.gov.sg/climate-historical-daily Singapore Climate Historical Data - crawled to get temperature and rain data from 2013 to 2016 at daily granularity]  ||Meteorological Service Singapore (weather.gov.sg)|| csv
 |}
-With all the data ready, we can now proceed for data cleaning.
+== Literature Review ==
+<div style="text-align: left; direction: ltr; margin-left: 1em;">
+In our due diligence for the project, the team looked at multiple research papers to inform and influence us in the best practices for analyzing geospatial variation in energy use, when it is to be compared against variables such as temperature and housing composition.
-<br/>
+<strong>1. Appropriate use of Interpolation Methods in GIS - Mitas, L. and Mitasova, H. Spatial Interpolation, Chap. 34 Spatial Interpolation (2005) </strong><br/><hr/>
-=== Data Cleaning ===
-<br/>
-==== Extracting Expressway Networks ====
-[[File:Roadrunners datacleaning1.jpg|800px|frameless|center|OSM Shape File]]
-The shape file downloaded from OpenStreetMap gives us the entire road network of Singapore and some part of Malaysia. However, we only require the expressway road networks in Singapore. Thus, some data preparation is needed to extract the road networks that is needed. We have decided to perform this data preparation on QGIS as it gives us a better visualisation of the road network which allows us to easily detect any errors.
+<strong>Aim of literature:</strong> to enlighten reader of the appropriate interpolation method for different GIS themes.
-Firstly, we used the geoprocessing tool on QGIS to extract the road networks that only occur in Singapore. We performed the vector intersection function between the road networks layer and a layer containing the coastal outline of Singapore, which is downloaded from data.gov.sg. This returns us a layer consisting road networks that only occur in Singapore
+<br/><br/>
+[[File:Spline new edit.png|200px|framed|center|Comparison of Digital Elevation Models computed from contours, splines with tension and stream enforcement, and by regularised spline with tension (RST)]]
-Next, we performed filtering on the data frame to extract the expressway road networks. The data frame component of the shape files contains an attribute called ‘type’. We were able to obtain the expressway network by filtering ‘type’ = ‘motorway’.
-[[File:Roadrunners datacleaning2.jpg|800px|frameless|center]]
-Lastly, we performed some manual check to remove some erroneous lines.
 <br/>
-==== Extracting accidents and heavy traffic points that occur on expressway ====
-The table above shows the attributes of the accidents and heavy traffic data. The points that we are only interested in are those that occur on the expressway. To obtain these points, we used R to perform our data cleaning.
+<strong>Methodology:</strong><br/>
+. Inverse Distance Weighted Interpolation (IDW) - <u><b>adopted</b></u><br/>
-   patterns <- c('on AYE','on BKE','on CTE','on ECP','on KJE','on KPE','on MCE','on PIE','on SLE','on TPE')
+. Kriging - <i><b>rejected</b></i><br/>
+. Regularised spline with tension (RST) - <i><b>rejected</b></i><br/>
-We could extract the expressway points by filtering the points that contain the expressway names in the ‘Description’ attribute. A ‘patterns’ variable is created to store the phrases that appear on expressway points.
-   accidents_filter <- trafficReport %>% filter(grepl(paste(patterns, collapse="|"), Descriptions)) %>% filter(Type == 'Accident')
-Lastly, we used the ‘grepl’ function in R to extract only the points that contain any of the phrases above in the ‘Description’ attribute.
-<br/>
-==== Extracting the cameras ====
-Similarly, we only require cameras that occur on expressways. This extraction is slightly more time consuming as there are no attributes in the cameras shape file which indicates whether or not the cameras are located on the expressway. Thus, to obtain the cameras that only occur on expressway, we did manual filtering on QGIS based on the expressway road network we have created previously. We repeated this step for the different types of cameras.
-[[File:Roadrunners datacleaning3.jpg|800px|frameless|center]]
-Lastly, we combined all the cameras file together into a single shape file by using the join function.
-<br/>
-== Literature Review ==
-<div style="text-align: left; direction: ltr; margin-left: 1em;">
-To gain a better understanding of how we could proceed with our analysis, we decided to conduct a literature review. Here are the summaries of some research paper on spatial analysis of traffic accidents:
 <br/><br/>
-<strong>1. GIS-based spatial analysis of urban traffic accidents: Case study in Mashhad, Iran</strong><br/><hr/>
+<strong>Learning Points:</strong><br/>
-<strong>Aim of study:</strong> to use geographic information technology (GIS) and spatial-statistical analysis to gain insights of the traffic accident patterns in Mashhad, Iran. <br/>
+. Inverse Distance Weighted Interpolation (IDW)
-<br/><br/>
+* Pro: relatively less demanding computationally
+* Pro: better at reproducing approximations on linear patterns
-[[File:Kernel.png|200px|framed|center|Results of kernel density level for accidents leading to injury from March 21, 2011 to March 19, 2012]]
+* Con: "produces local extrema at the data points"
 <br/>
+. Kriging
-<strong>Methodology:</strong><br/>
+* Con: While good at predicting spatial distribution of uncertainty, it is less successful for applications where <b>local geometry</b> and <b>smoothness</b> are the key issues - Critical weakness for our interpolation of temperature data where granularity is at housing block level, thus Kriging is rejected.
-. Kernel Density Estimation
-* To determine static hotspots
-. Nearest Neighbour Distance Analysis
-* Used to determine if the accidents are clustered based on the nearest distance between two neighbouring accident points
-. K-function output analysis
-* Used to provide a more accurate analysis of points distribution
 <br/>
-<strong>Learning Points:</strong><br/>
+. Regularised spline with tension (RST)
-. Spatial Analysis Techniques
+* Pro: Allows for smoothing according to parameters such as the tension φ and smoothing weights {wj} which are empirically informed through minimisation of the predictive error estimated by a cross-validation procedure
-* This study is similar to our project. Hence, we can learn the analysis technique they have used and apply it to our study
+* Pro: Can realistically represent rough gradients in spite of the smoothness condition, if the roughness is sufficiently described by the input data - might be true of temperature when it comes to the Urban Heat Island effect - pockets of high building density can cause a micro-climate of higher temperatures; particularly pertinent in Singapore.
-* Similarly, we can use Kernel Density Estimation to detect traffic accident hotspots and Nearest Neighbour K function to determine if the accidents are randomly distributed or clustered
+* Con: requires a lot of 'guess-timation' and past domain knowledge to fine-tune the tension and smoothing weights.
 <br/>
 <strong>Areas for improvement:</strong><br/>
-. Hard to follow up
+Our team has selected <b>IDW</b> as the interpolation technique for smoothing of temperature data of the 22 weather stations across Singapore.
-* As this analysis is done on a proprietary software (Arcview), it is impossible to reproduce the same study done by the researchers. Thus, it is hard for other researchers to follow up on their study.
 <br/><br/>
+<strong>2. A Spatial Analysis of the Relationship between Vegetation and Poverty - Dawson T.,  Sandoval J.S., Sagan V. and Crawford T. (2018)</strong><br/><hr/>
-<big><strong>2. Brazilian Road Traffic Fatalities: A Spatial and Environmental Analysis</strong><br/></big><hr/>
+<strong>Aim of literature:</strong> investigate poverty and inequities that are associated with vegetation
-<strong>Aim of study:</strong> to analyse road traffic accidents hotspots in BR 277 highway located in the state of Parana, southern Brazil and performed environmental analysis to identify patterns contributing to the traffic accidents. <br/>
-[[File:Ref2.png|200px|framed|center|Kernel density and wavelet analysis hotspots. 3A) All Fatal Crashes]]
+<br/><br/>
+[[File:Detroit lisa.png|200px|framed|center|Geospatial Visualisation of MAXN (regression coefficient for the time variable showing trend in Normalized Difference Vegetation Index) against race poverty geospatial distribution]]
+[[File:Detroit gwr localrsquared.png|200px|framed|center|Local R-Squared values of model in Detroit]]
 <br/>
 <strong>Methodology:</strong><br/>
-. Kernel Density Estimation
+. Pixel level regression - Curve Fit extension in ArcGIS
-* To determine accident hotspots
+* Run regression trend analysis using raster datasets for temporal analysis
-. Wavelet
+. Global Ordinary Least Squares (OLS) regression
-* Complement Kernel exploratory analysis
+* Capture global geospatial correlation
-. K-function output analysis
+. Local Geographically Weighted Regression (GWR)
-* To reduce the variables into similar variance components
+* Capture local geospatial correlation
-* Then developed regression models to evaluate the impact of built environmental components on fatal crashes
+. Moran's I for spatial autocorrelation
+* For local level analysis of spatial autocorrelation
+. Local Indicators of Spatial Association (LISA) map - Contiguity Edges and Corners method
+* Queen contiguity to show clustering
 <br/>
 <strong>Learning Points:</strong><br/>
+. Pixel level regression - Curve Fit extension in ArcGIS
-. Spatial Analysis Techniques
+* Helps us see how well the model predicts energy consumption given our variables
-* Apart from using Kernel Density Estimation to develop hotspots as well as K function to determine complete spatial randomness like the previous study, this research also explores the impact of how the human built environment affects the occurrence of accidents.
+. Global Ordinary Least Squares (OLS) regression
-* We could possibly learn from this project how the built environment analysis is being executed and then determine how various infrastructures on the road affects the occurrence of accidents.
+* Check whether the error terms all have a mean of zero and the same variance, and are uncorrelated. If so, the least squares estimator is the best linear unbiased estimator (BLUE) of the model coefficients.
+* If the residuals are spatially autocorrelated, this assumption fails and OLS inference becomes unreliable. GWR models would then be used to reduce the spatial autocorrelation of the residuals.
+. Local Geographically Weighted Regression (GWR)
+* Provides local t-values, which indicate how confident we can be in the coefficients of each local model
+. Moran's I for spatial autocorrelation
+* We can use this to ascertain if local level analysis is indeed appropriate to understand the relationship between income level and energy consumption, after accounting for other factors like number of household members and number of rooms.
+. Local Indicators of Spatial Association (LISA) map - Contiguity Edges and Corners method
+* Shows us clustering of energy consumption at local level
 <br/>
 <strong>Areas for improvement:</strong><br/>
-. Hard to follow up
+. Pixel level regression - Curve Fit extension in ArcGIS
-* Similar to the previous study, this analysis is done on a proprietary software (QGIS), it is impossible to reproduce the same study done by the researchers. Thus, it is hard for other researchers to follow up on their study.
+* We do not have ArcGIS, so we will use the curveFit function provided in the mixtox v1.3 package by Xiangwei Zhu instead
 <br/><br/>
+<strong>3. Using GIS to target outreach for LADWP (Los Angeles Department of Water and Power) Customer Rebate Programs</strong><br/><hr/>
-<big><strong>3. IS415 2013-14 Assignment 2 – Heng U San </strong><br/></big><hr/>
+<strong>Aim of literature:</strong> to reduce traditional energy usage and promote sustainable energy production through geographically segmented marketing
-<strong>Aim of study:</strong> to analyse the distribution of GP Clinics, Preschools and Bus Stops in Bedok and provide recommendation on how amenities could be better planned. <br/>
-[[File:USanPIC165.png|200px|framed|center|Density function for buildings]]
+<br/><br/>
+[[File:Map 4.1 Residential Relative Energy Efficiency Index (REEI) 2009-2012.png|200px|framed|center|Residential Relative Energy Efficiency Index (REEI) 2009-2012 Choropleth]]
+[[File:Map 4.2 Local Moran's I for Residential REEI.png|200px|framed|center|Local Moran's I for REEI - most and least efficient block groups]]
 <br/>
 <strong>Methodology:</strong><br/>
-. Nearest Neighbour Index
+. Creation of a REEI (Relative Energy Efficiency Index)
-* lpp function – to measure distance between points along a linear network
+* Done by dividing the zonal average consumption growth rate by the consumption change rate for each block group.
-. K-function
+. Global Moran’s I
-* To determine the clustering type
+* Determine if spatial autocorrelation is taking place
+. Local Moran's I
+* See where clustering is taking place
 <br/>
 <strong>Learning Points:</strong><br/>
+. REEI (Relative Energy Efficiency Index)
-. Clear and easy to understand
+* Team can look into calculating such an index for each HDB parcel
-* U San offered a very clear and easy to understand explanation of how Nearest Neighbour Index and K function works. This helped us significantly in understanding how these techniques are used in the other research papers.
-* U San’s work was well documented. She clearly explained the step by step procedure of how he obtained her results as well as the R functions used for analysis. This makes it much easier for other researchers to reproduce a similar study.
-* To analyse the spatial distribution of bus stops, U San included a road network constraint in the various analysis. This is done because bus stops can only occur on road networks. Similar to our study, accidents can only occur on road networks. Thus the road network constraint should be included in our analysis or else our result will not make sense.
 <br/>
 <strong>Areas for improvement:</strong><br/>
-. Sharing of codes
+. The temperature data used was too simple - only two zones of temperature.
-* U San did well in documenting her step by step procedure, teaching other researchers to know how to reproduce a similar study. However, it will be even better if U San could share a R notebook of her codes so that researchers could reproduce the exact same study and continue her research from where she stopped.
+* Our team will use the previously learnt regularised spline with tension (RST) interpolation method to create a model of geospatial temperature variation that also allows for temporal analysis.
 <br/><br/>
-</div>
-<br/>
 == Approach ==
-After performing the literature review, we have a better understanding of what methodology could be used to achieve our objective. We then consulted our professor to decide the most appropriate analysis technique for use and finally we chose the techniques below.
+<div style="text-align: left; direction: ltr; margin-left: 1em;">
-<br/>
+<Strong>Data Collection and Preprocessing</Strong>
-=== Kernel Density Estimation with Network Constraints ===
+<p>
+*We collected average monthly household electricity consumption for 2013-2016, by postal code and dwelling type, from the Energy Market Authority. The postal codes were matched to longitudes and latitudes using the OneMap API.
-[[File:Roadrunners approach1.jpg|800px|frameless|center]]
+*Historical temperature data for Singapore from 2013 to 2016, at daily granularity, were crawled from Meteorological Service Singapore.
+<br>
+<Strong>Methodology</Strong>
+*Hot Spot and Cold Spot maps:
+The maps show which areas have high energy consumption and which have low energy consumption. First, we used an adaptive distance weight matrix to define neighbours. Based on this matrix, we computed the Getis-Ord Gi statistic for each location. A hot spot has a significantly positive Gi statistic, which means that location i is surrounded by locations with relatively high values. A cold spot has a significantly negative Gi statistic, which means that location i is surrounded by locations with relatively low values.
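The hot spot computation described above can be sketched as follows. This is a minimal illustration, not the team's implementation: it assumes the Gi* variant of the statistic (location i counts as its own neighbour), approximates the adaptive distance weight matrix with binary k-nearest-neighbour weights, and uses made-up coordinates and consumption values. The function names `knn_neighbours` and `gi_star` are hypothetical.

```python
import math

def knn_neighbours(coords, k):
    """Adaptive-distance neighbours: each point's k nearest other points."""
    neighbours = []
    for i, (xi, yi) in enumerate(coords):
        dists = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(coords) if j != i
        )
        neighbours.append([j for _, j in dists[:k]])
    return neighbours

def gi_star(values, neighbours):
    """Return the Gi* z-score of every location (binary weights, self included)."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean * mean)
    scores = []
    for i in range(n):
        members = [i] + neighbours[i]   # Gi* includes location i itself
        w_sum = len(members)            # binary weights: sum(w) equals sum(w^2)
        local_sum = sum(values[j] for j in members)
        denom = s * math.sqrt((n * w_sum - w_sum ** 2) / (n - 1))
        scores.append((local_sum - mean * w_sum) / denom)
    return scores

# Hypothetical data: eight locations on a line, high consumption on the left.
coords = [(float(i), 0.0) for i in range(8)]
consumption = [9.0, 9.0, 9.0, 2.0, 2.0, 1.0, 1.0, 1.0]
scores = gi_star(consumption, knn_neighbours(coords, k=2))
for value, score in zip(consumption, scores):
    print(f"value {value:4.1f} -> Gi* z-score {score:+.2f}")
```

Under the usual normal approximation, a z-score above about 1.96 would be flagged as a hot spot and one below about -1.96 as a cold spot at the 5% level.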
+*LISA Map
+Local Indicators of Spatial Association (LISA) maps help us identify outliers and clusters among the energy consumption observations.
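As a rough sketch of what a LISA map encodes, the snippet below computes local Moran's I with row-standardised queen contiguity weights on a small grid and labels each cell by quadrant. The grid, the values and the function names are illustrative assumptions, and the permutation test normally used to judge significance is left out.

```python
def queen_neighbours(rows, cols):
    """Queen contiguity on a grid: cells sharing an edge or a corner."""
    neighbours = []
    for r in range(rows):
        for c in range(cols):
            cell = []
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (dr or dc) and 0 <= rr < rows and 0 <= cc < cols:
                        cell.append(rr * cols + cc)
            neighbours.append(cell)
    return neighbours

def local_morans_i(values, neighbours):
    """Return (local I, quadrant label) for every cell."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(d * d for d in z) / n
    results = []
    for i in range(n):
        # Spatial lag: mean deviation of the neighbours (row-standardised weights).
        lag = sum(z[j] for j in neighbours[i]) / len(neighbours[i])
        own = "High" if z[i] > 0 else "Low"
        around = "High" if lag > 0 else "Low"
        results.append((z[i] * lag / m2, f"{own}-{around}"))
    return results

# Hypothetical 3 x 3 grid (row-major) with a high-consumption block at the top left.
values = [8.0, 8.0, 2.0,
          8.0, 8.0, 2.0,
          2.0, 2.0, 2.0]
labelled = local_morans_i(values, queen_neighbours(3, 3))
for index, (local_i, quadrant) in enumerate(labelled):
    print(index, round(local_i, 3), quadrant)
```

High-High and Low-Low cells are candidate clusters, while High-Low and Low-High cells are candidate outliers.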
-Kernel Density Estimation with Network Constraint is used to identify the location along the network which has a high concentration of traffic incidents. The formula for converting the observations into a Kernel Function is shown above. The bandwidth, T , can be adjusted to smooth out the Kernel Density Function. The Kernel Density Function with Network Constraint is executed in R using the spatstat package by applying the ‘density.lpp’ function on an lpp object.
+*Spatial Interpolation
-<br/>
+Since only 21 meteorological stations in Singapore have complete data, we adopt spatial interpolation, which uses points with known temperature values to estimate values at points where the temperature is unknown.
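To illustrate the idea of estimating temperature between stations, here is a minimal sketch using inverse distance weighting (IDW). IDW is chosen only because it is short and easy to read; it is not necessarily the interpolation method used in this project. The station coordinates, temperatures and the function name `idw_estimate` are hypothetical.

```python
import math

def idw_estimate(stations, target, power=2.0):
    """Estimate the value at `target` from (coordinate, value) station pairs."""
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), temperature in stations:
        distance = math.hypot(target[0] - x, target[1] - y)
        if distance == 0.0:
            return temperature          # target sits exactly on a station
        weight = 1.0 / distance ** power
        weighted_sum += weight * temperature
        weight_total += weight
    return weighted_sum / weight_total

# Hypothetical stations: ((x, y) in km, mean temperature in degrees Celsius).
stations = [((0.0, 0.0), 27.0), ((10.0, 0.0), 29.0), ((0.0, 10.0), 28.0)]
midpoint_estimate = idw_estimate(stations, (5.0, 0.0))
print(round(midpoint_estimate, 2))
```

Closer stations dominate the estimate, and raising `power` makes the surface follow the nearest station more tightly.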
-=== Ripley's K Function with Network Constraints ===
-The Ripley’s K Function is a spatial analysis method used to describe how point patterns occur are distributed over an area of interest. It allows us to determine if the point patterns are dispersed, clustered or randomly distributed. The formula above shows how we can obtain the K function given the observations.
+*Geographically Weighted Regression Local R-squared
-How K Function is used:
+The estimated temperature and energy consumption data will be used in the Geographically Weighted Regression (GWR) model. The GWR model generates local R-squared values, which indicate how well the local regression model fits the observed y values. Very low values indicate that the local model is performing poorly. In our case, this means a weak relationship between temperature and energy consumption in those areas.
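The local R-squared idea can be sketched with a stripped-down GWR that has a single predictor. The snippet assumes a Gaussian kernel with a fixed bandwidth and a weighted definition of R-squared (one minus the weighted residual sum of squares over the weighted total sum of squares); the data, the bandwidth and the function name `gwr_local_r2` are made up for illustration and do not come from the project.

```python
import math

def gwr_local_r2(coords, x, y, bandwidth):
    """Fit one weighted regression per location; return (slope, local R-squared) pairs."""
    results = []
    for cx, cy in coords:
        # Gaussian kernel: nearby observations get weights close to 1.
        w = [math.exp(-0.5 * (math.hypot(cx - px, cy - py) / bandwidth) ** 2)
             for px, py in coords]
        total = sum(w)
        x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total
        y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total
        sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx
        intercept = y_bar - slope * x_bar
        rss = sum(wi * (yi - intercept - slope * xi) ** 2
                  for wi, xi, yi in zip(w, x, y))
        tss = sum(wi * (yi - y_bar) ** 2 for wi, yi in zip(w, y))
        results.append((slope, 1.0 - rss / tss))
    return results

# Hypothetical data: a western cluster where consumption tracks temperature
# closely, and an eastern cluster where it does not.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (11.0, 0.0), (12.0, 0.0)]
temperature = [26.0, 27.0, 28.0, 26.0, 27.0, 28.0]
consumption = [300.0, 320.0, 340.0, 330.0, 310.0, 335.0]
fits = gwr_local_r2(coords, temperature, consumption, bandwidth=1.5)
for (px, _), (slope, r2) in zip(coords, fits):
    print(f"x = {px:4.1f}: slope = {slope:6.2f}, local R-squared = {r2:.3f}")
```

In this toy example the western locations get a local R-squared close to 1 and the eastern ones a value close to 0, which is the kind of contrast the local R-squared map is meant to reveal.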
-.	A circle of radius h is constructed around each observation
-.	The number of observations that fall inside each circle is counted
-<br/>
-[[File:Roadrunners approach1.jpg|800px|frameless|center]]
-<br/>
-The formula is applied to obtain the K function at a radius h
-.	The above 3 steps are repeated for different values of h.
-.	A graph of K function against h is then plotted.
-.	Monte Carlo simulation tests are then run to determine the K function of randomly distributed point patterns.
-.	Compare the K Function of the observations with the K Function of the simulations. If The K function of the observation is higher than the upper bound of the simulations, it suggests that there are signs of clustering. On the other hand, if the K function of the observations is lower than the lower bound of the simulations, It suggests that there are signs of dispersion. Otherwise, if the K function of the observations is within the upper and lower boundary, it suggest that the points are in complete spatial randomness. Refer to the figure below for better illustration.
-<br/>
-[[File:Roadrunners approach3.jpg|800px|frameless|center]]
-<br/>
-However, a slight modification is added to our K function to include network constraints. This means that the circle of radius h will only expand along the road network instead of expanding freely.
-K Function with linear constraint is executed in R with the spatstat package using the linearK function.
-<br/>
-=== Multitype K Function with Network Constraints ===
-The multitype K function is an extension of the Ripley’s K function. The algorithm is mostly the same, however instead of counting the number of same type observations in a circle with radius h, the number of observations belonging to the other type is counted. For example, a circle of radius h is formed around the traffic cameras and the number of accident points within this circle is counted. This step is repeated for all cameras and for a range of radius h. Lastly, the K function is plotted. The multitype K function can also be applied in R using the spatstat package with the linearKcross function.
+</div>
-<br/>
 == Web Application Design ==
 === Design Inspiration ===
+The dashboard design is inspired by https://stanleyadion.shinyapps.io/AmazeingCrop
 <br/>
-[[File:Roadrunners designinsp1.jpg|800px|frameless|center]]
-<br/>
-Superzip R Shiny is a sample R Shiny web application found in the R Shiny gallery. What makes it unique from other R Shiny dashboard is that the two charts in the menu are dynamic. The analysis area of the two charts are dependent on the boundary of the map that the user is viewing. This allows user to set their area of analysis by zooming into the map and shifting it to the area of interest.
-We could perhaps implement a similar feature in our R Shiny dashboard by allowing user to select their analysis area from the map. However, we will do the layout slightly differently from the example. Instead of having a floating menu, we will make the menu fixed to the side of the map as we feel that the floating menu will obstruct the user from seeing the entire analysis area.
 === Initial Storyboard ===
-<br/>
-[[File:Roadrunners initstory1.jpg|800px|frameless|center]]
-<br/>
-== Application Architecture ==
-The image below shows the application architecture of our web application.
-<br/>
-[[File:Roadrunners sa1.jpg|1000px|frameless|center]]
-<br/>
-== Application Overview ==
-[[File:Roadrunners app1.jpg|1000px|frameless|center]]
-<br>
 {| class="wikitable"
 |-
-! Type !! Feature !! Image !! Purpose
+!  !! Design !! Description
 |-
-| rowspan = "3" |Main || Upload Data || [[File:Roadrunners app2.jpg|600px|frameless|center]]||
+| 1. ||[[File:Elec3city dashboard 1.jpg|600 px]]  ||
-* Allows user to upload their own traffic data
+* Project and Dataset Overview
-* The format of the traffic data must be in a csv file and in the same format as the data collected from LTA’s API
-* The format of the road network data must be in a shape file
 |-
-| Toggle Map markers || [[File:Roadrunners app3.jpg|600px|frameless|center]]||
+| 2. ||[[File:Elec3city dashboard 2.jpg|600 px]] ||
-* Allows user to toggle the respective markers on the map
+* Bivariate choropleth maps showing the relationship between energy consumption and other factors
+* Users can choose the factor they want to compare with energy consumption
 |-
-|  Select Analysis || [[File:Roadrunners app4.jpg|600px|frameless|center]]||
+| 3. ||[[File:Elec3city dashboard 3.jpg|600 px]]  ||
-* Allows user to select the analysis to perform
+* A box plot showing the distribution of energy consumption by Planning Zone and Dwelling Type
 |-
-| rowspan = "3" |Kernel Density Estimation || Select KDE variables || [[File:Roadrunners app5.jpg|600px|frameless|center]]||
+| 4. ||[[File:Elec3city dashboard 4.jpg|600 px]]  ||
-* Select the variable for Kernel Density Estimation
+* LISA maps showing spatial clustering of energy consumption observations
 |-
-| Slider || [[File:Roadrunners app6.jpg|600px|frameless|center]]||
+| 5. ||[[File:Elec3city dashboard 5.jpg|600 px]]  ||
-* The bandwidth of the Kernel Density plot.
+* Overview of Data for GWR model
-* A larger kernel distance will lead to a smoother plot while a smaller kernel distance will lead to a plot with more noise
-|-
-| Output options || [[File:Roadrunners app7.jpg|600px|frameless|center]]||
-* User can select if he/she wants to perform Kernel Density Estimation with or without network constraint
-|-
-| rowspan = "2" |K Function || Select Analysis Variable || [[File:Roadrunners app8.jpg|600px|frameless|center]]||
-* Allows user to select the variable to perform K Function analysis
 |-
-|  No. of Simulations || [[File:Roadrunners app9.jpg|600px|frameless|center]]||
+| 6. ||[[File:Elec3city dashboard 6.jpg|600 px]]  ||
-* Select the number of Monte Carlo Simulations to run for the K Function Analysis
+* Transform Data for GWR model
-* Higher number of simulations will require a longer loading time
+* Users can use a histogram to check whether the variable is normally distributed
 |-
-| rowspan = "2" | Multitype K Function || Select Analysis Variable || [[File:Roadrunners app10.jpg|600px|frameless|center]]||
+| 7. ||[[File:Elec3city dashboard 7.jpg|600 px]]  ||
-* Allows user to select the variable to perform Multitype K Function Analysis
+* Select Variables for GWR model
+* Users can remove correlated variables with the help of the correlation matrix plot
 |-
-| No. of Simulations || [[File:Roadrunners app11.jpg|600px|frameless|center]]||
+| 8. ||[[File:Elec3city dashboard 8.jpg|600 px]]  ||
-* Select the number of Monte Carlo Simulations to run for the Multitype K Function Analysis
+* Configure a GWR model and view the results
-* Higher number of simulations will require a longer loading time
-|-
-| rowspan = "4" |Outputs || KDE with network constraints || [[File:Roadrunners app12.jpg|600px|frameless|center]]||
-* View of the KDE plot
-* Legend of the KDE plot is indicated at the bottom left corner of the screen
-|-
-| KDE without network constraints || [[File:Roadrunners app13.jpg|600px|frameless|center]]||
-* View of the KDE plot
-* Legend of the KDE plot is indicated at the bottom left corner of the screen
-|-
-| K Function || [[File:Roadrunners app14.jpg|600px|frameless|center]]||
-* Output of the K Function analysis
-|-
-| Multitype K Function || [[File:Roadrunners app15.jpg|600px|frameless|center]]||
-* Output of the Multitype K Function analysis
 |}
-<br/>
-== Interesting Findings ==
-The following section describes some of the interesting discussions from the R Shiny Web Application we have created.
-We first performed a K Function analysis to determine if there is indeed clustering observed between accident points from a statistical point of view. A K Function analysis with 20 Monte Carlo Simulations is run.
-<br/><br/>
-[[File:Roadrunners findings0a.jpg|1000px|frameless|center]]
-<br/>
-Based on the K Function analysis as seen in the picture above, we can reject the null hypothesis at 95% confidence level (100 – 100/20) that the accident points are in complete spatial randomness and the accident points are indeed clustered between 200m to 5000m. Since we know that the accident points are clustered between 200m to 5000m, we can use any value between this range to plot the KDE output.
-<br/>
-[[File:Roadrunners findings1.jpg|800px|frameless|center]] [[File:Roadrunners findings2.jpg|600px|frameless|center]]
-<br/>
-Using a Kernel Distance of 1500m, we have obtained the KDE plot as seen in the picture above. From the plot, 6 accident hotspots have been identified.
-<br/>
-Next, we will move on to perform a KDE plot for heavy traffic. A K Function Analysis with 20 Monte Carlo Simulations is performed.
-<br/>
-[[File:Roadrunners findings0b.jpg|800px|frameless|center]]
-<br/>
-Similarly, clear clustering is observed for heavy traffic points. From the K Function analysis above, we are 95 % confident that clustering is observed at any given distance.
-<br/>
-[[File:Roadrunners findings3.jpg|800px|frameless|center]] [[File:Roadrunners findings4.jpg|600px|frameless|center]]
-<br/>
-Using a Kernel Distance of 1500m, we have obtained the following KDE plot of heavy traffic points. 5 heavy traffic hotspots have been identified.
-We also performed a Multitype K Function analysis to determine if heavy traffic points tend to be clustered around heavy traffic points. Two traffic cameras caught our attention.
-<br/><br/>
-[[File:Roadrunners findings5.jpg|1000px|frameless|center]]
-<br/>
-As seen in the picture above, the heavy traffic points tend to cluster around the two traffic cameras located at CTE between Bradell Road Exit and Ang Mo Kio Avenue 3 Exit. Based on the multitype K function analysis, which was run with 20 simulations, we are 95% confident that clustering indeed occurs around these traffic cameras.
-From this result alone, we are unable to determine if this is just a coincidental correlation or if the traffic cameras are the cause of the heavy traffic, or if the traffic police deliberately placed the cameras there. However, this analysis does provide us with a starting point for further analysis the reasons for traffic congestion in this area. To verify if the traffic cameras are indeed causing the congestion, further ground work needs to be done.
-<br/>
 == Project Challenges ==
 {| class="wikitable"
@@ Line 383: / Line 246: @@
 !  !! Key Challenges !! Description !! Solution
 |-
-| 1. || Lack of readily available data || There is currently no known data source that provides historical traffic accidents data in Singapore. There is only a real time API of traffic accidents from LTA. ||
+| 1. ||Temperature Data Collection  ||Meteorological Service Singapore only allows temperature data to be downloaded for one station and one month at a time. With more than 60 stations and 4 years of data needed for this project, downloading manually would be very time-consuming. ||
-* Learn to write a script that perform autonomous calling of the API
+* Discovered a pattern of the data links
-* Create a regular schedule for the calling of API
+* Used Excel to auto-generate all the required data links
+* Used Internet Download Manager to download from all the data links
 |-
-| 2. || Unfamiliarity with R Shiny || We are unfamiliar with R programming language due to the lack of prior experience ||
+| 2. ||Imperfect Temperature Data ||Temperature information is only collected at the designated temperature stations.  ||
-* Independent learning starting from week 5
+* Use spatial interpolation techniques to estimate the temperature at locations away from the temperature stations.
-* Learning from each other
-* Consult Prof Kam
-|-
-| 3. || Unfamiliarity with spatial analysis techniques || We are unsure what spatial analysis techniques to use and how to apply it as we lack prior experience in geospatial analysis ||
-* Conduct literature review on the commonly used spatial analysis techniques
-* Research how we these techniques are executed
-* Independent learning on the analysis techniques from week 5
-* Learning from each other
-* Consult Prof Kam
 |}
 <br/><br/>
 <br/>
 == Project Timeline ==
-[[File:Roadrunners Timeline.png|1200px|frameless|center]]
 <br/><br/>
+[https://docs.google.com/spreadsheets/d/1uUxaRZqa5FhF3voPMc0w3HcNnOHgbFlnvJWyN4htZPE/edit#gid=734092126 Gantt Chart of Team's Timeline - FULL Updated Version]<br>
-== Meet the Team ==
+Snapshot of Gantt Chart (as of 3 March 2019)
+[[File:Gantt Chart2.png|left|1000px|Gantt Chart Snapshot]]<br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br>
-[[File:Roadrunners groupphoto.jpg|800px|frameless|center]]
-<center>From left to right: Gwee Wei Ling, Tan Ming Kwang, Prof Kam Tin Seong, Tan Zhi Chong (Vincent)</center>
-<br>
-[[File:Comments.png|900px|frameless|center]]
 <div style="text-align: center; direction: ltr; margin-left: 1em;"><font face="Avenir"><big>Feel free to leave any comments! :) </big></font></div>