Revision as of 12:18, 27 February 2020

Re-imagining Bus Transport Network in Singapore

Overview
Corn, or maize as it is called in some countries, was first grown in ancient Central America. It has since become a staple in many parts of the world, serving not only as food but also as the raw ingredient for corn ethanol, animal feed and other products. The United States accounts for about 40% of the world's corn production¹, which makes it the largest corn producer. Most of this production comes from Midwestern states such as Illinois, Iowa, Nebraska and Minnesota; together these states became known as the ‘Corn Belt’. The Corn Belt has about 96,000,000 acres of land devoted to corn production. The region is well suited to corn because of its level land and fertile, highly organic soils².

Corn can grow in a wide range of climatic conditions, so it is difficult to set precise conditions for corn production. There are still limits to this window, however: corn is grown mostly in tropical latitudes, has a cold limit of 19°C, grows best in warm temperatures between 21°C and 27°C, and needs a growing season of 120 to 180 days³. Breeders have therefore been experimenting with various corn hybrids, each created to give a high yield despite the environment it is planted in. Over the years, farmers have used trial and error to identify the best hybrids, planting each hybrid in different locations with different environmental factors; this process has proven slow and not very effective⁴. This project aims to explore the meteorological and geographical factors that make corn the a-maize-ing crop we know today, which would benefit corn breeders greatly.

Scope

The scope of the project is limited to corn produced in the USA, specifically in the Corn Belt region. Due to time constraints, we will analyse the data at the aggregated ‘Environment’ level instead of the ‘Hybrid’ level. Each Environment can be treated as a plantation in which a varying number of hybrids are planted. All years are taken into consideration, with a focus on the individual growing season of each environment (around March/April to September). We will also limit the environmental factors to the following:

Precipitation
Exposure length to sun (Radiation)
Average Temperature
Location of where the hybrid is planted

Objective

The first objective of this project is to visualise our dataset:

Weather Data: Precipitation, Average Temperature, Length of (sun) Radiation over the years
Performance Data: Average Yield by State, by Plantation

However, after skimming through our data, we found that about 45% of our plantations are only used once. Hence we do not have enough data for a time-series analysis. Instead, we will do a cross-sectional analysis, where we analyse and visualise our data year by year.
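The check behind the "used once" figure can be sketched as below. This is only an illustration: it assumes a plantation is identified by its (LAT, LONG) pair, the function name `share_single_year_locations` is hypothetical, and the rows are made up rather than taken from the Syngenta data.

```python
from collections import defaultdict


def share_single_year_locations(records):
    """Return the fraction of locations that appear in exactly one year.

    `records` is an iterable of (lat, long, year) tuples, one per environment.
    Assumption: a plantation is identified by its (lat, long) pair.
    """
    years_by_location = defaultdict(set)
    for lat, long_, year in records:
        years_by_location[(lat, long_)].add(year)
    if not years_by_location:
        return 0.0
    single = sum(1 for years in years_by_location.values() if len(years) == 1)
    return single / len(years_by_location)


# Made-up example rows (not from the actual dataset).
example = [
    (41.5, -93.6, 2013),
    (41.5, -93.6, 2015),  # same location, second year
    (40.1, -88.2, 2014),
    (44.0, -94.5, 2016),
]
share = share_single_year_locations(example)
```

A high share of single-year locations means few plantations have repeated observations, which is why a year-by-year (cross-sectional) view is the safer choice.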

The second objective is to predict the yield of a plantation, given the soil conditions and topography of the plantation. The model that we are going to implement is the Geo-weighted Regression (GWR) model. This will be further elaborated on in the Methodology section.

Data Source

The data comes from the 'Syngenta Crop Challenge 2019'.

Performance Data

This dataset is the main dataset: we have both the individual hybrid yield, as well as the average yield (of several hybrids) of a particular environment. We have 5,382 unique hybrids, with 579 unique environments. The table below shows the main variables that we will be using for our analysis:

Variable	Description
*YEAR*	Year grown
*HYBRID_ID*	Identifier for the tested hybrid
*ENV_ID*	Identifier for the tested location and year
*YIELD*	Yield of tested hybrid in tested location (quintals/hectare)
*ENV_YIELD_MEAN*	Average Yield of tested location
*LAT, LONG*	Latitude and Longitude of tested location to nearest 0.1 degree
*ELEVATION*	Elevation of field at tested location
*PLANT_DATE*	Date the hybrid was planted
*HARVEST_DATE*	Date the hybrid was harvested
*CLAY, SILT, SAND, AWC, pH, etc*	Properties of soil at tested location

Weather Data

This dataset is the supporting dataset: it contains daily environmental data for each environment, covering all 365 days of the year. There are three important things to note:

We will only use the days that correspond to the growing season of each environment, not the data for all 365 days.
The same location (one set of Long-Lat coordinates) may have several ENV_IDs. For example, the level of precipitation at Location A on 8th January 2013 and on 8th January 2015 would be different, hence the two years have different ENV_IDs despite being at the same location.
One ENV_ID corresponds to one plantation in one year. In other words, one plantation can have multiple ENV_IDs, but never more than one in the same year.

The table below shows the main variables that we will be using for our analysis:

Variable	Description
*ENV_ID*	Identifier for the tested location and year (same one as in Performance Data)
*DAY_NUM*	Day number within year of weather variables (365 days)
*DAYL*	Day length (seconds)
*PREC*	Precipitation (mm)
*SRAD*	Solar radiation (W/m²)
*TMAX*	Maximum temperature (degrees Celsius)
*TMIN*	Minimum temperature (degrees Celsius)
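The growing-season filter described in the notes above could look roughly like the following sketch. It makes several assumptions that the source does not confirm: DAY_NUM is taken to be 1-based, the daily average temperature is approximated as (TMAX + TMIN) / 2, and the function and dictionary key names are hypothetical. The sample rows are made up.

```python
from collections import defaultdict
from datetime import date


def day_of_year(d):
    """Convert a date to a 1-based day number (1 January -> 1)."""
    return d.timetuple().tm_yday


def growing_season_summary(weather_rows, seasons):
    """Aggregate daily weather per ENV_ID over its growing season only.

    weather_rows: iterable of dicts with ENV_ID, DAY_NUM, PREC, TMAX, TMIN.
    seasons: dict mapping ENV_ID -> (PLANT_DATE, HARVEST_DATE) as dates.
    """
    totals = defaultdict(lambda: {"prec": 0.0, "temp_sum": 0.0, "days": 0})
    for row in weather_rows:
        env = row["ENV_ID"]
        if env not in seasons:
            continue
        plant, harvest = seasons[env]
        # Keep only the days between planting and harvest (inclusive).
        if day_of_year(plant) <= row["DAY_NUM"] <= day_of_year(harvest):
            acc = totals[env]
            acc["prec"] += row["PREC"]
            # Assumption: daily mean temperature is the TMAX/TMIN midpoint.
            acc["temp_sum"] += (row["TMAX"] + row["TMIN"]) / 2
            acc["days"] += 1
    return {
        env: {
            "total_prec": acc["prec"],
            "mean_temp": acc["temp_sum"] / acc["days"],
            "days": acc["days"],
        }
        for env, acc in totals.items()
    }


# Made-up example: one environment planted 10 April, harvested 20 September.
seasons = {"E1": (date(2013, 4, 10), date(2013, 9, 20))}
rows = [
    {"ENV_ID": "E1", "DAY_NUM": 50, "PREC": 5.0, "TMAX": 2.0, "TMIN": -4.0},
    {"ENV_ID": "E1", "DAY_NUM": 100, "PREC": 2.0, "TMAX": 18.0, "TMIN": 6.0},
    {"ENV_ID": "E1", "DAY_NUM": 200, "PREC": 4.0, "TMAX": 30.0, "TMIN": 18.0},
    {"ENV_ID": "E1", "DAY_NUM": 300, "PREC": 1.0, "TMAX": 10.0, "TMIN": 0.0},
]
summary = growing_season_summary(rows, seasons)
```

Only days 100 and 200 fall inside the season here, so the winter and late-autumn rows do not affect the totals.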

Visualisations

For our Weather Data, we will aggregate the environments by the states that they are in and then implement geofacet time series. This gives an overview of what the weather is like over the years for each state. However, Nature knows no boundaries, so we will also plot Isohyetal (Precipitation) and Isothermal (Average Temperature) maps to see how these variables are distributed over the Corn Belt.

For our Performance Data, we will implement isoline graphs to visualise the yield by location, by plantation. This shows which parts of the Corn Belt have better yields.
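Isoline maps need a continuous surface estimated from point observations. The text does not say which interpolator will be used, so the sketch below uses inverse distance weighting (IDW) purely as one common option. The function name `idw_estimate`, the `power` parameter and the planar distance on raw coordinates are all assumptions made for illustration.

```python
import math


def idw_estimate(points, x, y, power=2.0):
    """Estimate a value at (x, y) by inverse distance weighting.

    points: list of (px, py, value) observations, e.g. (LONG, LAT, yield).
    Assumption: plain Euclidean distance on the coordinates is good enough
    for a sketch; a real map would use a projected coordinate system.
    """
    numerator = 0.0
    denominator = 0.0
    for px, py, value in points:
        dist = math.hypot(px - x, py - y)
        if dist == 0.0:
            # The target coincides with an observation: return it directly.
            return value
        weight = 1.0 / dist ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


# Made-up observations: two plantations with different mean yields.
observations = [(0.0, 0.0, 100.0), (2.0, 0.0, 120.0)]
midpoint_estimate = idw_estimate(observations, 1.0, 0.0)
```

Evaluating such a function over a regular grid gives the surface from which contour lines (isohyets, isotherms or yield isolines) can be drawn.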

Our prediction model will be a GWR model built with the GWmodel package, which is discussed in the following section. The GWR model will estimate the relationships between our dependent variable (yield) and our independent variables (soil, Drought-resistant, Longitude, Latitude, Elevation etc). Corn breeders could eventually use it to predict the yield of a plantation, given a certain set of environmental factors.

Methodology: Geo-weighted Regression

Geo-weighted Regression, more formally Geographically Weighted Regression (GWR), explores spatially varying relationships between the dependent and independent variables at location 𝑖, where 𝑢_𝑖 and 𝑣_𝑖 are the coordinates of 𝑖. As the data are geographically weighted, nearer observations have more influence in determining the local set of regression coefficients. Any GWR model returns two important outputs for each location: the coefficient estimates and their corresponding t-values. The sign of an estimate shows the direction of the local relationship between that independent variable and the dependent variable: a positive value means a positive relationship, and vice versa. We will convert the t-values to their corresponding p-values in R to assess significance. The t-value is the test statistic for a coefficient, whereas the p-value is the probability of seeing a statistic at least that extreme if the true coefficient were zero. We use a 5% significance level for our analysis.
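For reference, the GWR model is usually written in the following textbook form. The symbols $p$ (number of independent variables), $x_{ik}$, $W(u_i, v_i)$ and $\nu$ (degrees of freedom) are notation introduced here for illustration and do not come from the proposal itself.

```latex
% Local regression at location i with coordinates (u_i, v_i)
y_i = \beta_0(u_i, v_i) + \sum_{k=1}^{p} \beta_k(u_i, v_i)\, x_{ik} + \varepsilon_i

% Local coefficients by weighted least squares, where W(u_i, v_i) is the
% diagonal matrix of kernel weights relative to location i
\hat{\boldsymbol{\beta}}(u_i, v_i)
  = \left( X^{\top} W(u_i, v_i)\, X \right)^{-1} X^{\top} W(u_i, v_i)\, \mathbf{y}

% Two-sided p-value for a local t-value t_{ik}, with F_{\nu} the CDF of
% Student's t distribution with \nu degrees of freedom
p_{ik} = 2 \left[ 1 - F_{\nu}\!\left( \lvert t_{ik} \rvert \right) \right]
```

A coefficient at location $i$ is treated as significant at the 5% level when $p_{ik} < 0.05$.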

There are two parameters that are important to GWR Model: Kernel and Bandwidth.

Kernels for GWR

File:Kernels GWR.png

There are 6 kernel functions available: Global, Gaussian, Exponential, Boxcar, Bisquare and Tricube.

The global kernel gives equal weight to all observations, which makes the model an ordinary linear regression. The other kernel functions determine the weight of an observation from its distance to the calibration point. The Gaussian and Exponential kernels are continuous functions of that distance: the weight is at its maximum at the calibration point (d=0) and decreases with distance according to the function. These two functions give less weight to observations beyond the bandwidth, but never zero. The Boxcar, Bisquare and Tricube kernels are discontinuous functions that give zero weight to observations farther away than the bandwidth. Boxcar gives equal weight to all observations within the bandwidth, whereas Bisquare and Tricube give decreasing weight up to the bandwidth.
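The six kernels can be sketched as a single weight function. The formulas follow the definitions commonly given for these kernels (as in the GWmodel paper cited in the Reference section), but the function name `kernel_weight` and the lowercase kernel labels are hypothetical and are not the GWmodel API.

```python
import math


def kernel_weight(kernel, d, b):
    """Return the weight of an observation at distance d for bandwidth b."""
    d = abs(d)
    if kernel == "global":
        return 1.0  # every observation counts equally
    if kernel == "gaussian":
        return math.exp(-0.5 * (d / b) ** 2)
    if kernel == "exponential":
        return math.exp(-d / b)
    if kernel not in ("boxcar", "bisquare", "tricube"):
        raise ValueError(f"unknown kernel: {kernel}")
    # The remaining kernels give zero weight at or beyond the bandwidth.
    if d >= b:
        return 0.0
    if kernel == "boxcar":
        return 1.0
    if kernel == "bisquare":
        return (1.0 - (d / b) ** 2) ** 2
    return (1.0 - (d / b) ** 3) ** 3  # tricube


# Weights at half the bandwidth and just beyond it, for comparison.
bandwidth = 10.0
names = ["global", "gaussian", "exponential", "boxcar", "bisquare", "tricube"]
inside = {name: kernel_weight(name, 5.0, bandwidth) for name in names}
outside = {name: kernel_weight(name, 12.0, bandwidth) for name in names}
```

Comparing `inside` and `outside` shows the difference described above: the Gaussian and Exponential weights shrink beyond the bandwidth but stay positive, while the Boxcar, Bisquare and Tricube weights drop to zero.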

We will build both Global and Local GWR models in this project.

Bandwidth

The bandwidth sets the range over which the kernel is applied around each observation. The smaller the bandwidth, the smaller the range, and the more local the resulting regression.
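The effect of the bandwidth can be illustrated with a minimal local fit at one calibration point. This is a sketch under stated assumptions, not the GWmodel implementation: it uses a single independent variable, a fixed-distance bisquare kernel and planar distances, and the names `bisquare` and `local_fit` as well as the data are made up.

```python
import math


def bisquare(d, b):
    """Bisquare kernel: decreasing weight inside the bandwidth, zero outside."""
    return (1.0 - (d / b) ** 2) ** 2 if d < b else 0.0


def local_fit(obs, u, v, bandwidth):
    """Weighted least squares line at calibration point (u, v).

    obs: list of (u_j, v_j, x_j, y_j) tuples.
    Returns (intercept, slope, number of observations with non-zero weight).
    """
    weights = [bisquare(math.hypot(ou - u, ov - v), bandwidth)
               for ou, ov, _, _ in obs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no observations fall within the bandwidth")
    x_bar = sum(w * o[2] for w, o in zip(weights, obs)) / total
    y_bar = sum(w * o[3] for w, o in zip(weights, obs)) / total
    sxx = sum(w * (o[2] - x_bar) ** 2 for w, o in zip(weights, obs))
    sxy = sum(w * (o[2] - x_bar) * (o[3] - y_bar) for w, o in zip(weights, obs))
    if sxx == 0.0:
        raise ValueError("x does not vary among the weighted observations")
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope, sum(1 for w in weights if w > 0.0)


# Made-up data: y = 2x in a western cluster, y = -x in an eastern cluster.
data = [
    (0.0, 0.0, 1.0, 2.0), (0.0, 1.0, 2.0, 4.0), (1.0, 0.0, 3.0, 6.0),
    (10.0, 0.0, 1.0, -1.0), (10.0, 1.0, 2.0, -2.0), (11.0, 0.0, 3.0, -3.0),
]
narrow = local_fit(data, 0.0, 0.0, bandwidth=3.0)    # only the western cluster
wide = local_fit(data, 0.0, 0.0, bandwidth=100.0)    # nearly a global fit
```

With the narrow bandwidth the fit recovers the local western relationship; with the wide bandwidth all observations get similar weights and the slope moves towards the pooled, global estimate.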

Tools & Packages

These are some of the tools that we will be using:

Package	Usage of Package
*lubridate*	Cleaning Data
*tidyverse*	Cleaning Data
*ggplot2*	General Graph Plots
*geofacet*	Visualising both Weather Data and Performance Data
*tmap, gstat, sp, sf, rgdal, rgeos, raster*	Visualising isolines for both Weather Data and Performance Data
*GWmodel*	GWR model for Performance Data
*shiny*	Dashboard Design
*shinydashboard*	Dashboard Design
*shinythemes*	Dashboard Design

Reference

The image for the banner was taken from https://iegvu.agribusinessintelligence.informa.com/CO215920/South-Africa-corn-planting-plummets.
Kernel image is from Gollini, I., Lu, B., Charlton, M., Brunsdon, C., & Harris, P. (2013). GWmodel: an R package for exploring spatial heterogeneity using geographically weighted models. arXiv preprint arXiv:1306.0413.

[1] Olson, R. A., & Sander, D. H. (1988). Corn production. Corn and corn improvement, 639-686.
[2] Smith, C. W. (2004). Corn: origin, history, technology, and production (Vol. 4). John Wiley & Sons.
[3] Shaw, R. H. (1988). Climate requirement. Corn and corn improvement, 609-638.
[4] https://www.ideaconnection.com/syngenta-crop-challenge/challenge.php

@@ Line 1: / Line 1: @@
-<div style=background:#FFCC99 border:#FFCC99>
+<!----------- Main Header ------------>
-[[File:Bus meme.jpg|400px]]
+<div style="background:#E4EBF0; padding:24px; text-align:center;">
-<font size = 6; color="#000000">            Re-imagining Bus Transport Network in Singapore </font>
+<font size = 8; color="#176585"><span style="font-family:Segoe UI Light;">Re-imagining Bus Transport Network in Singapore</span></font>
 </div>
-<!--NAV BAR -->
-{|style= background-color:"#FFCC99"; width="100%" cellspacing="0" cellpadding="0" valign="top" border="0" border-width="10"|
-| style="font-family:’Helvetica’, font-size:100%; solid #0433ff; background:#FFCC99; text-align:center;" width="16%"  |
-;
-[[ISSS608_Group07_Proposal|<font color="#000000">Proposal</font>]]
-| style="font-family:’Helvetica’, font-size:100%; solid #0433ff; background:#FFDEAD; text-align:center;" width="16%" |
+<!------------ Navigation bar ------------>
-;
+{|style="background-color:#ffffff;" width="100%" |
-[[ISSS608_Group07_Poster| <font color="#000000">Poster</font>]]
-| style="font-family:’Helvetica’, font-size:100%; solid #0433ff; background:#FFDEAD; text-align:center;" width="16%" |
+| style="font-family:Segoe UI Semibold; font-size:100%; text-align:center;border-bottom:solid #176585" width="16.6%" |
-;
+[[Group08 Proposal | <font size = 4; color="#176585">Proposal</font>]]
-[[ISSS608_Group07_Application| <font color="#000000">Application</font>]]
+| style="font-family:Segoe UI Light; font-size:100%; text-align:center;border-bottom:solid #BDD1DE" width="16.6%" |
-| style="font-family:’Helvetica’, font-size:100%; solid #0433ff; background:#FFDEAD; text-align:center;" width="16%" |
+[[Group08 Poster | <font size = 4; color="#4180AB">Poster</font>]]
-;
-[[ISSS608_Group07_Report| <font color="#000000">Report</font>]]
+| style="font-family:Segoe UI Light; font-size:100%; text-align:center;border-bottom:solid #BDD1DE" width="16.6%" |
+[[Group08 Application| <font size = 4; color="#4180AB">Application</font>]]
+| style="font-family:Segoe UI Light; font-size:100%; text-align:center;border-bottom:solid #BDD1DE" width="16.6%" |
+[[Group08 Report| <font size = 4; color="#4180AB">Report</font>]]
+| style="font-family:Segoe UI Light; font-size:100%; text-align:center;border-bottom:solid #BDD1DE" width="16.6%" |
+[[Project Groups| <font size = 4; color="#4180AB">Back to Main ↗</font>]]
-| style="font-family:’Helvetica’, font-size:100%; solid #0433ff; background:#FFDEAD; text-align:center;" width="16%" |
-;
-[[Project_Groups| <font color="#000000">Return to All Projects</font>]]
 |}
+<!------------ End of navi bar ------------>
-<br/>

Difference between revisions of "Group08 proposal"
