Difference between revisions of "Group08 proposal"

From ISSS608-Visual Analytics and Applications

Revision as of 12:18, 27 February 2020

Re-imagining Bus Transport Network in Singapore




Overview
Corn, or maize as it is called in some countries, was first grown in ancient Central America. It has become a staple in many parts of the world, providing not only food but also the raw ingredient for corn ethanol, animal feed and other products. The United States accounts for about 40% of the world's corn production [1], which makes it the largest corn producer. Most of this production comes from Midwestern states such as Illinois, Iowa, Nebraska and Minnesota, which were grouped together and eventually became known as the 'Corn Belt'. The Corn Belt has about 96,000,000 acres of land devoted to corn production. The states that make up the Corn Belt are suited to corn because of their level land and fertile, highly organic soils [2].

Corn can grow in a wide range of climatic conditions, so it is a challenge to set precise conditions for corn production. However, this wide window still has limits: corn is grown mostly in tropical latitudes, has a cold limit of 19°C, grows best in warm temperatures between 21°C and 27°C, and needs a growing season of between 120 and 180 days [3]. Breeders have therefore been experimenting with various corn hybrids, each specifically created to give a high yield regardless of the environment it is planted in. Over the years, farmers have used trial and error to identify the best hybrids, planting each hybrid in different locations with different environmental factors; this process has proven to be slow and not very effective [4]. This project aims to explore the meteorological and geographical factors that make corn the a-maize-ing crop we know today, which would benefit corn breeders greatly.

Scope

The scope of the project is limited to corn produced in the USA, specifically in the Corn Belt region. Due to time constraints, we will analyse at the (aggregated) 'Environment' level instead of the 'Hybrid' level. Each Environment can be treated as a plantation in which a varying number of hybrids are planted. All years are taken into consideration, with a focus on the individual growing season of each environment (around March/April to September). We will also limit the environmental factors to the following:

  1. Precipitation
  2. Length of exposure to the sun (Radiation)
  3. Average Temperature
  4. Location where the hybrid is planted

Objective

The first objective of this project is to visualise our dataset:

  1. Weather Data: Precipitation, Average Temperature and Length of (sun) Radiation over the years
  2. Performance Data: Average Yield by State and by Plantation

However, after skimming through our data, we found that about 45% of our plantations are used only once. Hence we do not have enough data for a time-series analysis. Instead, we will do a cross-sectional analysis, in which we analyse and visualise our data year by year.

The second objective is to predict the yield of a plantation, given the soil conditions and topography of the plantation. The model that we are going to implement is the Geographically Weighted Regression (GWR) model. This will be further elaborated on in the Methodology section.

Data Source

This data is from the 'Syngenta Crop Challenge 2019'.

Performance Data

This is the main dataset: it contains both the individual hybrid yield and the average yield (of several hybrids) of a particular environment. There are 5,382 unique hybrids and 579 unique environments. The table below shows the main variables that we will be using for our analysis:

Variable | Description
YEAR | Year grown
HYBRID_ID | Identifier for the tested hybrid
ENV_ID | Identifier for the tested location and year
YIELD | Yield of tested hybrid in tested location (quintals/hectare)
ENV_YIELD_MEAN | Average yield of tested location
LAT, LONG | Latitude and longitude of tested location, to the nearest 0.1 degree
ELEVATION | Elevation of field at tested location
PLANT_DATE | Date the hybrid was planted
HARVEST_DATE | Date the hybrid was harvested
CLAY, SILT, SAND, AWC, pH, etc. | Properties of soil at tested location
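As a minimal sketch of how these variables relate, the Python snippet below recomputes an environment-level mean yield from hybrid-level rows and estimates the share of plantations used in only one year. The rows are made up for illustration, and treating each unique (LAT, LONG) pair as one plantation is our assumption, not something the dataset states.

```python
from collections import defaultdict

# Made-up rows for illustration only; real rows would come from the challenge files.
rows = [
    {"YEAR": 2013, "ENV_ID": "E1", "LAT": 41.0, "LONG": -93.5, "YIELD": 100.0},
    {"YEAR": 2013, "ENV_ID": "E1", "LAT": 41.0, "LONG": -93.5, "YIELD": 110.0},
    {"YEAR": 2015, "ENV_ID": "E2", "LAT": 41.0, "LONG": -93.5, "YIELD": 90.0},
    {"YEAR": 2014, "ENV_ID": "E3", "LAT": 40.2, "LONG": -89.1, "YIELD": 120.0},
]


def mean_yield_by_env(rows):
    """Average the hybrid-level YIELD within each ENV_ID."""
    totals = defaultdict(lambda: [0.0, 0])  # ENV_ID -> [sum of yields, row count]
    for r in rows:
        totals[r["ENV_ID"]][0] += r["YIELD"]
        totals[r["ENV_ID"]][1] += 1
    return {env: total / count for env, (total, count) in totals.items()}


def share_used_once(rows):
    """Share of plantations that appear in exactly one year.

    Assumption: a plantation is identified by its (LAT, LONG) pair.
    """
    years = defaultdict(set)
    for r in rows:
        years[(r["LAT"], r["LONG"])].add(r["YEAR"])
    used_once = sum(1 for ys in years.values() if len(ys) == 1)
    return used_once / len(years)


env_means = mean_yield_by_env(rows)
once_share = share_used_once(rows)
```

In this toy input, one of the two locations appears in a single year, so the share is one half; on the real data, the same count would be the check behind the "used only once" figure.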

Weather Data

This is the supporting dataset: it holds the daily environmental data for each environment over all 365 days of the year. There are three important things to note:

  1. We will only use the days that fall within the growing season of each environment, not the data for all 365 days.
  2. The same location (one set of Long-Lat coordinates) may have several ENV_IDs. For example, the level of precipitation at Location A on 8th January 2013 and on 8th January 2015 would be different, hence the different ENV_IDs despite being at the same location.
  3. Each ENV_ID corresponds to exactly one plantation. In other words, one plantation can have multiple ENV_IDs, but never within the same year.

The table below shows the main variables that we will be using for our analysis:

Variable | Description
ENV_ID | Identifier for the tested location and year (same one as in Performance Data)
DAY_NUM | Day number within year of weather variables (365 days)
DAYL | Day length (seconds)
PREC | Precipitation (mm)
SRAD | Solar radiation (W/m²)
TMAX | Maximum temperature (degrees Celsius)
TMIN | Minimum temperature (degrees Celsius)
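The growing-season restriction described above could look roughly like the Python sketch below. It rests on several assumptions of ours: that DAY_NUM counts from 1 on 1 January, that the season runs from PLANT_DATE to HARVEST_DATE, and that average temperature is taken as the midpoint of TMAX and TMIN. The function name and the sample rows are hypothetical.

```python
from datetime import date


def growing_season_summary(daily, start_day, end_day):
    """Aggregate daily weather rows whose DAY_NUM lies in [start_day, end_day]."""
    season = [d for d in daily if start_day <= d["DAY_NUM"] <= end_day]
    if not season:
        raise ValueError("no weather records inside the growing season")
    n = len(season)
    return {
        "total_prec_mm": sum(d["PREC"] for d in season),
        # Assumption: daily average temperature = midpoint of TMAX and TMIN.
        "mean_temp_c": sum((d["TMAX"] + d["TMIN"]) / 2 for d in season) / n,
        "mean_srad": sum(d["SRAD"] for d in season) / n,
        "n_days": n,
    }


# Made-up daily rows for one ENV_ID, for illustration only.
daily = [
    {"DAY_NUM": 99, "PREC": 5.0, "SRAD": 200.0, "TMAX": 15.0, "TMIN": 5.0},
    {"DAY_NUM": 100, "PREC": 2.0, "SRAD": 300.0, "TMAX": 20.0, "TMIN": 10.0},
    {"DAY_NUM": 101, "PREC": 0.0, "SRAD": 400.0, "TMAX": 24.0, "TMIN": 12.0},
    {"DAY_NUM": 102, "PREC": 4.0, "SRAD": 350.0, "TMAX": 22.0, "TMIN": 14.0},
    {"DAY_NUM": 300, "PREC": 9.0, "SRAD": 100.0, "TMAX": 10.0, "TMIN": 0.0},
]

# Hypothetical planting and harvest dates converted to day-of-year numbers.
start_day = date(2013, 4, 10).timetuple().tm_yday
end_day = date(2013, 4, 12).timetuple().tm_yday
summary = growing_season_summary(daily, start_day, end_day)
```

Running the same function once per ENV_ID would give one row of season-level weather per environment, which can then be joined to the Performance Data on ENV_ID.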

Visualisations

For our Weather Data, we will aggregate the environments into the states that they are in, and then implement a geofacet time series. This would give an overview of what the weather is like over the years for each state. However, we know that nature knows no boundaries, hence we will also plot Isohyetal (Precipitation) and Isothermal (Average Temperature) Maps to see what the distribution is like across the Corn Belt.

For our Performance Data, we will implement isoline maps to visualise the yield by location and by plantation. This would show clearly which parts of the Corn Belt have better yield.
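Isoline maps need the point observations to be interpolated onto a surface before contours can be drawn. The proposal does not state which interpolation method will be used, so the Python sketch below uses inverse distance weighting (IDW) purely as an illustrative assumption. The function name, the `power` parameter and the station values are hypothetical, and distances are treated as planar, whereas real longitude/latitude data would normally be projected first.

```python
import math


def idw(points, x, y, power=2.0):
    """Inverse-distance-weighted estimate at (x, y).

    points: list of (px, py, value) tuples, e.g. plantation coordinates
    with a season-level precipitation or yield value.
    """
    numerator = 0.0
    denominator = 0.0
    for px, py, value in points:
        d = math.hypot(x - px, y - py)
        if d == 0.0:
            return value  # exactly on an observation: return it unchanged
        w = 1.0 / d ** power
        numerator += w * value
        denominator += w
    return numerator / denominator


# Two made-up observation points and a small grid of estimates.
stations = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
grid = [[idw(stations, gx, gy) for gx in (0.0, 1.0, 2.0)] for gy in (0.0, 1.0)]
```

Contour lines drawn through a grid like this are what become the isohyets, isotherms or yield isolines on the map.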

Our prediction model will be built with the GWmodel package, which implements the GWR model discussed in the following section. The GWR model estimates the relationships between our dependent variable (yield) and our independent variables (soil properties, drought resistance, longitude, latitude, elevation, etc.), and could eventually be used by corn breeders to predict the yield of a plantation given a certain set of environmental factors.

Methodology: Geographically Weighted Regression

Geographically Weighted Regression (GWR) explores spatially varying relationships between the dependent and independent variables at location 𝑖, where 𝑢𝑖 and 𝑣𝑖 are the coordinates of 𝑖. As the data are geographically weighted, nearer observations have more influence in determining the local set of regression coefficients. Any GWR model returns two important outputs for each location and each independent variable: the estimate and its corresponding t-value. The sign of an estimate shows the direction of the local relationship between that independent variable and the dependent variable: a positive value means they are positively related, and vice versa. We will convert the t-values to their corresponding p-values in R to assess significance. The t-value is the test statistic of a specific statistical test, whereas the p-value measures the statistical significance of that result. We use a 5% significance level for our analysis.
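For reference, the standard textbook form of the model described above can be written as follows. The notation (p independent variables, design matrix X, diagonal weight matrix W_i) is ours and is not taken from the project data.

```latex
% Local model at location i with coordinates (u_i, v_i)
y_i = \beta_0(u_i, v_i) + \sum_{k=1}^{p} \beta_k(u_i, v_i)\, x_{ik} + \varepsilon_i

% Local coefficients from weighted least squares, where W_i is the diagonal
% matrix of kernel weights of every observation relative to location i
\hat{\boldsymbol{\beta}}(u_i, v_i) = \left( X^{\top} W_i X \right)^{-1} X^{\top} W_i \, \mathbf{y}

% Local t-value of the k-th coefficient
t_{ik} = \frac{\hat{\beta}_k(u_i, v_i)}{\operatorname{SE}\!\left( \hat{\beta}_k(u_i, v_i) \right)}
```

Setting every W_i to the identity matrix recovers ordinary least squares, which is the global model.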

Two parameters are important to a GWR model: the kernel and the bandwidth.

Kernels for GWR

There are 6 kernel functions available: Global, Gaussian, Exponential, Boxcar, Bisquare and Tricube.

The global GWR model gives equal weight to all observations, which makes it just an ordinary linear regression. The other kernel functions determine the weight of an observation from the distance between 2 observations. The Gaussian and Exponential kernels are continuous functions of that distance. The weight is at its maximum at a GW model calibration point (d = 0) and decreases with distance according to the kernel function. These two functions give less weight to observations beyond the bandwidth. The Boxcar, Bisquare and Tricube kernels are discontinuous functions, giving zero weight to observations farther away than the bandwidth. Boxcar gives equal weight to all observations within the bandwidth, whereas Bisquare and Tricube give decreasing weight up to the bandwidth.
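The kernel behaviour described above can be sketched in a few lines of Python. The formulas follow the kernel definitions given in the GWmodel paper cited in the Reference section, as we understand them; the function name `kernel_weight` and its argument names are our own, and the project itself would rely on the GWmodel package in R instead of this code.

```python
import math


def kernel_weight(d, b, kernel="gaussian"):
    """Weight of an observation at distance d from the calibration point.

    b is the bandwidth. The function and argument names are illustrative only.
    """
    d = abs(d)
    if kernel == "global":
        return 1.0  # every observation counts equally
    if kernel == "gaussian":
        return math.exp(-0.5 * (d / b) ** 2)
    if kernel == "exponential":
        return math.exp(-d / b)
    if kernel == "boxcar":
        return 1.0 if d < b else 0.0
    if kernel == "bisquare":
        return (1.0 - (d / b) ** 2) ** 2 if d < b else 0.0
    if kernel == "tricube":
        return (1.0 - (d / b) ** 3) ** 3 if d < b else 0.0
    raise ValueError(f"unknown kernel: {kernel}")


kernels = ["global", "gaussian", "exponential", "boxcar", "bisquare", "tricube"]
# Weights at half the bandwidth and at twice the bandwidth (b = 10).
inside = {k: kernel_weight(5.0, 10.0, k) for k in kernels}
outside = {k: kernel_weight(20.0, 10.0, k) for k in kernels}
```

Comparing `inside` with `outside` shows the distinction drawn above: the Gaussian and Exponential weights shrink but stay positive beyond the bandwidth, while the Boxcar, Bisquare and Tricube weights drop to exactly zero.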

We will build both Global and Local GWR models in this project.

Bandwidth

The bandwidth sets the range over which the kernel is applied around each observation. The smaller the bandwidth, the smaller the range, and the more local the resulting model.
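To illustrate the effect of the bandwidth, the Python sketch below fits a Gaussian-weighted simple regression (one independent variable) at a chosen location. It is a simplified stand-in for a full GWR fit, not the GWmodel implementation. The function name, the observations and the coordinates are all hypothetical: the western points follow y = x and the eastern points follow y = 3x.

```python
import math


def local_fit(obs, u, v, bandwidth):
    """Gaussian-weighted least squares of y on x, calibrated at (u, v).

    obs: list of (ui, vi, x, y) tuples. Returns (intercept, slope).
    """
    sw = swx = swy = swxx = swxy = 0.0
    for ui, vi, x, y in obs:
        d = math.hypot(u - ui, v - vi)
        w = math.exp(-0.5 * (d / bandwidth) ** 2)  # Gaussian kernel weight
        sw += w
        swx += w * x
        swy += w * y
        swxx += w * x * x
        swxy += w * x * y
    denom = sw * swxx - swx ** 2
    if denom == 0.0:
        raise ValueError("x has no weighted variation; slope is undefined")
    slope = (sw * swxy - swx * swy) / denom
    intercept = (swy - slope * swx) / sw
    return intercept, slope


# Made-up observations: three at u = 0 (y = x) and three at u = 10 (y = 3x).
obs = [
    (0.0, 0.0, 1.0, 1.0), (0.0, 0.0, 2.0, 2.0), (0.0, 0.0, 3.0, 3.0),
    (10.0, 0.0, 1.0, 3.0), (10.0, 0.0, 2.0, 6.0), (10.0, 0.0, 3.0, 9.0),
]

west_narrow = local_fit(obs, 0.0, 0.0, bandwidth=1.0)
east_narrow = local_fit(obs, 10.0, 0.0, bandwidth=1.0)
west_wide = local_fit(obs, 0.0, 0.0, bandwidth=1000.0)
```

With the narrow bandwidth, each location recovers its own local slope, whereas the very wide bandwidth weights all observations almost equally and returns a slope close to the pooled (global) one.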

Tools & Packages

These are some of the tools that we will be using:

Package | Usage of Package
lubridate | Cleaning Data
tidyverse | Cleaning Data
ggplot2 | General Graph Plots
geofacet | Visualising both Weather Data and Performance Data
tmap, gstat, sp, sf, rgdal, rgeos, raster | Visualising isolines for both Weather Data and Performance Data
GWmodel | GWR model for Performance Data
shiny | Dashboard Design
shinydashboard | Dashboard Design
shinythemes | Dashboard Design

Reference

The image for the banner was taken from https://iegvu.agribusinessintelligence.informa.com/CO215920/South-Africa-corn-planting-plummets.
Kernel image is from Gollini, I., Lu, B., Charlton, M., Brunsdon, C., & Harris, P. (2013). GWmodel: an R package for exploring spatial heterogeneity using geographically weighted models. arXiv preprint arXiv:1306.0413.

[1] Olson, R. A., & Sander, D. H. (1988). Corn production. Corn and corn improvement, 639-686.
[2] Smith, C. W. (2004). Corn: origin, history, technology, and production (Vol. 4). John Wiley & Sons.
[3] Shaw, R. H. (1988). Climate requirement. Corn and corn improvement, 609-638.
[4] https://www.ideaconnection.com/syngenta-crop-challenge/challenge.php