ISSS608 Group07 Proposal
Overview
Corn, or maize as it is called in some countries, was first grown in ancient Central America. It has since become a staple in many parts of the world, serving not only as food but also as the raw ingredient for corn ethanol, animal feed and other products. The United States accounts for about 40% of world corn production [1], which makes it the largest corn producer. Most of this production comes from Midwestern states such as Illinois, Iowa, Nebraska and Minnesota, which together became known as the 'Corn Belt'. The Corn Belt has about 96,000,000 acres of land devoted to corn production. The states that make up the Corn Belt are suited to corn because of their level land and fertile, highly organic soils [2].
Corn can grow in a wide range of climatic conditions, so it is a challenge to set precise conditions for corn production. There are still limits to this window, however: corn is grown mostly in tropical latitudes, has a cold limit of 19°C, grows best in warm temperatures between 21°C and 27°C, and has a growing season of between 120 and 180 days [3]. Breeders have therefore been experimenting with various corn hybrids, each bred to give a high yield despite the environment it is planted in. Over the years, farmers have used trial and error to identify the best hybrids, planting each hybrid in different locations with different environmental factors; this process has proven slow and not very effective [4]. This project aims to explore the meteorological and geographical factors that make corn the a-maize-ing crop we know today, which would greatly benefit corn breeders.
Scope
The scope of the project is limited to corn produced in the USA. We will use all the years provided in the dataset (2008 - 2017), but focus only on the months that fall within the corn growing season for each environment. We will also limit the environmental factors to the following:
- Precipitation
- Exposure length to sun
- Average Temperature
- Location of where the hybrid is planted
- Which year the hybrid was planted
Objective
The first objective of this project is to visualise our dataset:
- Weather Data: Precipitation, Average Temperature, Length of (sun) Radiation over the years
- Performance Data: Average Yield by State, by Plantation and finally by Hybrid. However, after skimming through our data, we found that about 45% of our hybrids are planted only once, so we do not have enough data for a time-series analysis. Instead, we will do a cross-sectional analysis, where we analyse and visualise our data year by year.
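The share of hybrids planted only once is a simple count over HYBRID_ID. A minimal sketch follows, written in Python purely for illustration. The row layout (a list of dicts) is an assumption, and "planted once" is taken here to mean that a HYBRID_ID appears in exactly one row of the Performance Data.

```python
from collections import Counter


def share_planted_once(rows):
    """Return the fraction of distinct hybrids that occur in exactly one row.

    `rows` is a hypothetical in-memory form of the Performance Data:
    an iterable of dicts that each carry a "HYBRID_ID" key.
    """
    plantings = Counter(row["HYBRID_ID"] for row in rows)
    if not plantings:
        return 0.0
    # Count the hybrids whose identifier was seen a single time.
    once = sum(1 for count in plantings.values() if count == 1)
    return once / len(plantings)
```

A high share means most hybrids have a single observation, which is why a per-hybrid time series is not feasible.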
The second objective is to predict the yield of a particular drought-resistant corn hybrid, given the soil conditions and topography of the plantation. The model that we are going to implement is the Geographically Weighted Regression (GWR) model. To make this prediction, however, we first need to classify each corn hybrid as drought resistant or not. To do so, we compare the average precipitation of each hybrid with the global average precipitation of all hybrids. This will be further elaborated on in the Methodology section.
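One possible reading of this classification rule is sketched below, in Python purely for illustration. Two points are assumptions rather than part of the proposal: a hybrid is flagged as drought resistant when its average precipitation is below the global average, and the global average is taken over all observations rather than over the per-hybrid averages. The input layout is also hypothetical.

```python
from collections import defaultdict
from statistics import mean


def classify_drought_resistance(observations):
    """Map each hybrid to True (drought resistant) or False.

    `observations` is a hypothetical iterable of
    (hybrid_id, growing_season_precipitation_mm) pairs, one per planting.
    """
    by_hybrid = defaultdict(list)
    for hybrid_id, precipitation in observations:
        by_hybrid[hybrid_id].append(precipitation)

    # Assumption: the global average is taken over every observation.
    global_mean = mean(p for values in by_hybrid.values() for p in values)

    # Assumption: "below the global average" means drought resistant.
    return {
        hybrid_id: mean(values) < global_mean
        for hybrid_id, values in by_hybrid.items()
    }
```

The resulting flag can then enter the regression as a binary variable alongside the soil and location variables.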
Data Source
This data is from the 'Syngenta Crop Challenge 2019'.
Performance Data
This is the main dataset: it contains both the individual hybrid yield and the average yield (of several hybrids) of a particular environment. There are 5,382 unique hybrids and 579 unique environments. The table below shows the main variables that we will be using for our analysis:
Variable | Description |
---|---|
YEAR | Year grown |
HYBRID_ID | Identifier for the tested hybrid |
ENV_ID | Identifier for the tested location and year |
YIELD | Yield of tested hybrid in tested location (quintals/hectare) |
ENV_YIELD_MEAN | Average Yield of tested location |
LAT, LONG | Latitude and Longitude of tested location to nearest 0.1 degree |
ELEVATION | Elevation of field at tested location |
PLANT_DATE | Date the hybrid was planted |
HARVEST_DATE | Date the hybrid was harvested |
CLAY, SILT, SAND, AWC, pH, etc | Properties of soil at tested location |
Weather Data
This is the supporting dataset: it holds all the environmental data for each environment, for each of the 365 days of the year. There are three important things to note:
- We will only use the days that fall within the growing season of each environment, not the data for all 365 days.
- The same location (one set of Long-Lat coordinates) may have several ENV_IDs. For example, the level of precipitation at Location A on 8th January 2013 and on 8th January 2015 would be different, hence the different ENV_IDs despite the same location.
- One ENV_ID corresponds to one plantation in one year. In other words, one plantation could have multiple ENV_IDs, but never in the same year.
The table below shows the main variables that we will be using for our analysis:
Variable | Description |
---|---|
ENV_ID | Identifier for the tested location and year (same one as in Performance Data) |
DAY_NUM | Day number within year of weather variables (365 days) |
DAYL | Day length (seconds) |
PREC | Precipitation (mm) |
SRAD | Solar radiation (W/m²) |
TMAX | Maximum temperature (degrees Celsius) |
TMIN | Minimum temperature (degrees Celsius) |
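The growing-season restriction described above can be sketched as follows, in Python purely for illustration. The sketch assumes that DAY_NUM is a 1-based day of the year, that PLANT_DATE and HARVEST_DATE have already been parsed into dates within the same calendar year, and that average temperature can be approximated as the midpoint of TMAX and TMIN. The row layout (dicts keyed by variable name) is also hypothetical.

```python
from datetime import date


def day_of_year(d):
    """Convert a date to its 1-based day number within the year."""
    return d.timetuple().tm_yday


def growing_season_rows(weather_rows, plant_date, harvest_date):
    """Keep only the weather rows between planting and harvest (inclusive)."""
    start, end = day_of_year(plant_date), day_of_year(harvest_date)
    if end < start:
        raise ValueError("harvest date falls before plant date")
    return [row for row in weather_rows if start <= row["DAY_NUM"] <= end]


def season_summary(season_rows):
    """Summarise one environment's growing season.

    Assumption: daily average temperature is (TMAX + TMIN) / 2.
    """
    if not season_rows:
        raise ValueError("no rows in the growing season")
    total_prec = sum(row["PREC"] for row in season_rows)
    mean_temp = sum(
        (row["TMAX"] + row["TMIN"]) / 2 for row in season_rows
    ) / len(season_rows)
    return {"total_prec_mm": total_prec, "mean_temp_c": mean_temp}
```

Because each ENV_ID covers one plantation in one year, the filter would be applied once per ENV_ID using that environment's own planting and harvest dates.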
Visualisations
For our weather data, we will aggregate the environments by the state they are in and then plot a geofacet time series. This gives an overview of what the weather has been like over the years in each state. We can then drill down to individual plantations for a closer look at the weather data.
For our performance data, we will use choropleth and isoline maps to visualise the yield by state, by plantation and perhaps even by hybrid. This will show which parts of the Corn Belt have better yield.
Our prediction model will be built with the spgwr package, which implements the GWR model. GWR is essentially an ordinary regression model that also uses the longitude and latitude of each observation, so that the fitted relationships can vary from location to location. This model will be discussed in more depth in our research paper. Our GWR model will estimate the relationships between our dependent variable (yield) and our independent variables (soil, drought resistance, longitude, latitude, elevation etc.), which would help corn breeders predict the yield of a drought-resistant crop given a certain set of environmental factors.
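To illustrate the idea behind GWR, the sketch below fits a separate weighted regression at a chosen location, giving nearby observations more weight than distant ones. It is written in Python purely for illustration and is not the spgwr API. It makes several simplifying assumptions: a single predictor, a Gaussian distance kernel, a fixed bandwidth, and plain Euclidean distance on longitude and latitude in degrees. All function names are hypothetical.

```python
import math


def gaussian_weight(distance, bandwidth):
    """Gaussian kernel: weight decays smoothly with distance."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def local_fit(points, target, bandwidth):
    """Fit y = intercept + slope * x around `target` by weighted least squares.

    `points` is a list of (lon, lat, x, y) tuples and `target` is a
    (lon, lat) pair. Returns (intercept, slope) for that location.
    """
    weights = [
        gaussian_weight(math.hypot(lon - target[0], lat - target[1]), bandwidth)
        for lon, lat, _, _ in points
    ]
    total = sum(weights)
    if total == 0:
        raise ValueError("no observation carries weight at this location")

    # Weighted means of the predictor and the response.
    x_bar = sum(w * p[2] for w, p in zip(weights, points)) / total
    y_bar = sum(w * p[3] for w, p in zip(weights, points)) / total

    # Weighted sums of squares and cross products.
    sxx = sum(w * (p[2] - x_bar) ** 2 for w, p in zip(weights, points))
    sxy = sum(w * (p[2] - x_bar) * (p[3] - y_bar) for w, p in zip(weights, points))
    if sxx == 0:
        raise ValueError("predictor has no weighted variation at this location")

    slope = sxy / sxx
    return y_bar - slope * x_bar, slope
```

With a small bandwidth the coefficients reflect only the neighbouring plantations; as the bandwidth grows, the fit approaches a single global regression. A full GWR would repeat this at every location with several predictors and a bandwidth chosen from the data.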
Tools & Packages
These are some of the tools that we will be using:
Package | Usage of Package |
---|---|
lubridate | Cleaning Data |
tidyverse | Cleaning Data |
ggplot2 | General Graph Plots |
geofacet | Visualising both Weather Data and Performance Data |
spgwr | GWR model for Performance Data |
choroplethr | Visualising Performance Data |
contoureR | Visualising Performance Data |
shiny | Dashboard Design |
shinydashboard | Dashboard Design |
shinythemes | Dashboard Design |
Reference
The image for the banner was taken from https://iegvu.agribusinessintelligence.informa.com/CO215920/South-Africa-corn-planting-plummets.
[1] Olson, R. A., & Sander, D. H. (1988). Corn production. Corn and corn improvement, 639-686.
[2] Smith, C. W. (2004). Corn: origin, history, technology, and production (Vol. 4). John Wiley & Sons.
[3] Shaw, R. H. (1988). Climate requirement. Corn and corn improvement, 609-638.
[4] https://www.ideaconnection.com/syngenta-crop-challenge/challenge.php