Washburn 2019

Published online October 3, 2019
RESEARCH
Predictive Breeding for Maize: Making Use of Molecular Phenotypes, Machine Learning, and Physiological Crop Models

Jacob D. Washburn,* Merritt B. Burch, and José A. Valdes Franco

J.D. Washburn, M.B. Burch, and J.A.V. Franco, Institute for Genomic Diversity, Cornell Univ., Ithaca, NY 14853; M.B. Burch and J.A. Valdes Franco, Section of Plant Breeding and Genetics, Ithaca, NY 14850. Received 8 Apr. 2019. Accepted 6 July 2019. *Corresponding author (jdw297@cornell.edu). Assigned to Associate Editor Carlos Messina.

Published in Crop Sci. 59:1–15 (2019).
doi: 10.2135/cropsci2019.04.0222
© 2019 The Author(s). Re-use requires permission from the publisher.
crop science www.crops.org

ABSTRACT
Maize (Zea mays L.) has been a focus of scientific research and breeding for over a century. It is also one of the most economically important crops in the world, with a value of approximately US$50 billion per year in the United States alone. Additionally, maize has long been the model species of choice for the study and exploitation of hybrid vigor, and it continues to be one of the world’s most efficient converters of photosynthetic energy into starch. This review discusses the history and future of maize predictive breeding in the context of both genotype centric methods, and those focusing on genotype × environment × management interactions. Current prediction challenges are highlighted, as well as important advances in technology, methods, datasets, interdisciplinary collaborations, and scientific culture that will enable accelerated progress in predictive maize (and other crop species) breeding for years to come.

Abbreviations: GBLUP, genomic best linear unbiased prediction; G×E, genotype × environment; G×E×M, genotype × environment × management; GFBLUP, genomic feature best linear unbiased prediction; GWAS, genome-wide association study; IBD, identity-by-descent; LD, linkage disequilibrium; MAS, marker-assisted selection; PHG, Practical Haplotype Graph; QTL, quantitative trait locus/loci; SNP, single nucleotide polymorphism; WCP, whole-crop physiological.

Maize or corn (Zea mays L.) has been subject to human selection and improvement for over 10,000 yr. It has also been a key organism for scientific study since at least the time of Charles Darwin (Darwin, 1876). Throughout this effort, predicting the performance of a given maize plant based on the phenotypes of its parents has been important. Early farmers, for example, saved seeds from their best plants under the expectation that those seeds would result in better plants in the next generation. Darwin, on the other hand, was intrigued by crosses of maize, and indeed between many other animal and plant species, which often resulted in hybrids that were more vigorous than self-pollinated plants (Darwin, 1868), contrary to what would have been predicted at the time. Improvements in our ability to predict the outcomes of maize breeding (and breeding of other crops) have increased steadily over the past century and a half but have also transitioned through distinct stages marked by technical, statistical, and methodological advancements (Ramstein et al., 2018). In this review, we discuss the history of predictive breeding in maize, recently developed prediction strategies and methods, as well as promising advances and future directions. This discussion is divided broadly into genomic prediction methods and advancements, and

current and future opportunities to enhance the prediction of phenotypes based on genomic, environmental, and management data.

A BRIEF HISTORY OF MAIZE GENETICS AND PREDICTIVE BREEDING
Numerous morphological, genetic, and archaeological studies indicate that modern maize originated from the ancestors of today’s teosinte [Zea mays L., Zea diploperennis H.H. Iltis, Zea perennis (Hitchc.) Reeves & Mangelsd., and Zea luxurians (Durieu & Asch.) R.M. Bird] (Beadle, 1939; Piperno and Flannery, 2001; Matsuoka et al., 2002; Jaenicke-Després et al., 2003; Doebley, 2004; Hufford et al., 2013). Some of the earliest selections and improvements to these ancient teosintes came in the form of domestication traits, which were controlled by a few genes with large effects and could be easily selected phenotypically. The teosinte branched 1 locus is a well-known example where the modern maize allele represses lateral branching, contributing to the compact single-stalk architecture of maize plants (Doebley et al., 1995). Other alleles, such as teosinte glume architecture 1, have dramatically changed ear morphology by exposing the kernels (Dorweiler et al., 1993). Still, other domestication-related genes control important traits like seed shattering, spikelet pairing, and ear rank (Stitzer and Ross-Ibarra, 2018).

A second distinct advancement in maize breeding and prediction was the introduction of Mendelian genetics and statistically informed formal breeding strategies. These included the formalization of genetic theory, and the advent of tools like the concept of genetic gain, the breeder’s equation, linear mixed modeling, and the use of pedigrees as a tool for selection (Lush, 1937; Henderson et al., 1959; Ramstein et al., 2018). Each of these advancements gave breeders a greater ability to predict the outcome of a given cross and more efficiently allocate resources to make genetic gain more likely. Of unique importance in maize improvement was the development of heterotic groups and hybrid vigor (Jones, 1917; Duvick, 2001). Although the exact mechanisms of hybrid vigor were not understood at the time (and remain the subject of much debate today), the results of its application in maize breeding were monumental (Duvick, 2001; Lippman and Zamir, 2007; Mezmouk and Ross-Ibarra, 2014; Washburn and Birchler, 2014). Concurrent with these genetic advances was a revolution in agronomy and mechanized agriculture in the early to mid-1900s. This included the broad use of fertilizers and the introduction of modern fossil-fuel-powered farm equipment (Ertel, 2001; Erisman et al., 2008). Norman Borlaug, who would later transform agriculture in his own way, described the changes that hybrid corn, fertilizer, and mechanization made to his 1930s teenage life as follows: “The two-month horror of harvesting, harvesting, husking, and heaving hundreds of thousands of corn ears was no more… With a tractor shouldering the brutal burdens corn picking seemed almost a lark… Suddenly [young farm boys] could shape our own fate… We could get an education; maybe even a profession” (Vietmeyer, 2011).

As significant as the innovations of the early to mid-1900s had been, predictive breeding was still limited by a lack of understanding of the genetic architecture of complex traits and by the inability to do large-scale genotyping. In the 1970s and the following decades, these phenotyping and genotyping tools came into existence and enabled statistical association studies between molecular marker genotypes and trait phenotypes. Genetic mapping, genome-wide association study (GWAS), and marker-assisted selection (MAS) became feasible and enabled maize breeders to better understand the genetic underpinnings of traits and to select for them more efficiently. Marker-assisted selection allowed breeders to select traits early in the breeding cycle, and the creation of detailed genetic maps allowed scientists to better localize and understand the genetic architecture of traits (Tanksley et al., 1982; Helentjaris et al., 1986; Davis et al., 1999). However, for traits like yield, which are controlled by many genes or quantitative trait loci (QTL) with small effects, GWAS still faced the combined challenges of multiple testing correction (resulting in more stringent significance cutoffs as the number of loci tested increases) and decreased detection power due to the genetic complexity of traits and the small effect sizes of individual QTL.

GENOMIC PREDICTION
The Origin and Implementation of Genomic Prediction
In the late 1990s to early 2000s, decreasing sequencing costs and the availability of informative markers led to a major shift in the scale of quantitative genetics. Where previous experiments were largely focused on small numbers of qualitative traits, the study of complex, polygenic traits became increasingly possible. Scientists had hypothesized that MAS alone could be used to predict the phenotype of an individual (Haley and Visscher, 1998). Shortly thereafter, thousands of markers were fitted simultaneously to estimate the genetic value of individuals to predict their phenotypes (Meuwissen et al., 2001). This strategy of predicting genome-wide effects to gain accurate breeding values for individuals allowed for the acceleration of genetic gain per breeding cycle in previously difficult or inaccessible traits with low heritability, complex genetic architectures, and/or difficult to measure phenotypes (Heffner et al., 2009). This methodology is commonly known today as genomic prediction or genomic selection. Many statistical variations on

genomic prediction models exist, the most common being genomic best linear unbiased prediction (GBLUP), ridge regression, or Bayesian methods (Whittaker et al., 2000; Gianola et al., 2009; Wang et al., 2018a). The details of these methods and their relative utilities for different applications have been discussed broadly by other authors (see citations above).

In typical genomic prediction experiments, two forms of information on the breeding population are often collected: (i) high density, genome wide markers, and (ii) phenotypic data on the traits of interest. In the absence of genomic marker data, population pedigree data can also be used. A training population, containing both genotypic and phenotypic data from the desired breeding population, is used to create a model of genotypic effects and to estimate the breeding values for individuals (Jannink et al., 2010). Models generated using the training population are then used to predict estimated breeding values for new individuals for which phenotypes are desired but only genotypic information exists (Meuwissen et al., 2001; Poland et al., 2012). Prediction accuracy relies heavily on the relationship between the training population and the individuals being predicted, as well as the number of lines in the training population (Crossa et al., 2017). The power of a given trained model is also highly dependent on trait heritability, population size, population structure, and marker density (Liu et al., 2018).

Genotyping Strategies for Genomic Prediction
The number of genetic markers required to make accurate genomic predictions depends on the structure and complexity of the population and trait(s) of interest. In some cases, as few as 300 single nucleotide polymorphisms (SNPs) in maize are sufficient to make predictions on traits with low complexity, but the use of tens of thousands or more SNP variants is more common (Zhang et al., 2017). In any case, genomic prediction experiments typically make use of sparse sequence data (e.g., genotyping-by-sequencing [GBS], chip-array, etc.), which, thanks to the usually extended linkage disequilibrium (LD) patterns in breeding populations, is sufficient to identify extended haplotypes that might include causal loci (Xu et al., 2017b). This association, however beneficial for prediction across related populations, can also hinder the implementation of genomic prediction in other cases. For example, the ability to perform prediction over more distant populations will be diminished, as LD patterns might not be shared, and the actual causal alleles might not be associated with any genotyped markers. Increasing marker density in these cases can increase prediction accuracy (Liu et al., 2018). Saturation of markers increases the probability that a marker will be in LD with a causal variant, but adding more markers to the same LD block does not provide additional useful information and increases the complexity of the model.

The development of accurate, affordable, and reproducible genotyping platforms has been a main driver in the adoption of genomic prediction approaches. In most cases, to keep costs manageable for a breeding program, genotyping involves analyzing a reduced representation of the genome (Table 1). The choice of genotyping technology is dependent on the genetic architecture of the trait(s) of interest, and the underlying population structure of the target population: simple traits and populations with extended LD patterns require fewer markers, whereas traits with complex genetic architecture and populations with low LD structure require larger numbers of markers for successful prediction.

Table 1. Reduced genome representation approaches commonly used for maize genotyping and genomic selection. Table modified from Rasheed et al. (2017).

| Approach | No. of target sites | Technology | Reference |
|---|---|---|---|
| Array | 3000–50,000 | Illumina BeadChip | Ganal et al. (2011), Rousselle et al. (2015) |
| Array | 50,000–600,000 | Affymetrix Axiom | Unterseer et al. (2014), Xu et al. (2017a) |
| Exome capture | Variable | Roche SeqCap EZ Design | https://sequencing.roche.com/en-us/products-solutions/by-category/target-enrichment/shareddesigns.html |
| GBS | 50–300,000 | Genotype-by-sequencing | Elshire et al. (2011) |
| DArT-seq | 50,000 | Diversity Arrays Technology (DArT) sequencing | https://www.diversityarrays.com/ |
| rAmpSeq | 1–2000 | Repeat amplification sequencing | Buckler et al. (2016) |
| rhAmpSeq | 5000 | RNase H2 amplification sequencing | https://www.idtdna.com/pages/products/next-generation-sequencing/amplicon-sequencing/custom-rhampseq-panels |

To achieve the desired level of affordability, genotyping technologies have been developed to achieve maximal multiplexing, that is, the inclusion of as many samples per single experiment (DNA sequencing run) as possible. This push towards higher throughput multiplexing strategies has driven the development of new DNA sequencing chemistries and reduced representation approaches. A side effect of this multiplexing, however, is that some samples might be underrepresented or missing in the genotyping libraries. To alleviate this problem, several bioinformatics approaches have been developed that allow for the imputation of missing genotypes.

Imputation approaches can exploit known pedigree relationships (Swarts et al., 2014), the diversity found in identity-by-descent (IBD) regions (Browning et al., 2018), or the relative measure of LD (Money et al., 2015) in the population. For imputation, maize represents a particular challenge (Bukowski et al., 2018), specifically when looking at diverse landrace material (Swarts et al., 2014; Romero Navarro et al., 2017), where regions of IBD might be hard to identify, LD decays particularly rapidly and heterozygosity is high. To improve on this, some current and future genotyping and imputation efforts look to leverage the known haplotypic diversity of a species by capturing the entire gene set in a representative pangenome (Tettelin et al., 2005; Golicz et al., 2016). Such a representation would also eliminate the inherent bias of mapping all genotypes to a reference genome that is more closely related to some genotypes than to others. Current efforts by the maize community are generating high-quality genome assemblies for many diverse maize lines (Hufford, 2019), as well as developing a genome graph approach for representing all known maize haplotypes (Bradbury et al., 2018). This particular graph approach, named the Practical Haplotype Graph (PHG), defines a set of evolutionarily conserved regions, to which assemblies from diverse maize accessions can be anchored. This collection of haplotypes is a representation of the maize pangenome and can be used with sparse and inexpensive genotyping data to construct high-quality imputations across populations for genomic prediction. Although
this effort is ongoing in maize, human geneticists have reported improvements in read mapping sensitivity and structural variant calling based on graph representations of the human pangenome (Rakocevic et al., 2019). Hopefully, the PHG and other novel genotyping and imputation approaches will reduce the costs of genotyping even further, allowing for increased use and accuracy in genomic prediction.

In recent years, genomic prediction has become commonplace in state-of-the-art maize breeding programs (see examples in Eathington et al., 2007; Crossa et al., 2014, 2017; Gaffney et al., 2015; Shikha et al., 2017; Cerrudo et al., 2018; Heslot et al., 2015; Ramstein et al., 2018), but its use remains limited in smaller scale breeding, particularly in developing countries. This is due to many factors, including genotyping costs, education about when and how to apply genomic prediction, lack of computational expertise, and logistical challenges related to genotyping, analyzing, and making decisions within the short timeframe required for breeding. Ongoing work to disseminate genomic prediction to more breeding programs around the world, both in maize and other crops, will likely be critical to yield improvements and increased food availability (Bevan et al., 2017; Rasheed et al., 2017).

Although current genomic prediction methods have revolutionized maize breeding, they still have significant weaknesses (as discussed above), and they often fail to capture important amounts of heritable variation (Manolio et al., 2009; Xiao et al., 2017). Genomic prediction also commonly fails to make accurate predictions across environments, and to disentangle genotype × environment (G×E) interactions, which are often critical to predicting yield and other economically important traits (Boer et al., 2007; Zhao and Xu, 2012; Technow et al., 2015; Cooper et al., 2016).

Improving Genomic Prediction through the Use of Molecular Phenotypes
Recent developments in genomic prediction have begun to analyze molecular phenotypes such as RNA, proteins, or metabolites, produced within an individual. These molecular phenotypes can be quantitatively measured and are associated with multiple, distinct layers of biological information. Importantly, they also represent intermediate states between the effects of genomic variants and the final, complex, whole-plant phenotype traits commonly measured in breeding programs (e.g., yield, biomass, and plant height). Recent developments allow for assaying many molecular phenotypes in a high-throughput manner, using tens to hundreds of samples measured for tens to hundreds of thousands of individual observations from a single experiment (Kremling et al., 2018; Seifert et al., 2018; Yobi and Angelovici, 2018). It is the combination of these two characteristics, (i) representing a more direct effect of the genotype and a less complex trait than a whole-plant phenotype, and (ii) being high-throughput, that makes molecular phenotypes in genomic prediction valuable. These characteristics also lower the burden of multiple testing in the context of QTL mapping and GWAS by analyzing, for example, tens of thousands of expression levels, instead of hundreds of thousands to millions of SNPs. Indeed, recent approaches combining genetic markers with transcriptome or metabolome data for association studies have been shown to increase the power to detect causal genes (Mancuso et al., 2017; Kremling et al., 2019), and to better predict hybrid performance (de Abreu e Lima et al., 2017).

The effective implementation of molecular phenotypes in genomic prediction is contingent on several important factors and assumptions. First, since molecular phenotype expression can vary dramatically across tissue types and time points, both need to be relevant to the trait being predicted. Second, molecular phenotypes cannot always be assumed to be causative of the predicted phenotype, vs. the predicted phenotype causing the molecular trait. Although that assumption is tempting due to molecular phenotypes being biochemically closer to a whole-plant trait than a SNP, molecular phenotypes can only be considered as associative unless tested under a Mendelian randomization approach (Davey Smith and Ebrahim, 2003), because pleiotropic effects are possible and difficult to disprove.
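
As a rough illustration of how molecular phenotypes might enter a genomic prediction model alongside markers, the sketch below trains ridge regression (used here as a simple stand-in for the ridge regression and GBLUP family named earlier) on a training set and compares predictive ability for markers alone against markers plus transcript abundances. Everything in it is an assumption made for illustration: the data are simulated, the function names (`ridge_fit`, `compare_predictors`, `simulate`) are hypothetical, and the shrinkage value is arbitrary. It is not the method of any study cited in this review.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge regression on centered data; returns (intercept, beta)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Solve (Xc'Xc + lam * I) beta = Xc'yc; lam shrinks all effects toward zero.
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return y_mean - x_mean @ beta, beta


def predictive_ability(y_obs, y_pred):
    """Pearson correlation between observed and predicted phenotypes."""
    return float(np.corrcoef(y_obs, y_pred)[0, 1])


def standardize(train, test):
    """Scale columns with training-set statistics only (avoids leakage)."""
    mu, sd = train.mean(axis=0), train.std(axis=0)
    sd[sd == 0] = 1.0  # guard against monomorphic markers
    return (train - mu) / sd, (test - mu) / sd


def compare_predictors(markers, transcripts, y, n_train, lam=100.0):
    """Train on the first n_train lines, predict the rest, for two predictor sets."""
    predictor_sets = {
        "markers": markers,
        "markers+transcripts": np.hstack([markers, transcripts]),
    }
    results = {}
    for name, X in predictor_sets.items():
        X_tr, X_te = standardize(X[:n_train], X[n_train:])
        intercept, beta = ridge_fit(X_tr, y[:n_train], lam)
        results[name] = predictive_ability(y[n_train:], intercept + X_te @ beta)
    return results


def simulate(n=300, n_markers=200, n_transcripts=40, seed=0):
    """Toy data (hypothetical): transcripts are partly genetic, partly non-genetic."""
    rng = np.random.default_rng(seed)
    markers = rng.integers(0, 3, size=(n, n_markers)).astype(float)  # 0/1/2 coding
    genetic = markers @ rng.normal(0, 0.1, size=(n_markers, n_transcripts))
    transcripts = genetic + rng.normal(0, 1.0, size=(n, n_transcripts))
    y = (markers @ rng.normal(0, 0.1, size=n_markers)
         + transcripts @ rng.normal(0, 0.2, size=n_transcripts)
         + rng.normal(0, 1.0, size=n))
    return markers, transcripts, y


if __name__ == "__main__":
    M, T, y = simulate()
    print(compare_predictors(M, T, y, n_train=200))
```

Running the script prints one predictive ability per predictor set. Because the simulated transcripts are constructed to carry signal that the markers do not, any advantage of the combined set here is a property of the simulation; results reported for real data are mixed and trait dependent.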

The Role of Transcriptomic and Metabolomic Molecular Phenotypes in Genomic Prediction
Using transcriptomic data (RNA abundance) within genomic prediction has the ability to bridge the dynamic activity of gene expression to the static profiles of molecular markers. Doing so can potentially differentiate closely related individuals who share highly similar genotypes and capture epistatic interactions between lines (Westhues et al., 2017; Zenke-Philippi et al., 2017). In practice, however, the use of transcriptomic data alone or in combination with genotypic data has seen mixed and often trait-dependent results. In some cases, transcriptomic data alone outperformed genotypic data for highly complex traits like dry matter yield, whereas in other cases, no differences, or even reductions in prediction accuracy, were found between the genotypic and transcriptomic models (Westhues et al., 2017). The combination of both genomic and transcriptomic data as predictors in genomic prediction models also shows trait-specific responses compared with standard genotypic models (Guo et al., 2016). In one study, models using both transcriptomic and genotypic data as predictors outperformed genetic markers for traits like dry matter yield and protein content while performing more poorly than, or equivalent to, genetic markers for predicting sugars, fat, and fiber content (Westhues et al., 2017). Similarly, in combined genomic and transcriptomic models, numerous traits related to plant architecture and flowering time saw small improvements in predictive abilities, whereas some traits related to yield such as kernel weight and cob diameter saw decreases in prediction accuracy (Guo et al., 2016). The combination of small RNAs, messenger RNAs, genotype, and pedigree information into a single model also shows accuracy increases for some traits; however, most combination models show little to no increase in their predictive ability compared with solely genotypic data (Schrag et al., 2018). The failure of transcriptomic data to show meaningful improvement over standard genomic prediction models may reflect the limited availability of adequate genomic and transcriptomic data (Zenke-Philippi et al., 2016, 2017). In addition, which tissues and developmental time points are most important for predicting a given trait is not immediately obvious, and resource limitations seldom allow for the scale of data needed to test multiple tissues and time points in a breeding program. Methods for imputing missing transcript profiles from genotype data have been developed (Gamazon et al., 2015). These methods have been shown to moderately improve prediction accuracy (Westhues et al., 2019), but further improvements in this area are possible by novel machine learning techniques on sequence data (Washburn et al., 2019). Reductions in the costs of RNA sequencing, along with improved sampling strategies for large populations, should allow more thorough testing of the ability of transcriptomic data to improve genomic prediction models (Kremling et al., 2018).

Incorporating information on the metabolome during the early stages of a plant’s life cycle is another method to possibly improve genomic prediction. Although metabolites have high turnover rates, they also have extensive diversity and have been shown to be associated with numerous quantitative traits in plants (Carreno-Quintero et al., 2012; Hill et al., 2015). In rice (Oryza sativa L.), metabolic-based predictions nearly doubled predictive ability compared with genomic markers (Xu et al., 2016). In maize, metabolic profiles have been shown to improve predictive ability over genomic data for grain yield and grain weight, but for numerous other traits, metabolic information alone decreases predictive abilities when compared with genomic markers (Riedelsheimer et al., 2012; Guo et al., 2016; Westhues et al., 2017; Schrag et al., 2018). Within maize, combining genotypic and metabolic prediction seems to be a much more powerful approach than metabolites by themselves (Guo et al., 2016; Westhues et al., 2017). The abundance of a given metabolite may be controlled by complex biochemical pathways and interactions between multiple proteins, genes, and modules of regulatory machinery. The incorporation of metabolites may be capturing networks of activity that are not captured by genomic and/or transcriptomic data. When used all together, metabolites, genetic markers, and transcriptomic data have higher predictive abilities than standard genomic prediction for numerous traits (Guo et al., 2016; Westhues et al., 2017). This increase might be attributed to the complementary, partly non-overlapping information carried by genomic, metabolic, and transcriptomic data. The information content overlap between genomic and metabolomic data is likely much lower than the overlap between genomic and transcriptomic data, therefore allowing combined genomic, metabolomic, and transcriptomic models to outperform most single or paired predictor models.

The addition of metabolite information can provide insight into how different metabolites influence a single moment of a plant’s life cycle. Sampling these moments, similar to sampling the transcriptome, can be very expensive for breeding programs when compared with genomic marker and pedigree information. Additionally, the time points and/or tissues one should sample to most effectively increase prediction accuracy are not always obvious. More efficient methods for sampling plants across the life cycle and for processing, identifying, and quantifying metabolites inexpensively could have profound effects on the application of both metabolite and transcriptomic data in predictive breeding. For example, physiological and developmental information has been used to determine the developmental time points most relevant to grain yield (Li et al., 2018b; Millet et al., 2019). These and similar analyses could be used to prioritize sample collection. Whole-crop physiological (WCP) models (discussed

in more detail below) could also be used in this way or perhaps adapted to produce rank ordering of the most important samples to collect for prediction of a given trait. Additionally, methods for predicting or imputing metabolism-related traits from limited data should be explored.

For both metabolomic and transcriptomic data, the expense and labor involved in generating the data often outweighs the potential gains. However, if sampling and imputation approaches can be made inexpensive enough, then including one or both of these data types in predictive models could become commonplace.

Functionality of Genomic-Derived Features and Microbiomes in Genomic Prediction
Variants on genomic prediction models that use publicly available gene ontology information have been explored for their ability to assess the biological importance of genetic marker effects to improve prediction accuracy. In GBLUP models, genetic marker effects are not differentiated based on the genomic features they reside within. An alternative version of GBLUP models, called genomic feature best linear unbiased prediction (GFBLUP), incorporates gene ontology metadata on each marker as a covariate in a standard GBLUP model (Edwards et al., 2016). This method has shown promising results between inbred lines of Drosophila and between cattle breeds (Fang et al., 2017). These additional gene ontology terms serving as predictors may be picking up on unique layers of trait functionality potentially associated with gene regulatory networks or transcriptional modules. These methods, however, are still limited by the availability of accurate genomic annotations in regions of the genome affecting the trait of interest (Edwards et al., 2016; Fang et al., 2017). Gene ontology information within GFBLUP models can increase predictive ability compared with GBLUP and can provide some biological insight into the genetic architecture of the trait under selection in addition to trait prediction (Edwards et al., 2016).

While not a molecular phenotype of an individual, the microbiomes of mammals have been used to predict complex host phenotypes. The human microbiota has been shown to affect gut metabolic pathways (Li et al., 2008) and brain behavior in individuals with anxiety (Foster and McVey Neufeld, 2013), whereas in plants, the microbiome within the rhizosphere and leaves of maize inbreds is heritable and genotype specific (Peiffer et al., 2013; Wallace et al., 2018; Walters et al., 2018) and plays multiple roles in immune health, nutrient uptake, and stress responses (Berendsen et al., 2012). Due to the impact of microbiomes on host phenotypic variation, the use of this alternative genetic information in genomic prediction may be insightful. A study using metagenomic profiles to predict the body mass index and inflammatory bowel disease in humans, as well as methane production in cattle, found that genetic information from these microbiomes, without any host genotypic markers, was able to predict the traits equivalently, if not slightly better than standard genomic prediction models (Ross et al., 2013). These microbial communities capture an additional layer of genetic information that is not found within host genomes and could assist in breeding for traits whose phenotypes are strongly influenced by these communities (Ross et al., 2013).

Machine Learning Methods in Genomic Prediction
Machine learning, as considered here, includes statistical methods and algorithms in which the system “learns” a function for predicting outputs from inputs based on the data itself, rather than the researcher providing the function. Machine learning methods for genomic prediction have been explored as both a complement to, and a replacement for, traditional GBLUP. Many types of machine learning methods can potentially be applied to genomic prediction, but deep neural networks show particular promise and are the focus of the methods discussed here. Any mathematical function can, in theory, be approximated by the right neural network. However, training a network to accurately approximate the relationship between genotypes and phenotypes (or anything else) requires showing it numerous datasets in which that relationship exists, and designing, training, and testing a network that can correctly identify that relationship. Additionally, like many statistical techniques, neural networks cannot inherently differentiate between correlation and causation (Marcus, 2018).

One example of the application of machine learning to genomic prediction is DeepGS, a deep learning-based R package developed as a reduced marker assumption nonlinear genomic prediction model (Ma et al., 2018). DeepGS uses a multilayered convolutional neural network to learn the association between genotypes and phenotypes in training populations and then predicts breeding values within the test population. In tests on an Iranian bread wheat landrace population, DeepGS was comparable with ridge regression best linear unbiased prediction (RR-BLUP) and GBLUP in predicting grain length (Ma et al., 2018). Machine learning models have also been applied to understanding G×E interactions and their impacts on predictive ability. In both single- and multi-trait models in maize and wheat (Triticum aestivum L.), deep learning models performed better than standard GBLUP when G×E effects were removed (Montesinos-López et al., 2018). However, when G×E effects were included, GBLUP performed better than deep learning methods. In general, current machine learning methods are not able to outperform standard genomic prediction. It remains to be seen if fine tuning of these methods, either by carefully

incorporating molecular phenotypes, and/or using better dimension reduction methods to identify subsets of highly informative data, may allow them to outperform standard genomic prediction.

Future Directions of Genomic Prediction in Maize Breeding
Much of predictive breeding suffers from high dimensionality, where there is more information on genetic markers than there are observations (n << p) within the population (Crossa et al., 2017; Ramstein et al., 2018). This issue has been addressed in genomic prediction using numerous forms of dimensionality reduction. Some examples include selecting SNPs that are more functionally relevant according to machine learning (Li et al., 2018a; Ma et al., 2018), and focusing on SNPs with high conservation as inferred by Genomic Evolutionary Rate Profiling (GERP) scores (Rodgers-Melnick et al., 2015; Yang et al., 2017). The use of open chromatin regions has also been suggested to reduce the complexity of the maize genome to the small, functionally accessible regions that account for most phenotypic variation (Rodgers-Melnick et al., 2016). This reduction in the genomic space containing the most informative loci can address the problem of high dimensionality. Ramstein et al. (2018) recently reviewed the n << p dimensionality problem in great detail. They suggest that overcoming high dimensionality may be a critical step towards a new era in quantitative genetics. In particular, they concluded that key improvements of genomic prediction might come from high-throughput phenotyping (increasing n), the use of molecular phenotypes and/or component traits (potentially simplifying the genetic architecture), machine learning methodologies, and replacing genomic markers with high-quality haplotype data (e.g., using methods similar to the above-mentioned PHG).

PREDICTING ACROSS GENOTYPE, ENVIRONMENT, AND MANAGEMENT
Numerous approaches for predicting phenotypes from a combination of genotype, environmental, and management (sometimes included in environment) data have been developed over the years (Fisher and Mackenzie, 1923; van Eeuwijk et al., 1996; Burgueño et al., 2008, 2011; Thomas, 2010; Malosetti et al., 2013). Several recent approaches to the problem include: (i) using environmental covariance structures within a genomic prediction model (Jarquín et al., 2014), (ii) incorporating environments into genomic models through environmental indices (Li et al., 2018b), and (iii) incorporating whole-crop physiological growth models into genomic prediction (Technow et al., 2015; Cooper et al., 2016). Each of these and other approaches has strengths and weaknesses that could be the subject of an entire review. Here, we will focus on the use of WCP models to improve genomic prediction. In so doing, we also discuss the importance of weather, soil, and field management data, and the inherent challenges in working across disparate disciplines such as soil science, agronomy, physiology, and breeding. Understanding these challenges will likely be crucial to any attempt at uniting genotype, environment, and management into a single useful predictive framework.

Introduction to Crop Physiological Models
Whole-crop physiological models rely on plant physiological measurements and calculations, as well as environmental variables, to predict field level phenotypes over time (Hammer et al., 2002, 2016; Wang et al., 2002; Messina et al., 2009; Soufizadeh et al., 2018). Each day, the model cycles through a variety of modules, each determining aspects of the plant’s development for the day based on the weather data provided, calculations of available water and nutrients in the soil, the plant’s leaf area and radiation use efficiency, and other factors. These calculations feed together into predictions about the plant’s developmental stage, how much stress the plant is experiencing, and how much biomass the plant will gain on a given day (Fig. 1).
Fig. 1. Diagram representation of a crop model (Soufizadeh et al., 2018).
Whole-crop physiological models are typically constructed at a species level (i.e., a different model for maize, sorghum [Sorghum bicolor (L.) Moench], wheat, etc.), but they also have built-in cultivar parameters which allow tuning to fit different genotypes (or cultivars) within a species. This tuning to a new cultivar is often referred to as model calibration and requires measuring physiological variables (phenotypes) on a new cultivar in as many environments as possible. Some of these phenotypes can be entered into the model directly, whereas others are compared with the model’s predictions and used as feedback in an iterative tuning process. Once

calibrated, models can accurately predict cultivar specific phenotypes across widely varying locations (even future climates) given soil and weather data from those locations (Hammer et al., 2014, 2016).
Whole-crop physiological models have been applied to crop risk assessment, management decision making, and even plant breeding decisions for several decades (Messina et al., 2009; Hammer et al., 2014, 2016; Technow et al., 2015; Wallach et al., 2018). However, the models incorporate little to no genetic data, and calibrating them for new genotypes is costly, time consuming, and usually (though not necessarily) requires multiple seasons. Additionally, the estimation of model parameters can suffer from issues such as expressivity, equifinality, and instability (Lamsal et al., 2018). These challenges make it difficult to apply WCP models directly to novel breeding materials or even elite cultivars and make it nearly impossible to calibrate models for the large numbers of genotypes inherent in breeding programs. For these reasons, the impact of WCP models on predictive breeding in maize (and other crops) has been limited.

Integrating Physiological Models with Genomic Prediction
The integration of quantitative genetic models with WCP models has long been proposed as a solution to the challenge of genotype × environment × management (G×E×M) prediction (Cooper and Hammer, 1996; White and Hoogenboom, 1996; Hoogenboom et al., 1997; Cooper and Podlich, 2002; Chapman et al., 2003; Reymond et al., 2003; Wang et al., 2003; Yin et al., 2005; Messina et al., 2006; Hammer et al., 2006, 2016; Cooper et al., 2014, 2016; Parent and Tardieu, 2014; Technow et al., 2015; Baldazzi et al., 2016). Various strategies for how this might be accomplished have been proposed, but in general, none of the strategies have been fully explored, implemented, and tested. One exception is a recent attempt made by scientists at DuPont Pioneer (now Corteva). In this case, approximate Bayesian computation and Bayesian hierarchical models were used to unite a standard genomic prediction model with a very simplified WCP model (Technow et al., 2015; Cooper et al., 2016; Messina et al., 2018). These methods outperformed the standard genomic prediction model in the context of training in one environment and predicting in another while showing similar results to the standard model for within environment predictions.
In general, the development of integrated quantitative genetics and WCP models faces several significant community-level challenges. Some of these challenges are cultural or historical (the two types of models were developed by different communities and for different purposes over decades), whereas others are more specific to the challenging nature of the problem itself (i.e., need for large datasets, computational resources, etc.). Necessary in solving each of the challenges is a critical mass of investigators from multiple disciplines working together.

Major Challenges
Culture
A surprising, yet significant, challenge to the development of integrated WCP and quantitative genetic methods is the educational, linguistic, and practical divides between geneticists/breeders (those whose education is genetics centric) and agronomists/physiologists/soil scientists (those whose education is more physiology or chemistry centric). Although both groups of scientists work with plant-based agriculture, the terminologies they use and their approaches to the same problems are very different. Additionally, the way these subdisciplines of plant science set up and execute experiments and the desired outcomes are not always compatible. Not surprisingly, genetics and breeding experiments often maximize the number of distinct genotypes being studied. This is particularly true in quantitative and population genetics where large numbers of genotypes are required to meet the statistical requirements in the desired analysis. Genetic experiments are also often designed with the goal of minimizing environmental variation or simply controlling for it statistically. Agronomy, physiology, and soil science projects often take the opposite approach in that genetic variation is minimized. These experiments may involve a single genotype examined under multiple sets of conditions. In some cases, many environments may be critical to these experimental designs, whereas a single well-controlled environment is preferred in others. The disparate goals between the subdisciplines make the generation of cross-discipline datasets challenging (see the section on data collection below), but they also hamper collaboration and discussion across the disciplines.
The quantitative genetics and plant breeding viewpoint, for example, often focuses on end points and static relationships (e.g., the relationship between plant yield and a given cultivar’s genetic marker data). Whole-crop physiological model practitioners, on the other hand, tend to view the world in terms of time-dependent relationships and processes that vary with the plant’s developmental stage. These differences in viewpoint are not particularly surprising when one considers that a given plant’s genetics are in fact stable across the entire growing season, whereas the physiological and developmental processes the plant is undergoing change with time.
Some simple yet pervasive examples of differing viewpoints can be illustrated by the different semantics of common agricultural research words and phrases. In physiological modeling, the word environment is thought of and measured in complex, time-dependent variables, which can be extremely granular and specific

(e.g., temperature variations across hours, days, weeks, at a specific location). In quantitative genetics, on the other hand, environments are generally represented and thought of in a reduced or summative form. Environments may be represented as indices, for example, or more often they are simply thought of as distinct experiments or replicates in a statistical framework and characterized by plant performance differences (e.g., the yield of cultivar x is more variable across environments than is the yield of cultivar z). These differences in viewpoints and terminologies extend further into the meaning behind ideas like G×E×M interactions. Breeders and geneticists take a plant-centric viewpoint of G×E and rarely use (or think in terms of) the G×E×M terminology. From the plant’s point of view, all influences external to it are environmental. Agronomists, physiologists, and soil scientists, on the other hand, often take the farmer’s perspective and think in terms of G×E×M. Here, environment includes only factors that are out of the farmer’s control, and management includes factors that can be influenced through agronomic intervention.

Data Collection
As mentioned above, the goals, design, implementation, and analysis of experiments vary widely across genetics, breeding, agronomy, physiology, and soil science. This presents a major challenge to the integration of quantitative genetics with WCP models because currently available datasets only include subsets of the types of data needed. For example, there are many high-quality datasets available on the effects of variations in soil type and chemistry on plant phenotypes, but these datasets are mostly limited to a few genotypes. Conversely, multiple large-scale datasets are available for maize association and diversity panels with thousands of genotypes included across multiple environments, but the matching soil data and many physiologically important variables are not available. Even when attempts are made to collect extra data (i.e., more genotypes in a soil experiment, or soil data in a genetics experiment), the financial costs and/or the expertise needed to collect the proper data are prohibitive. Table 2 lists the types of data that might be considered minimal for running both genomic prediction and WCP models given current separate implementations (based primarily on data used by the Agricultural Production Systems sIMulator, or APSIM) (Littleboy et al., 1999; Dalgliesh and Foale, 2005; Cresswell et al., 2009; Archontoulis et al., 2012). Of course, the relative importance of each of these variables within the context of a combined model, and whether any of them can be imputed without significant loss of fidelity, need to be tested experimentally.
On the other hand, one phenotyping concept used in both quantitative genetics and physiological modeling is the idea of breaking complex phenotypes (like yield or biomass) into component traits (like stress tolerance, seed number and weight, etc.). Complex traits are likely controlled by many genes and tied to multiple physiological processes. Breaking them down into simpler, better understood traits should allow more accurate and intuitive modeling and more straightforward genetics. Component traits like flowering time are relatively simple to measure, but still time consuming on a large scale. Others, like radiation use efficiency, require specialized equipment and/or highly controlled conditions for accurate measurements. Recent advances in high-throughput phenotyping allow many traits to be measured quickly and accurately using unmanned aerial vehicles, field robots, or sensors mounted on farm vehicles and driven through the field (Andrade-Sanchez et al., 2014; Shi et al., 2016; DeChant et al., 2017; Crain et al., 2018; Wang et al., 2018b). These high-throughput phenotyping systems may even be able to estimate some of the more difficult physiological traits using machine learning algorithms, carefully designed training experiments, and/or correlated traits (Rosati et al., 2004; Zhang et al., 2015; Ramstein et al., 2018; Twohey et al., 2018). Such advances could enable WCP model calibration and validation at population scales.

Modeling and Computation
Whole-crop physiological and quantitative genetics models are built on statistical frameworks and philosophies distinct from each other. Whole-crop physiological models rely heavily on physiological calculations made from both small- and large-scale experiments and approximations calculated through differential equations. They calculate daily intermediate values that result in the final trait values (yield, biomass, height, etc.). Cultivar tuning is often performed manually and based on limited amounts of training and testing data. Genomic prediction models, on the other hand, rely almost entirely on maximum likelihood or Bayesian regression frameworks, without intermediate phenotypes, and are entirely focused on predicting final trait values. When compared with WCP methods, genomic prediction models are usually relatively simple with very few inputs and outputs (single phenotype, genomic, and/or pedigree relationships). Publication standards are also different between the model types, with genomic prediction models undergoing extensive leave-one-out or k-folds testing (the product is a model that will be recalibrated for use on different populations), and WCP models generally being tested and calibrated across many environments (the product is a calibrated model). An important point here, which relates to the earlier discussion on cultural differences, is that breeders and geneticists are trained with a statistical viewpoint that does not allow for the use of testing data to improve the model in any way. Using the same data (or insights gained from it) to both improve and test the model is

considered scientifically unacceptable. This viewpoint is critical because the statistical models used in breeding and genetics (i.e., genomic prediction) are generally nonmechanistic by definition and can therefore not be validated biologically. Whole-crop physiological models, on the other hand, are calibrated with the perspective that fundamental biological, chemical, or physiological mechanisms are represented within the model, and that the model should be validated, at least in part, by its mechanisms and results fitting within the realm of well-researched scientific understanding. Combined models will likely need to be validated in multiple ways including both biological sensibility and k-folds or leave-one-out testing methods that test in both genotypes and environments never previously seen by the model.
Both WCP and genomic prediction models require substantial computational resources and many iterations (e.g., iterative optimization algorithms for maximum-likelihood-based models such as GBLUP, or Markov chain Monte Carlo sampling for Bayesian regression models). Both can be run on a standard desktop computer for very simple datasets, but running larger datasets becomes too
Table 2. Minimal phenotypes, weather, soil, and management data required for running a typical crop growth model. Based on the requirements of the Agricultural Production Systems sIMulator (APSIM).

  Measurement                             Description or formula
Plant phenotypes
  Planting date                           Date seeds were planted
  Flowering/anthesis date                 Date when 50% of plants have 50% of their anthers exposed
  Harvest date                            Date when grain was harvested
  Grain yield                             Weight of grain, and grain moisture content at harvest
  Grain no. per ear                       Average number of kernels per ear of corn
  Grain weight                            Average weight of a single kernel
  Total leaf number                       Total number of leaves produced through plant’s life cycle
  Largest leaf area                       The area of the largest leaf (generally the ear leaf)
  Biomass (at least for checks)           Aboveground total plant biomass (excluding grain)
Soil/water/residue properties (by soil layer)†
  Bulk density (BD)                       How dense or porous the soil is. BD = dry soil weight/soil volume
  Gravimetric water content (GWC)         Percentage water in soil. GWC = [(wet weight − dry weight)/(dry weight − container weight)] × 100
  Drained upper limit (DUL)               Water held by soil after drainage. DUL = GWC × BD
  Saturated water content (SAT)           Maximum water soil can hold. SAT = [1 − (BD/2.65)] − e, e = 0.03 (clay soil) to 0.07 (sandy soil)
  Lower limit 15 (LL15)                   Soil water content at 15 bar (1.5 MPa) vacuum pressure
  Air dry (AD)                            AD = LL15 × A, A = 0.5 to 1 depending on soil depth. See Cresswell et al. (2009)
  Soil texture                            Relative proportion of sand, silt, and clay within the soil
  U                                       Water evaporation potential from bare soil. See Littleboy et al. (1999)
  Conna                                   Soil water evaporation over time. See Littleboy et al. (1999)
  Soil pH                                 pH of the soil
  Soil and root C/N ratios                Ratio of C to N separately for soil and roots
  Soil organic carbon (OC)                Organic C in soil separated into labile and inert pools
  Initial water                           Water in the soil profile at beginning of simulation
  Initial N                               N in the soil profile at beginning of simulation
  Surface residue amount                  Residual surface organic matter present at beginning of simulation
  Surface residue C/N ratio               Ratio of C to N in surface residue
Agronomic/management details
  Irrigation (dates and quantities)       Amount of water applied to the field through irrigation
  Fertilizer (dates, types, quantities)   Amount of fertilizer applied to the field
  Tillage (date, type, depth)             Tillage and/or cultivation applied to the field
  Row and plant spacing                   Spacing between rows of plants and plants in a row
  Planting depth                          Depth to which seeds were planted in the soil
  Harvesting equipment and scheme         Equipment used for harvesting and how harvest was carried out
  Field layout (row and column)           Layout of the field. How were different cultivars or species spread across the field spatially?
Weather recording (daily measurements)
  Radiation                               Average incident shortwave radiation over the day
  Max. temperature                        Daily maximum temperature
  Min. temperature                        Daily minimum temperature
  Precipitation                           Total precipitation over the day
  Vapor pressure                          Average water vapor partial pressure over the day
  Daylength                               Time from dawn until dusk
† Soil descriptions and calculations summarized from Archontoulis et al. (2012) and Dalgliesh and Foale (2005).
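The soil formulas listed in Table 2 can be illustrated with a short worked sketch. The Python below is only an illustration under stated assumptions: the function and argument names are hypothetical and are not APSIM identifiers, the sample values are made up, and the GWC formula is read as dividing by the dry soil weight net of the container.

```python
"""Worked sketch of the soil-water formulas listed in Table 2.

All names are hypothetical (not APSIM identifiers). The GWC formula is
read here with the container weight subtracted from the dry weight,
which is an assumption about the intended grouping.
"""

PARTICLE_DENSITY = 2.65  # constant appearing in the SAT formula of Table 2


def bulk_density(dry_soil_weight, soil_volume):
    """BD = dry soil weight / soil volume."""
    return dry_soil_weight / soil_volume


def gravimetric_water_content(wet_weight, dry_weight, container_weight):
    """GWC (%) = (wet - dry) / (dry - container) * 100.

    wet_weight and dry_weight are assumed to include the container.
    """
    return (wet_weight - dry_weight) / (dry_weight - container_weight) * 100.0


def drained_upper_limit(gwc, bd):
    """DUL = GWC * BD (a percentage when GWC is a percentage)."""
    return gwc * bd


def saturated_water_content(bd, e=0.05):
    """SAT = [1 - (BD / 2.65)] - e, with e from 0.03 (clay) to 0.07 (sand)."""
    if not 0.03 <= e <= 0.07:
        raise ValueError("e is expected to lie between 0.03 and 0.07")
    return (1.0 - bd / PARTICLE_DENSITY) - e


def air_dry(ll15, a):
    """AD = LL15 * A, with A from 0.5 to 1 depending on soil depth."""
    if not 0.5 <= a <= 1.0:
        raise ValueError("A is expected to lie between 0.5 and 1")
    return ll15 * a


if __name__ == "__main__":
    # Made-up sample values for a single soil layer.
    bd = bulk_density(dry_soil_weight=132.5, soil_volume=100.0)
    gwc = gravimetric_water_content(150.0, 130.0, 30.0)
    print("BD :", bd)
    print("GWC:", gwc)
    print("DUL:", drained_upper_limit(gwc, bd))
    print("SAT:", saturated_water_content(bd, e=0.05))
    print("AD :", air_dry(ll15=0.12, a=0.5))
```

Because these properties are recorded by soil layer, a real workflow would apply the functions once per layer. The units expected by a given crop model should be checked before the values are used as inputs.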

resource intensive and requires the use of high-performance computing clusters. Parallelization is possible for many use cases of both methods, and in some cases, data structure analyses have been used to efficiently organize computational runs and better utilize available resources (Lamsal et al., 2017). Any combination of WCP and genomic prediction models into a common framework will most certainly require high-performance computing, as well as careful software engineering to reduce the resources needed for computation. Even then, being able to gather all of the needed data and run the models fast enough for use in a breeding program may be challenging. As discussed above, the Messina et al. (2018) implementation of genomic prediction combined with WCP was able to run within a reasonable timeframe due to the use of a simplified WCP model. The effects of these or any simplifications on model accuracy remain unknown at this point, but future research into the value (in terms of speed and accuracy) of simplifying WCP and genomic prediction models in a combined framework will be critical to designing the most cost-effective and useful modeling schemes. The application of machine learning, ensemble models (combination of multiple models together), and other statistical methods that are flexible to data input types and designed for modeling nonlinear properties should also be useful for incorporating G×E×M into predictive models. These approaches are beginning to be applied to the field, but their full potential remains relatively unexplored (Montesinos-López et al., 2018).

CONCLUSIONS
Maize breeding began many centuries ago with early farmers saving seeds from their best plants for use in the next generation. Today, large and highly efficient maize breeding programs carry on that tradition, but with numerous molecular, statistical, computational, and technological tools. Genomic prediction, in particular, has become a critical tool for large-scale maize breeding programs around the world and has the potential to revolutionize smaller-scale public, private, government, and nongovernment breeding efforts in maize and other species. Current challenges and opportunities in predictive breeding include: (i) better measurement and incorporation of high-throughput phenotypes, molecular phenotypes, and environmental data into predictive models, (ii) reducing the costs and increasing the availability of genotyping, phenotyping, and predictive modeling, (iii) finding new and better ways to overcome the n << p dimensionality problem (the first and second points should help with this), and (iv) increasing collaboration and communication across disciplines. Machine learning methods show promise for overcoming some of these issues by allowing predictive models that are more flexible to data formats and can better capture nonlinear relationships. The use of crop growth models together with genomic data also shows great promise for cross-environment prediction. High-accuracy models that can predict and explain the performance of new genotypes in new environments would allow unprecedented gains in maize breeding efficiency, and the development of new maize cultivars that can survive and thrive in current and future climates.

Conflict of Interest
The authors declare that there is no conflict of interest.

Acknowledgments
This material is based upon work supported by the US National Science Foundation Postdoctoral Research Fellowship in Biology under Grant 1710618 (to J.D. Washburn). Additional support comes from the USDA-ARS.

References
Andrade-Sanchez, P., M.A. Gore, J.T. Heun, K.R. Thorp, A. Elizabete Carmo-Silva, et al. 2014. Development and evaluation of a field-based high-throughput phenotyping platform. Funct. Plant Biol. 41:68–79. doi:10.1071/FP13126
Archontoulis, S., F. Miguez, and K. Moore. 2012. APSIM manual for MS students. Iowa State Univ., Ames, IA.
Baldazzi, V., N. Bertin, M. Génard, H. Gautier, E. Desnoues, and B. Quilot-Turion. 2016. Challenges in integrating genetic control in plant and crop models. In: X. Yin and C.P. Struik, editors, Crop systems biology: Narrowing the gaps between crop modelling and genetics. Springer Int. Publ., Cham, Switzerland. p. 1–31. doi:10.1007/978-3-319-20562-5_1
Beadle, G.W. 1939. Teosinte and the origin of maize. J. Hered. 30:245–247. doi:10.1093/oxfordjournals.jhered.a104728
Berendsen, R.L., C.M.J. Pieterse, and P.A.H.M. Bakker. 2012. The rhizosphere microbiome and plant health. Trends Plant Sci. 17:478–486. doi:10.1016/j.tplants.2012.04.001
Bevan, M.W., C. Uauy, B.B.H. Wulff, J. Zhou, K. Krasileva, M.D. Clark. 2017. Genomic innovation for crop improvement. Nature 543:346–354. doi:10.1038/nature22011
Boer, M.P., D. Wright, L. Feng, D.W. Podlich, L. Luo, M. Cooper, and F.A. van Eeuwijk. 2007. A mixed-model quantitative trait loci (QTL) analysis for multiple-environment trial data using environmental covariables for QTL-by-environment interactions, with an example in maize. Genetics 177:1801–1813. doi:10.1534/genetics.107.071068
Bradbury, P., T. Casstevens, D. Ilut, L. Johnson, Z. Miller, R. Punna, et al. 2018. Inferring genotypes from skim sequence using a graph-based approach: The Practical Haplotype Graph. Bitbucket. https://bitbucket.org/bucklerlab/practicalhaplotypegraph/wiki/Home (accessed 28 Mar. 2019).
Browning, B.L., Y. Zhou, and S.R. Browning. 2018. A one-penny imputed genome from next-generation reference panels. Am. J. Hum. Genet. 103:338–348. doi:10.1016/j.ajhg.2018.07.015
Buckler, E.S., D.C. Ilut, X. Wang, T. Kretzschmar, M. Gore, and S.E. Mitchell. 2016. rAmpSeq: Using repetitive sequences for robust genotyping. bioRxiv 096628. doi:10.1101/096628
Bukowski, R., X. Guo, Y. Lu, C. Zou, B. He, Z. Rong, et al. 2018. Construction of the third-generation Zea mays haplotype map. Gigascience 7:gix134. doi:10.1093/gigascience/gix134
Burgueño, J., J. Crossa, P.L. Cornelius, and R.-C. Yang. 2008. Using factor analytic models for joining environments and genotypes without crossover genotype × environment interaction. Crop Sci. 48:1291–1305. doi:10.2135/cropsci2007.11.0632

Burgueño, J., J. Crossa, J.M. Cotes, F.S. Vicente, and B. Das. 2011. Prediction assessment of linear mixed models for multienvironment trials. Crop Sci. 51:944–954. doi:10.2135/cropsci2010.07.0403
Carreno-Quintero, N., A. Acharjee, C. Maliepaard, C.W.B. Bachem, R. Mumm, H. Bouwmeester, et al. 2012. Untargeted metabolic quantitative trait loci analyses reveal a relationship between primary metabolism and potato tuber quality. Plant Physiol. 158:1306–1318. doi:10.1104/pp.111.188441
Cerrudo, D., S. Cao, Y. Yuan, C. Martinez, E.A. Suarez, R. Babu, et al. 2018. Genomic selection outperforms marker assisted selection for grain yield and physiological traits in a maize doubled haploid population across water treatments. Front. Plant Sci. 9:366. doi:10.3389/fpls.2018.00366
Chapman, S., M. Cooper, D. Podlich, and G. Hammer. 2003. Evaluating plant breeding strategies by simulating gene action and dryland environment effects. Agron. J. 95:99–113. doi:10.2134/agronj2003.0099
Cooper, M., and G. Hammer. 1996. Plant adaptation and crop improvement. CAB Int., Wallingford, UK.
Cooper, M., C.D. Messina, D. Podlich, L.R. Totir, A. Baumgarten, N.J. Hausmann, et al. 2014. Predicting the future of plant breeding: Complementing empirical evaluation with genetic prediction. Crop Pasture Sci. 65:311–336. doi:10.1071/CP14007
Cooper, M., and D.W. Podlich. 2002. The E(NK) model: Extending the NK model to incorporate gene-by-environment interactions and epistasis for diploid genomes. Complexity 7:31–47. doi:10.1002/cplx.10044
Cooper, M., F. Technow, C. Messina, C. Gho, and L.R. Totir. 2016. Use of crop growth models with whole-genome prediction: Application to a maize multienvironment trial. Crop Sci. 56:2141–2156. doi:10.2135/cropsci2015.08.0512
Crain, J., S. Mondal, J. Rutkoski, R.P. Singh, and J. Poland. 2018. Combining high-throughput phenotyping and genomic information to increase prediction and selection accuracy in wheat breeding. Plant Genome 11:170043. doi:10.3835/plantgenome2017.05.0043
Cresswell, H.P., I.H. Hume, E. Wang, T.L. Nordblom, J.D. Finlayson, et al. 2009. Catchment response to farm scale land use change. Commonwealth Sci. Ind. Res. Org., and New South Wales Dep. Ind. Invest., Sydney, NSW, Australia.
Crossa, J., P. Pérez, J. Hickey, J. Burgueño, L. Ornella, J. Cerón-Rojas, et al. 2014. Genomic prediction in CIMMYT maize and wheat breeding programs. Heredity 112:48–60. doi:10.1038/hdy.2013.16
Crossa, J., P. Pérez-Rodríguez, J. Cuevas, O. Montesinos-López, D. Jarquín, G. de los Campos, et al. 2017. Genomic selection in plant breeding: Methods, models, and perspectives. Trends Plant Sci. 22:961–975. doi:10.1016/j.tplants.2017.08.011
Dalgliesh, N.P., and M.A. Foale. 2005. Soil matters. Agric. Prod. Syst. Res. Unit, Toowoomba, QLD, Australia.
Darwin, C. 1876. The effects of cross and self fertilisation in the vegetable kingdom. John Murray, London. doi:10.5962/bhl.title.110800
Darwin, C.R. 1868. The variation of animals and plants under domestication. John Murray, London.
Davey Smith, G., and S. Ebrahim. 2003. “Mendelian randomization”: Can genetic epidemiology contribute to understanding environmental determinants of disease?*. Int. J. Epidemiol. 32:1–22. doi:10.1093/ije/dyg070
Davis, G.L., M.D. McMullen, C. Baysdorfer, T. Musket, D. Grant, M. Staebell, et al. 1999. A maize map standard with sequenced core markers, grass genome reference points and 932 expressed sequence tagged sites (ESTs) in a 1736-locus map. Genetics 152:1137–1172.
de Abreu e Lima, F., M. Westhues, Á. Cuadros-Inostroza, L. Willmitzer, A.E. Melchinger, and Z. Nikoloski. 2017. Metabolic robustness in young roots underpins a predictive model of maize hybrid performance in the field. Plant J. 90:319–329. doi:10.1111/tpj.13495
DeChant, C., T. Wiesner-Hanks, S. Chen, E.L. Stewart, J. Yosinski, M.A. Gore, et al. 2017. Automated identification of northern leaf blight-infected maize plants from field imagery using deep learning. Phytopathology 107:1426–1432. doi:10.1094/PHYTO-11-16-0417-R
Doebley, J. 2004. The genetics of maize evolution. Annu. Rev. Genet. 38:37–59. doi:10.1146/annurev.genet.38.072902.092425
Doebley, J., A. Stec, and C. Gustus. 1995. teosinte branched1 and the origin of maize: Evidence for epistasis and the evolution of dominance. Genetics 141:333–346.
Dorweiler, J., A. Stec, J. Kermicle, and J. Doebley. 1993. Teosinte glume architecture 1: A genetic locus controlling a key step in maize evolution. Science 262:233–235. doi:10.1126/science.262.5131.233
Duvick, D.N. 2001. Biotechnology in the 1930s: The development of hybrid maize. Nat. Rev. Genet. 2:69–74. doi:10.1038/35047587
Eathington, S.R., T.M. Crosbie, M.D. Edwards, R.S. Reiter, and J.K. Bull. 2007. Molecular markers in a commercial breeding program. Crop Sci. 47:S-154–S-163. doi:10.2135/cropsci2007.04.0015IPBS
Edwards, S.M., I.F. Sørensen, P. Sarup, T.F.C. Mackay, and P. Sørensen. 2016. Genomic prediction for quantitative traits is improved by mapping variants to gene ontology categories in Drosophila melanogaster. Genetics 203:1871–1883. doi:10.1534/genetics.116.187161
Elshire, R.J., J.C. Glaubitz, Q. Sun, J.A. Poland, K. Kawamoto, et al. 2011. A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS One 6:e19379. doi:10.1371/journal.pone.0019379
Erisman, J.W., M.A. Sutton, J. Galloway, Z. Klimont, and W. Winiwarter. 2008. How a century of ammonia synthesis changed the world. Nat. Geosci. 1:636–639. doi:10.1038/ngeo325
Ertel, P. 2001. American tractor. Voyageur Press, Beverly, MA.
Fang, L., G. Sahana, P. Ma, G. Su, Y. Yu, S. Zhang, et al. 2017. Use of biological priors enhances understanding of genetic architecture and genomic prediction of complex traits within and between dairy cattle breeds. BMC Genomics 18:604. doi:10.1186/s12864-017-4004-z
Fisher, R.A., and W.A. Mackenzie. 1923. Studies in crop variation. II. The manurial response of different potato varieties. J. Agric. Sci. 13:311–320. doi:10.1017/S0021859600003592
Foster, J.A., and K.-A. McVey Neufeld. 2013. Gut-brain axis: How the microbiome influences anxiety and depression. Trends Neurosci. 36:305–312. doi:10.1016/j.tins.2013.01.005
Gaffney, J., J. Schussler, C. Löffler, W. Cai, S. Paszkiewicz, C. Messina, et al. 2015. Industry-scale evaluation of maize hybrids selected for increased yield in drought-stress conditions of the US Corn Belt. Crop Sci. 55:1608–1618. doi:10.2135/cropsci2014.09.0654
Gamazon, E.R., H.E. Wheeler, K.P. Shah, S.V. Mozaffari, K. Aquino-Michaels, R.J. Carroll, et al. 2015. A gene-based association method for mapping traits using reference transcriptome data. Nat. Genet. 47:1091–1098. doi:10.1038/ng.3367
Ganal, M.W., G. Durstewitz, A. Polley, A. Bérard, E.S. Buckler, A. Charcosset, et al. 2011. A large maize (Zea mays L.) SNP genotyping array: Development and germplasm genotyping, and genetic mapping to compare with the B73 reference genome. PLoS One 6:e28334. doi:10.1371/journal.pone.0028334
Gianola, D., G. de los Campos, W.G. Hill, E. Manfredi, and R. Fernando. 2009. Additive genetic variability and the Bayesian alphabet. Genetics 183:347–363. doi:10.1534/genetics.109.103952

Golicz, A.A., J. Batley, and D. Edwards. 2016. Towards plant pangenomics. Plant Biotechnol. J. 14:1099–1105. doi:10.1111/pbi.12499
Guo, Z., M.M. Magwire, C.J. Basten, Z. Xu, and D. Wang. 2016. Evaluation of the utility of gene expression and metabolic information for genomic prediction in maize. Theor. Appl. Genet. 129:2413–2427. doi:10.1007/s00122-016-2780-5
Haley, C.S., and P.M. Visscher. 1998. Strategies to utilize marker-quantitative trait loci associations. J. Dairy Sci. 81:85–97. doi:10.3168/jds.S0022-0302(98)70157-2
Hammer, G., M. Cooper, F. Tardieu, S. Welch, B. Walsh, F. van Eeuwijk, et al. 2006. Models for navigating biological complexity in breeding improved crop plants. Trends Plant Sci. 11:587–593. doi:10.1016/j.tplants.2006.10.006
Hammer, G., C. Messina, E. Oosterom, S. Chapman, V. Singh, A. Borrell, et al. 2016. Molecular breeding for complex adaptive traits: How integrating crop ecophysiology and modelling can enhance efficiency. In: X. Yin and C.P. Struik, editors, Crop systems biology: Narrowing the gaps between crop modelling and genetics. Springer Int. Publ., Cham, Switzerland. p. 147–162. doi:10.1007/978-3-319-20562-5_7
Hammer, G.L., M.J. Kropff, T.R. Sinclair, and J.R. Porter. 2002. Future contributions of crop modelling: From heuristics and supporting decision making to understanding genetic regulation and aiding crop improvement. Eur. J. Agron. 18:15–31. doi:10.1016/S1161-0301(02)00093-X
Hammer, G.L., G. McLean, S. Chapman, B. Zheng, A. Doherty, M.T. Harrelson, et al. 2014. Crop design for specific adaptation in variable dryland production environments. Crop Pasture Sci. 65:614–626. doi:10.1071/CP14088
Heffner, E.L., M.E. Sorrells, and J.-L. Jannink. 2009. Genomic selection for crop improvement. Crop Sci. 49:1–12. doi:10.2135/cropsci2008.08.0512
Helentjaris, T., M. Slocum, S. Wright, A. Schaefer, and J. Nienhuis. 1986. Construction of genetic linkage maps in maize and tomato using restriction fragment length polymorphisms. Theor. Appl. Genet. 72:761–769. doi:10.1007/BF00266542
Henderson, C.R., O. Kempthorne, S.R. Searle, and C.M. von Krosigk. 1959. The estimation of environmental and genetic trends from records subject to culling. Biometrics 15:192–218. doi:10.2307/2527669
Heslot, N., J.-L. Jannink, and M.E. Sorrells. 2015. Perspectives for genomic selection applications and research in plants. Crop Sci. 55:1–12. doi:10.2135/cropsci2014.03.0249
Hill, C.B., J.D. Taylor, J. Edwards, D. Mather, P. Langridge, A. Bacic, and U. Roessner. 2015. Detection of QTL for metabolic and agronomic traits in wheat with adjustments for variation at genetic loci that affect plant phenology. Plant Sci. 233:143–154. doi:10.1016/j.plantsci.2015.01.008
Hoogenboom, G., H. Gerrit, J.W. White, A.-G. Jorge, R.G. Gaudiel, J.R. Myers, and M.J. Silbernagel. 1997. Evaluation of a crop simulation model that incorporates gene action. Agron. J. 89:613–620. doi:10.2134/agronj1997.00021962008900040013x
Hufford, M. 2019. Assembly and comparative genomic analysis of the maize NAM founders. Paper presented at the 27th Plant & Animal Genome Conference, San Diego, CA. 12–16 Jan. 2019. https://pag.confex.com/pag/xxvii/meetingapp.cgi/Paper/35331 (accessed 28 Mar. 2019).
Hufford, M.B., P. Lubinksy, T. Pyhäjärvi, M.T. Devengenzo, N.C. Ellstrand, and J. Ross-Ibarra. 2013. The genomic signature of crop-wild introgression in maize. PLoS Genet. 9:e1003477. doi:10.1371/journal.pgen.1003477
Jaenicke-Després, V., E.S. Buckler, B.D. Smith, M.T.P. Gilbert, A. Cooper, J. Doebley, and S. Pääbo. 2003. Early allelic selection in maize as revealed by ancient DNA. Science 302:1206–1208. doi:10.1126/science.1089056
Jannink, J.L., A.J. Lorenz, and H. Iwata. 2010. Genomic selection in plant breeding: From theory to practice. Brief. Funct. Genom. 9:166–177. doi:10.1093/bfgp/elq001
Jarquín, D., J. Crossa, X. Lacaze, P. Du Cheyron, J. Daucourt, J. Lorgeou, et al. 2014. A reaction norm model for genomic selection using high-dimensional genomic and environmental data. Theor. Appl. Genet. 127:595–607. doi:10.1007/s00122-013-2243-1
Jones, D.F. 1917. Dominance of linked factors as a means of accounting for heterosis. Genetics 2:466–479.
Kremling, K.A.G., S.-Y. Chen, M.-H. Su, N.K. Lepak, M.C. Romay, K.L. Swarts, et al. 2018. Dysregulation of expression correlates with rare-allele burden and fitness loss in maize. Nature 555:520–523. doi:10.1038/nature25966
Kremling, K.A.G., C.H. Diepenbrock, M.A. Gore, E.S. Buckler, and N. Bandillo. 2019. Transcriptome-wide association supplements genome-wide association in Zea mays. G3: Genes, Genomes, Genet. doi:10.1534/g3.119.400549 (in press).
Lamsal, A., S.M. Welch, J.W. Jones, K.J. Boote, A. Asebedo, J. Crain, et al. 2017. Efficient crop model parameter estimation and site characterization using large breeding trial data sets. Agric. Syst. 157:170–184. doi:10.1016/j.agsy.2017.07.016
Lamsal, A., S.M. Welch, J.W. White, K.R. Thorp, and N.M. Bello. 2018. Estimating parametric phenotypes that determine anthesis date in Zea mays: Challenges in combining ecophysiological models with genetics. PLoS One 13:e0195841. doi:10.1371/journal.pone.0195841
Li, B., N. Zhang, Y.-G. Wang, A.W. George, A. Reverter, et al. 2018a. Genomic prediction of breeding values using a subset of SNPs identified by three machine learning methods. Front. Genet. 9:237. doi:10.3389/fgene.2018.00237
Li, M., B. Wang, M. Zhang, M. Rantalainen, S. Wang, H. Zhou, et al. 2008. Symbiotic gut microbes modulate human metabolic phenotypes. Proc. Natl. Acad. Sci. USA 105:2117–2122. doi:10.1073/pnas.0712038105
Li, X., T. Guo, Q. Mu, X. Li, and J. Yu. 2018b. Genomic and environmental determinants and their interplay underlying phenotypic plasticity. Proc. Natl. Acad. Sci. USA 115:6679–6684. doi:10.1073/pnas.1718326115
Lippman, Z.B., and D. Zamir. 2007. Heterosis: Revisiting the magic. Trends Genet. 23:60–66. doi:10.1016/j.tig.2006.12.006
Littleboy, M., D.M. Freebairn, D.M. Silburn, D.R. Woodruff, and G.L. Hammer. 1999. PERFECT version 3. A computer simulation model of productivity erosion runoff functions to evaluate conservation techniques. Queensland Dep. Prim. Ind., Brisbane, QLD, Australia.
Liu, X., H. Wang, H. Wang, Z. Guo, X. Xu, J. Liu, et al. 2018. Factors affecting genomic selection revealed by empirical evidence in maize. Crop J. 6:341–352. doi:10.1016/j.cj.2018.03.005
Lush, J.L. 1937. Animal breeding plans. Iowa State Press, Ames, IA.
Ma, W., Z. Qiu, J. Song, J. Li, Q. Cheng, J. Zhai, and C. Ma. 2018. A deep convolutional neural network approach for predicting phenotypes from genotypes. Planta 248:1307–1318. doi:10.1007/s00425-018-2976-9
Malosetti, M., J.-M. Ribaut, and F.A. van Eeuwijk. 2013. The statistical analysis of multi-environment data: Modeling genotype-by-environment interaction and its genetic basis. Front. Physiol. 4:44. doi:10.3389/fphys.2013.00044
Mancuso, N., H. Shi, P. Goddard, G. Kichaev, A. Gusev, and B. Pasaniuc. 2017. Integrating gene expression with summary association statistics to identify genes associated with 30 complex traits. Am. J. Hum. Genet. 100:473–487. doi:10.1016/j.ajhg.2017.01.031
Manolio, T.A., F.S. Collins, N.J. Cox, D.B. Goldstein, L.A. Hindorff, D.J. Hunter, et al. 2009. Finding the missing heritability of complex diseases. Nature 461:747–753. doi:10.1038/nature08494

Marcus, G. 2018. Deep learning: A critical appraisal. Cornell Univ., Ithaca, NY. http://arxiv.org/abs/1801.00631 (accessed 1 June 2019).
Matsuoka, Y., Y. Vigouroux, M.M. Goodman, J. Sanchez G., E. Buckler, and J. Doebley. 2002. A single domestication for maize shown by multilocus microsatellite genotyping. Proc. Natl. Acad. Sci. USA 99:6080–6084. doi:10.1073/pnas.052125199
Messina, C., G. Hammer, Z. Dong, D. Podlich, and M. Cooper. 2009. Modelling crop improvement in a G×E×M framework via gene–trait–phenotype relationships. In: D. Calderini, editor, Crop physiology. Acad. Press, San Diego, CA. p. 235–581. doi:10.1016/B978-0-12-374431-9.00010-4
Messina, C.D., J.W. Jones, K.J. Boote, and C.E. Vallejos. 2006. A gene-based model to simulate soybean development and yield responses to environment. Crop Sci. 46:456–466. doi:10.2135/cropsci2005.04-0372
Messina, C.D., F. Technow, T. Tang, R. Totir, C. Gho, and M. Cooper. 2018. Leveraging biological insight and environmental variation to improve phenotypic prediction: Integrating crop growth models (CGM) with whole genome prediction (WGP). Eur. J. Agron. 100:151–162. doi:10.1016/j.eja.2018.01.007
Meuwissen, T.H., B.J. Hayes, and M.E. Goddard. 2001. Prediction of total genetic value using genome-wide dense marker maps. Genetics 157:1819–1829.
Mezmouk, S., and J. Ross-Ibarra. 2014. The pattern and distribution of deleterious mutations in maize. G3: Genes, Genomes, Genet. 4:163–171. doi:10.1534/g3.113.008870
Millet, E.J., W. Kruijer, A. Coupel-Ledru, S. Alvarez Prado, L. Cabrera-Bosquet, S. Lacube, et al. 2019. Genomic prediction of maize yield across European environmental conditions. Nat. Genet. 51:952–956. doi:10.1038/s41588-019-0414-y
Money, D., K. Gardner, Z. Migicovsky, H. Schwaninger, G.-Y. Zhong, et al. 2015. LinkImpute: Fast and accurate genotype imputation for nonmodel organisms. G3: Genes, Genomes, Genet. 5:2383–2390. doi:10.1534/g3.115.021667
Montesinos-López, O.A., A. Montesinos-López, J. Crossa, D. Gianola, C.M. Hernández-Suárez, et al. 2018. Multi-trait, multi-environment deep learning modeling for genomic-enabled prediction of plant traits. G3: Genes, Genomes, Genet. 8:3829–3840. doi:10.1534/g3.118.200728
Parent, B., and F. Tardieu. 2014. Can current crop models be used in the phenotyping era for predicting the genetic variability of yield of plants subjected to drought or high temperature? J. Exp. Bot. 65:6179–6189. doi:10.1093/jxb/eru223
Peiffer, J.A., A. Spor, O. Koren, Z. Jin, S.G. Tringe, J.L. Dangl, et al. 2013. Diversity and heritability of the maize rhizosphere microbiome under field conditions. Proc. Natl. Acad. Sci. USA 110:6548–6553. doi:10.1073/pnas.1302837110
Piperno, D.R., and K.V. Flannery. 2001. The earliest archaeological maize (Zea mays L.) from highland Mexico: New accelerator mass spectrometry dates and their implications. Proc. Natl. Acad. Sci. USA 98:2101–2103. doi:10.1073/pnas.98.4.2101
Poland, J., J. Endelman, J. Dawson, J. Rutkoski, S. Wu, et al. 2012. Genomic selection in wheat breeding using genotyping-by-sequencing. Plant Genome 5:103–113. doi:10.3835/plantgenome2012.06.0006
Rakocevic, G., V. Semenyuk, W.-P. Lee, J. Spencer, J. Browning, et al. 2019. Fast and accurate genomic analyses using genome graphs. Nat. Genet. 51:354–362. doi:10.1038/s41588-018-0316-4
Ramstein, G.P., S.E. Jensen, and E.S. Buckler. 2018. Breaking the curse of dimensionality to identify causal variants in breeding. Theor. Appl. Genet. 132:559–567. doi:10.1007/s00122-018-3267-3
Rasheed, A., Y. Hao, X. Xia, A. Khan, Y. Xu, R.K. Varshney, et al. 2017. Crop breeding chips and genotyping platforms: Progress, challenges, and perspectives. Mol. Plant 10:1047–1064. doi:10.1016/j.molp.2017.06.008
Reymond, M., B. Muller, A. Leonardi, A. Charcosset, and F. Tardieu. 2003. Combining quantitative trait loci analysis and an ecophysiological model to analyze the genetic variability of the responses of maize leaf growth to temperature and water deficit. Plant Physiol. 131:664–675. doi:10.1104/pp.013839
Riedelsheimer, C., A. Czedik-Eysenberg, C. Grieder, J. Lisec, F. Technow, R. Sulpice, et al. 2012. Genomic and metabolic prediction of complex heterotic traits in hybrid maize. Nat. Genet. 44:217–220. doi:10.1038/ng.1033
Rodgers-Melnick, E., P.J. Bradbury, R.J. Elshire, J.C. Glaubitz, C.B. Acharya, S.E. Mitchell, et al. 2015. Recombination in diverse maize is stable, predictable, and associated with genetic load. Proc. Natl. Acad. Sci. USA 112:3823–3828. doi:10.1073/pnas.1413864112
Rodgers-Melnick, E., D.L. Vera, H.W. Bass, and E.S. Buckler. 2016. Open chromatin reveals the functional maize genome. Proc. Natl. Acad. Sci. USA 113:E3177–E3184. doi:10.1073/pnas.1525244113
Romero Navarro, J.A., M. Wilcox, J. Burgueño, C. Romay, K. Swarts, S. Trachsel, et al. 2017. A study of allelic diversity underlying flowering-time adaptation in maize landraces. Nat. Genet. 49:476–480. doi:10.1038/ng.3784
Rosati, A., S.G. Metcalf, and B.D. Lampinen. 2004. A simple method to estimate photosynthetic radiation use efficiency of canopies. Ann. Bot. 93:567–574. doi:10.1093/aob/mch081
Ross, E.M., P.J. Moate, L.C. Marett, B.G. Cocks, and B.J. Hayes. 2013. Metagenomic predictions: From microbiome to complex health and environmental phenotypes in humans and cattle. PLoS One 8:e73056. doi:10.1371/journal.pone.0073056
Rousselle, Y., E. Jones, A. Charcosset, P. Moreau, K. Robbins, B. Stich, et al. 2015. Study on essential derivation in maize: III. Selection and evaluation of a panel of single nucleotide polymorphism loci for use in European and North American germplasm. Crop Sci. 55:1170–1180. doi:10.2135/cropsci2014.09.0627
Schrag, T.A., M. Westhues, W. Schipprack, F. Seifert, A. Thiemann, S. Scholten, and A.E. Melchinger. 2018. Beyond genomic prediction: Combining different types of omics data can improve prediction of hybrid performance in maize. Genetics 208:1373–1385. doi:10.1534/genetics.117.300374
Seifert, F., A. Thiemann, T.A. Schrag, D. Rybka, A.E. Melchinger, M. Frisch, and S. Scholten. 2018. Small RNA-based prediction of hybrid performance in maize. BMC Genomics 19:371. doi:10.1186/s12864-018-4708-8
Shi, Y., J.A. Thomasson, S.C. Murray, N.A. Pugh, W.L. Rooney, S. Shafian, et al. 2016. Unmanned aerial vehicles for high-throughput phenotyping and agronomic research. PLoS One 11:e0159781. doi:10.1371/journal.pone.0159781
Shikha, M., A. Kanika, A.R. Rao, M.G. Mallikarjuna, H.S. Gupta, and T. Nepolean. 2017. Genomic selection for drought tolerance using genome-wide SNPs in maize. Front. Plant Sci. 8:550. doi:10.3389/fpls.2017.00550
Soufizadeh, S., E. Munaro, G. McLean, A. Massignam, E.J. van Oosterom, S.C. Chapman, et al. 2018. Modelling the nitrogen dynamics of maize crops: Enhancing the APSIM maize model. Eur. J. Agron. 100:118–131. doi:10.1016/j.eja.2017.12.007
Stitzer, M.C., and J. Ross-Ibarra. 2018. Maize domestication and gene interaction. New Phytol. 220:395–408. doi:10.1111/nph.15350
Swarts, K., H. Li, J.A. Romero Navarro, D. An, M.C. Romay, S. Hearne, et al. 2014. Novel methods to optimize genotypic imputation for low-coverage, next-generation sequence data in crop plants. Plant Genome 7(3). doi:10.3835/plantgenome2014.05.0023
Tanksley, S.D., H. Medina-Filho, and C.M. Rick. 1982. Use of naturally-occurring enzyme variation to detect and map genes controlling quantitative traits in an interspecific backcross of tomato. Heredity 49:11–25. doi:10.1038/hdy.1982.61

Technow, F., C.D. Messina, L.R. Totir, and M. Cooper. 2015. Integrating crop growth models with whole genome prediction through approximate Bayesian computation. PLoS One 10:e0130855. doi:10.1371/journal.pone.0130855
Tettelin, H., V. Masignani, M.J. Cieslewicz, C. Donati, D. Medini, N.L. Ward, et al. 2005. Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: Implications for the microbial “pan-genome.” Proc. Natl. Acad. Sci. USA 102:13950–13955. doi:10.1073/pnas.0506758102
Thomas, D. 2010. Gene–environment-wide association studies: Emerging approaches. Nat. Rev. Genet. 11:259–272. doi:10.1038/nrg2764
Twohey, R.J., 3rd, L.M. Roberts, and A.J. Studer. 2018. Leaf stable carbon isotope composition reflects transpiration efficiency in Zea mays. Plant J. 97:475–484. doi:10.1111/tpj.14135
Unterseer, S., E. Bauer, G. Haberer, M. Seidel, C. Knaak, M. Ouzunova, et al. 2014. A powerful tool for genome analysis in maize: Development and evaluation of the high density 600 k SNP genotyping array. BMC Genomics 15:823. doi:10.1186/1471-2164-15-823
van Eeuwijk, F., J. Denis, and M. Kang. 1996. Incorporating additional information on genotypes and environments in models for two-way genotype by environment tables. In: M.S. Kang and H.G. Gauch, Jr., editors, Genotype by environment interaction. CRC Press, Boca Raton, FL. p. 15–49.
Vietmeyer, N. 2011. Our daily bread: The essential Norman Borlaug. Bracing Books, Lorton, VA.
Wallace, J.G., K.A. Kremling, L.L. Kovar, and E.S. Buckler. 2018. Quantitative genetics of the maize leaf microbiome. Phytobiomes J. 2:208–224. doi:10.1094/PBIOMES-02-18-0008-R
Wallach, D., D. Makowski, and J.W. Jones. 2018. Working with dynamic crop models: Methods, tools and examples for agriculture and environment. 3rd ed. Acad. Press, Amsterdam.
Walters, W.A., Z. Jin, N. Youngblut, J.G. Wallace, J. Sutter, W. Zhang, et al. 2018. Large-scale replicated field study of maize rhizosphere identifies heritable microbes. Proc. Natl. Acad. Sci. USA 115:7368–7373. doi:10.1073/pnas.1800918115
Wang, E., M.J. Robertson, G.L. Hammer, P.S. Carberry, D. Holzworth, H. Meinke, et al. 2002. Development of a generic crop model template in the cropping system model APSIM. Eur. J. Agron. 18:121–140. doi:10.1016/S1161-0301(02)00100-4
Wang, J., M. van Ginkel, D. Podlich, G. Ye, R. Trethowan, W. Pfeiffer, et al. 2003. Comparison of two breeding strategies by computer simulation. Crop Sci. 43:1764–1773. doi:10.2135/cropsci2003.1764
Wang, J., Z. Zhou, Z. Zhang, H. Li, D. Liu, Q. Zhang, et al. 2018a. Expanding the BLUP alphabet for genomic prediction adaptable to the genetic architectures of complex traits. Heredity 121:648–662. doi:10.1038/s41437-018-0075-0
Wang, X., D. Singh, S. Marla, G. Morris, and J. Poland. 2018b. Field-based high-throughput phenotyping of plant height in sorghum using different sensing technologies. Plant Methods 14:53. doi:10.1186/s13007-018-0324-5
Washburn, J.D., and J.A. Birchler. 2014. Polyploids as a “model system” for the study of heterosis. Plant Reprod. 27:1–5. doi:10.1007/s00497-013-0237-4
Washburn, J.D., M.K. Mejia-Guerra, G. Ramstein, K.A. Kremling, R. Valluru, E.S. Buckler, and H. Wang. 2019. Evolutionarily informed deep learning methods for predicting relative transcript abundance from DNA sequence. Proc. Natl. Acad. Sci. USA 116:5542–5549. doi:10.1073/pnas.1814551116
Westhues, M., C. Heuer, G. Thaller, R. Fernando, and A.E. Melchinger. 2019. Efficient genetic value prediction using incomplete omics data. Theor. Appl. Genet. doi:10.1007/s00122-018-03273-1
Westhues, M., T.A. Schrag, C. Heuer, G. Thaller, H.F. Utz, W. Schipprack, et al. 2017. Omics-based hybrid prediction in maize. Theor. Appl. Genet. 130:1927–1939. doi:10.1007/s00122-017-2934-0
White, J.W., and G. Hoogenboom. 1996. Simulating effects of genes for physiological traits in a process-oriented crop model. Agron. J. 88:416–422. doi:10.2134/agronj1996.00021962008800030009x
Whittaker, J.C., R. Thompson, and M.C. Denham. 2000. Marker-assisted selection using ridge regression. Genet. Res. 75:249–252. doi:10.1017/S0016672399004462
Xiao, Y., H. Liu, L. Wu, M. Warburton, and J. Yan. 2017. Genome-wide association studies in maize: Praise and stargaze. Mol. Plant 10:359–374. doi:10.1016/j.molp.2016.12.008
Xu, C., Y. Ren, Y. Jian, Z. Guo, Y. Zhang, C. Xie, et al. 2017a. Development of a maize 55 K SNP array with improved genome coverage for molecular breeding. Mol. Breed. 37:20. doi:10.1007/s11032-017-0622-z
Xu, S., Y. Xu, L. Gong, and Q. Zhang. 2016. Metabolomic prediction of yield in hybrid rice. Plant J. 88:219–227. doi:10.1111/tpj.13242
Xu, Y., P. Li, Z. Yang, and C. Xu. 2017b. Genetic mapping of quantitative trait loci in crops. Crop J. 5:175–184. doi:10.1016/j.cj.2016.06.003
Yang, J., S. Mezmouk, A. Baumgarten, E.S. Buckler, K.E. Guill, M.D. McMullen, et al. 2017. Incomplete dominance of deleterious alleles contributes substantially to trait variation and heterosis in maize. PLoS Genet. 13:e1007019. doi:10.1371/journal.pgen.1007019
Yin, X., P.C. Struik, F.A. van Eeuwijk, P. Stam, and J. Tang. 2005. QTL analysis and QTL-based prediction of flowering phenology in recombinant inbred lines of barley. J. Exp. Bot. 56:967–976. doi:10.1093/jxb/eri090
Yobi, A., and R. Angelovici. 2018. A high-throughput absolute-level quantification of protein-bound amino acids in seeds. Curr. Protoc. Plant Biol. 3:e20084. doi:10.1002/cppb.20084
Zenke-Philippi, C., M. Frisch, A. Thiemann, F. Seifert, T. Schrag, A.E. Melchinger, et al. 2017. Transcriptome-based prediction of hybrid performance with unbalanced data from a maize breeding programme. Plant Breed. 136:331–337. doi:10.1111/pbr.12482
Zenke-Philippi, C., A. Thiemann, F. Seifert, T. Schrag, A.E. Melchinger, S. Scholten, and M. Frisch. 2016. Prediction of hybrid performance in maize with a ridge regression model employed to DNA markers and mRNA transcription profiles. BMC Genomics 17:262. doi:10.1186/s12864-016-2580-y
Zhang, A., H. Wang, Y. Beyene, K. Semagn, Y. Liu, S. Cao, et al. 2017. Effect of trait heritability, training population size and marker density on genomic prediction accuracy estimation in 22 bi-parental tropical maize populations. Front. Plant Sci. 8:1916. doi:10.3389/fpls.2017.01916
Zhang, C., J. Zhang, H. Zhang, J. Zhao, Q. Wu, Z. Zhao, and T. Cai. 2015. Mechanisms for the relationships between water-use efficiency and carbon isotope composition and specific leaf area of maize (Zea mays L.) under water stress. Plant Growth Regul. 77:233–243. doi:10.1007/s10725-015-0056-8
Zhao, F., and S. Xu. 2012. Genotype by environment interaction of quantitative traits: A case study in barley. G3: Genes, Genomes, Genet. 2:779–788. doi:10.1534/g3.112.002980


Published in Crop Sci. 59:1–15 (2019).

© 2019 The Author(s). Re-use requires permission from the publisher.
