
Europ. J. Agronomy 68 (2015) 89–96

Contents lists available at ScienceDirect

European Journal of Agronomy


journal homepage: www.elsevier.com/locate/eja

Seed yield prediction of sesame using artificial neural network


Samad Emamgholizadeh a,∗, M. Parsaeian b, Mehdi Baradaran b
a Associate Professor, Department of Water and Soil Engineering, Shahrood University, Shahrood, Iran
b Assistant Professor, Department of Agronomy and Plant Breeding, Shahrood University, Shahrood, Iran

ARTICLE INFO

Article history:
Received 29 September 2014
Received in revised form 23 April 2015
Accepted 30 April 2015

Keywords:
Seed yield estimation
Sesame
Artificial neural networks
Multiple regression model

ABSTRACT

The prediction of seed yield is one of the most important breeding objectives in agricultural research. So, in this study, two methods namely artificial neural network (ANN) and multiple regression model (MLR) were employed to estimate the seed yield of sesame (SYS) from readily measurable plant characters (e.g., flowering time of 100% (days), the plant height (cm), the capsule number per plant, the 1000-seed weight (g) and the seed number per capsule). The ANN and MLR were tested using field data. Results showed that the ANN predicts the SYS accurately with a root-mean-square-error (RMSE) of 0.339 t/ha and a determination coefficient (R²) of 0.901. Also, it was found that the ANN model performed better than the MLR model with a RMSE of 0.346 t/ha, and R² of 0.779. Finally, sensitivity analysis was conducted to determine the most and the least influential characters affecting SYS. It was found that the capsule number per plant and the flowering time of 100% had the most and least significant effects on SYS, respectively.

© 2015 Published by Elsevier B.V.

∗ Corresponding author. Tel.: +98 9111194389; fax: +98 234 5224620.
E-mail address: s gholizadeh517@yahoo.com (S. Emamgholizadeh).

http://dx.doi.org/10.1016/j.eja.2015.04.010
1161-0301/© 2015 Published by Elsevier B.V.

1. Introduction

Sesame (Sesamum indicum L.) is one of the oldest oil seed crops and is widely grown in tropical and subtropical areas of Asia (Ashri, 1998). There is no consensus among researchers regarding the center of origin of cultivated sesame, and it seems to be a crop with more than one origin. The primary center of sesame origin has been proposed to be the Fertile Crescent, the Indian subcontinent or the Iran–Afghanistan region (Ashri, 1989). Also, the wild species of this plant can be found in Africa. Nowadays, our knowledge about the nutritional value and health benefits of sesame has increased the universal demand for its seed and oil. The superior oil quality of sesame, which is ascribed to the low level of saturated fatty acids as well as the activity of unique natural antioxidants together with tocopherols, makes it distinctive among other oil seed crops. However, sesame has a low ranking in the world production of edible oil seeds, which is mainly attributed to low-yielding varieties with an indeterminate growth habit, uneven ripening of capsules, seed shattering, susceptibility to environmental stresses and the lack of adequate research on this crop (Bhat et al., 1999). The increase of seed yield is one of the most important breeding objectives in all crop plants. Seed yield is a quantitative, polygenic and complex trait and is the result of different factors. Its phenotypic expression is predominantly affected by the environmental conditions; hence it has a low heritability. Therefore, the response to direct selection for this character could lead to a low profit. In contrast, the selection of highly heritable characters associated with seed yield is a good prospect for increasing its performance. Seed yield components not only directly affect the seed yield but also indirectly affect the performance through negative or positive ways (Solanki and Gupta, 2001; Khan et al., 2007; Ibrahim and Khidir, 2012).

A number of studies have tried to relate the seed yield of sesame to its components and the morpho-phenological properties of the plant using multivariate analyses such as multiple linear regression (MLR), path analysis (PA), factor analysis (FA) and other techniques. For example, Ganesh and Sakila (1999) and Ibrahim and Khidir (2012) used simple phenotypic or genotypic correlations to find the association of seed yield and its contributing characters in sesame. They found that seed yield was positively correlated with the capsule number per plant, plant height, 1000-seed weight and days to maturity. Parimala and Mathur (2006) employed multiple linear regression (MLR) to establish a model between the seed yield of sesame and its components such as capsules per plant, capsule length, number of branches, plant height, number of seeds per capsule, 1000-seed weight and days to first flower. Their findings indicated that the number of capsules per plant was the most important character and could explain a high amount of the total variation of seed yield in sesame. Based on the predicted equation for grain yield, Shim et al. (2006) found that the plant height and number of capsules per plant had significant contribution to
sesame grain yield under the early sowing condition, while there was no significant effect for 1000-seed weight. Furthermore, path coefficient analysis (PCA) has been widely used in sesame breeding programs to interpret the nature of relationships between seed yield and yield-determining traits (Yingzhong and Yishou, 2002; Shim et al., 2001; Mothilal, 2005; Kurdistani et al., 2011; Azeez and Morakinyo, 2011). Ganesh and Sakila (1999) also studied path analysis for some parental genotypes and their progeny in sesame and reported that the number of capsules per plant, plant height and number of branches per plant had a high positive direct effect on the plant yield in sesame. The literature review also shows that many researchers, such as Shim et al. (2006) and Ashraf (2013), have used MLR to predict the seed yield of sesame.

The main shortcoming of regression-based models is that they cannot capture the highly nonlinear and complex relationship between seed yield and the relevant crop plant properties. In contrast to traditional methods such as MLR and PA models, artificial intelligence (AI) models such as artificial neural networks (ANN), genetic expression programming (GEP) and the adaptive neuro-fuzzy inference system (ANFIS) have recently attracted the attention of researchers in agricultural science (Azamathulla and Ghani, 2011; Shahinfar et al., 2012; Emamgholizadeh et al., 2013a,b; Samadianfard et al., 2014; Silva et al., 2014; Iquebal et al., 2014). Alvarez (2007) used the ANN approach to predict average regional yield and production of wheat in the Argentine Pampas. Other researchers applied the ANN as a feasible alternative method for the discrimination and identification of Camellia japonica L. (Mugnai et al., 2008) and tea genotypes (Pandolfi et al., 2009). Yong-Jun et al. (2011) applied the ANN to genomic selection for crop improvement.

Barbosa et al. (2011) used the ANN to study the genetic diversity of Carica papaya based on the model proposed by Kohonen. Zaefizadeh et al. (2011) also compared MLR and ANN in predicting the yield of hulless barley from its components.

The objective of this study is to develop and test the ANN model to predict the seed yield of sesame (SYS). It is worth noting that this is the first study which explores the potential ability of the ANN to model the seed yield of sesame.

2. Materials and methods

2.1. Field experiments

Nine diverse genotypes of sesame, including five genotypes representing different sesame growing regions of Iran (Varamin 2822, Yekta, Darab1, TN234 and TN240) and four Asian genotypes from Pakistan, India, Turkey and Iraq (Pakistani, Indian, EM and Iraq 22), were crossed in a full-diallel mating design. The Iranian genotypes were produced by selection within different Iranian landraces. The field experiment was carried out at the research farm of Isfahan University of Technology (32°2′N and 51°32′E, 1630 m asl) during the cropping seasons of the two years 2007–2008 on a Typic Haplargid soil with clay loam texture, pH 7.5 and organic matter content of 1%. All parents, 81 F1s (first filial generation progenies) and 45 F2s (second filial generation progenies, without reciprocals) were agro-morphologically evaluated using a randomized complete block design with three replications. Each plot consisted of four rows 1.5 m in length with 50 cm row-to-row spacing and a plant-to-plant distance of 7 cm within rows. Fertilizer was applied at 80 kg N/ha and 100 kg P/ha prior to sowing, and 40 kg N/ha was top dressed four weeks after planting. Days to flowering and days to maturity were visually recorded on a plot basis. The plant height, the height to the first fruiting node, the number of fruiting branches per plant, the capsule number per plant, the seed number per capsule, the 1000-seed weight and the seed yield per plant were recorded using 10 (F1 experiment) and 30 (F2 experiment) randomly selected plants from each plot, and their averages were used. The two middle rows of each plot were harvested to determine the seed yield.

2.2. Statistical analysis

Data were subjected to analysis of variance (ANOVA) using the general linear model of the SAS statistical program (2010). The association of agro-morphological traits was evaluated by computing the Pearson correlation coefficients among all combinations of characters using the SAS statistical package. The ANN was implemented with the Qnet 2000 model using seed yield as the dependent variable and the remaining traits as independent variables. For training and testing the ANN model, 378 samples were collected from the study area. To give an overview of the measured variables (i.e., flowering time (10% and 100%), seed maturity, plant height, capsule number per plant, 1000-seed weight, seed number per capsule, plant height to the first fruiting branch, plant height to the first fruiting node, capsule number per branch and branch number), their statistical indices are shown in Table 1.

Table 1
Statistical indices of measured data.

Variable                                     Abbreviation  Min      Max      Mean     Std. deviation
Flowering time of 10% (days)                 FT10          40.00    55.00    47.13    2.55
Flowering time of 100% (days)                FT100         40.00    60.00    52.27    2.69
Seed maturity (days)                         SM            105.00   151.00   129.82   13.11
Plant height (cm)                            PH            100.70   200.40   144.95   16.96
Capsule number per plant                     CNPP          24.80    100.50   51.92    12.44
1000-seed weight (g)                         TSW           1.99     4.20     3.40     0.37
Seed number per capsule                      SNPC          27.58    100.48   55.56    14.17
Plant height to first fruiting branch (cm)   PHFFB         26.10    88.90    60.68    10.69
Plant height to first fruiting node (cm)     PHFFN         0.00     5.20     1.81     0.92
Capsule number per branch                    CNPB          0.00     62.80    30.78    14.87
Branch number                                BN            0.00     18.20    8.33     4.22
Seed yield of sesame (t/ha)                  SYS           1.24     5.04     2.76     0.71

2.3. Artificial neural network (ANN)

An artificial neural network (ANN) is an artificial intelligence model that imitates the way the human brain works (Tufail et al., 2008). In other words, the ANN has certain performance characteristics in common with the biological neural networks of the human brain (Haykin, 1994). This computational technique was first introduced in 1943 (McCulloch and Pitts, 1943). A typical ANN consists of a number of simple processing elements called neurons or nodes. These neurons are organized into groups termed layers. Each neuron is connected
to other neurons by means of direct communication links, each with an associated weight. The weights represent information being used by the net to solve a problem (Mozejko and Gniot, 2008). The organization and weights of these connecting elements are adjusted through a process of “training” in order to calibrate the model (Tufail et al., 2008). The ANN can be used to identify and learn correlated patterns between input and output data sets, especially when the underlying data relationship is nonlinear, the volume and number of variables or the diversity of the data are very great, the relationships between variables are vaguely understood, or the relationships are difficult to describe adequately with conventional approaches. Thus, the ANN provides an analytical alternative to conventional techniques such as linear regression, multivariable regression and other statistical techniques, which are often limited by strict assumptions of normality, linearity, variable independence, etc. (Singh et al., 2003). An ANN can be classified according to its structure, its type of nodes, etc. Also, different algorithms such as back-propagation, feed forward back-propagation, feed forward cascade correlation, radial basis function and conjugate gradients can be used according to the need of training convergence (ASCE, 2000a,b). These models have been used by many researchers for engineering problems (Yilmaz and Kaynar, 2011; Lashkarblooki et al., 2012; Bastania et al., 2013; Kolay and Baser, 2014). In the present study, the multi-layer perceptron (MLP) with the back-propagation algorithm was used.

2.4. Multi-layer perceptron (MLP)

The MLP has several layers, namely: (a) an input layer, (b) one or more intermediate or hidden layers and (c) an output layer. A typical configuration for the MLP is shown in Fig. 1. As indicated, a set of data (FT100, PH, SNPC, TSW, CNPP) is first fed directly into the MLP through the input layer, and subsequently the multi-layer perceptron produces an expected result of SYS in the output layer. The number of hidden layers reflects the complexity of the MLP, because a greater number of hidden layers increases the number of connections in the ANN. The number of hidden layers and the number of neurons in each hidden layer are often varied to optimize the performance of the final model (Tufail et al., 2008). The number of nodes in each layer is evaluated by trial and error. The MLP is trained with a training set of input and known output data.

Fig. 1. Structure of a typical MLP model (input layer: FT100, PH, SNPC, TSW and CNPP; hidden layer; output: SYS).

The MLP transforms m inputs to n outputs through some nonlinear functions. With activation of the units in the output layer, the output of the MLP network is determined as follows:

$$X_o = f\left(\sum_{h} X_h w_{ho} + b_j\right) \qquad (1)$$

where $f$ is the activation function, $X_h$ is the activation of the hidden layer node, $w_{ho}$ is the interconnection between the hidden and output layer nodes and $b_j$ is the bias. Back propagation, proposed by Rumelhart et al. (1986), is the most popular algorithm for training the MLP network (Wasserman, 1989; Fausett, 1994; Haykin, 1994). The back-propagation algorithm involves two steps. The first step is a forward pass, in which the effect of the input is passed forward through the network to reach the output layer. After the RMS error between the network response and the training targets is computed, a second step starts backward through the network. The errors at the output layer are propagated back toward the input layer, with the weights being modified according to Eq. (2):

$$\Delta w_{ij}(n) = -\eta \, \frac{\partial E}{\partial w_{ij}} + \alpha \, \Delta w_{ij}(n-1) \qquad (2)$$

where $\Delta w_{ij}(n)$ and $\Delta w_{ij}(n-1)$ are, respectively, the weight increments between nodes i and j during the nth and (n − 1)th pass, or epoch, and $E$ is the network error. $\eta$ and $\alpha$ are called the learning rate coefficient and the momentum factor, and they control the algorithm’s rate of learning. To optimize the rate at which a network learns, these factors must be set and/or adjusted properly during the training process. The valid range for both $\eta$ and $\alpha$ is between 0 and 1 (Qnet2000 Manual, 1999). In this study, four different activation functions, namely Gaussian, Sigmoid, Hyperbolic Tangent and Hyperbolic Secant, have been used and are listed in Table 2.

Table 2
Neuron activation functions.

Activation function   Formula
Sigmoid               $f(x) = \frac{1}{1 + e^{-x}}$
Hyperbolic tangent    $f(x) = \tanh(x)$
Hyperbolic secant     $f(x) = \mathrm{sech}(x)$
Gaussian              $f(x) = e^{-x^{2}}$

3. Results and discussion

In this study, field data measured in the research farm of Isfahan University of Technology were used for the training and testing of the ANN and MLR models. One of the most important steps in developing these models for satisfactory forecasting is the selection of the input variables, because these variables determine the model structures and affect the weighted coefficients and the results of the models. For this purpose, a correlation analysis was performed to assess the relationships between the seed yield of sesame (SYS) and the crop plant variables (i.e., flowering time (10% and 100%), seed maturity, plant height, capsule number per plant, 1000-seed weight, seed number per capsule, plant height to the first fruiting branch, plant height to the first fruiting node, capsule number per branch and branch number) (see Table 3). This analysis allows us to understand how each variable affects SYS and eventually which variable(s) should be used as input in the ANN and MLR models. The strongest correlation is observed between SYS and capsule number per plant (CNPP), with a correlation coefficient (R) of 0.639. Also, SYS is found to be well correlated with plant height (PH), with R = 0.468. These findings are consistent with the studies of Shim et al. (2006), Ganesh and Sakila (1999), Yol et al. (2010), Goudappagoudra et al. (2011), Kurdistani et al. (2011) and Ibrahim and Khidir (2012). There is also a significant correlation between SYS and the flowering time of 100% (FT100), seed number per capsule (SNPC) and 1000-seed weight (TSW), with R = 0.273, 0.338 and 0.375, respectively. These results are in conformity with the findings of Yol et al. (2010), Akbar et al. (2011) and Ibrahim and Khidir (2012). In contrast, a very weak correlation is found between SYS and the branch number (BN), plant height to the first fruiting branch (PHFFB) and capsule number per branch (CNPB), with R = 0.063, 0.127 and 0.113, respectively. In other words, the correlation between the SYS and BN, PHFFB and CNPB is not significant at the 0.01 and 0.05 levels; therefore, we do not consider them as input
data to predict the SYS. In addition, the results revealed strong correlations between the flowering time of 10% (FT10) and the flowering time of 100% (FT100) (R = 0.888), between 1000-seed weight (TSW) and seed maturity (SM) (R = 0.680), and between plant height (PH) and plant height to the first fruiting node (PHFFN) (R = 0.641). So, the three variables FT100, TSW and PH are used as input data instead of FT10, SM and PHFFN.

Table 3
The correlation analysis between SYS and the crop plant variables.

        FT10     FT100    SM       PH       CNPP     SNPC     TSW      PHFFN    BN       PHFFB    CNPB     SYS
FT10    1
FT100   0.888a   1
SM      0.463a   0.454a   1
PH      0.397a   0.415a   0.333a   1
CNPP    0.074    0.102    −0.044   0.324a   1
SNPC    0.023    −0.006   −0.126   0.059    −0.12    1
TSW     0.282a   0.284a   0.680a   0.267a   −0.063   −0.171a  1
PHFFN   0.556a   0.505a   0.471a   0.641a   0.220a   0.055    0.379a   1
BN      −0.187a  −0.114   0.108    −0.258a  0.104    −0.260a  0.256a   −0.013   1
PHFFB   −0.043   −0.039   0.087    0.165a   −0.037   0.004    0.192a   0.339a   0.434a   1
CNPB    −0.322a  −0.262a  −0.119   −0.179a  0.200a   −0.144b  0.035    −0.072   0.692a   0.58a    1
SYS     0.271a   0.273a   0.283a   0.468a   0.639a   0.338a   0.375a   0.437a   0.063    0.127    0.113    1

FT10 = flowering time of 10%, FT100 = flowering time of 100%, SM = seed maturity, PH = plant height, CNPP = capsule number per plant, SNPC = seed number per capsule, TSW = 1000-seed weight, PHFFN = plant height to the first fruiting node, BN = branch number, PHFFB = plant height to the first fruiting branch, CNPB = capsule number per branch and SYS = seed yield of sesame.
a Correlation is significant at the 0.01 level (2-tailed).
b Correlation is significant at the 0.05 level (2-tailed).

Following the correlation analysis (see Table 3) and previous studies (e.g., Shim et al., 2006; Yol et al., 2010; Akbar et al., 2011; Kurdistani et al., 2011; Ibrahim and Khidir, 2012), five variables, namely the flowering time of 100% (FT100), plant height (PH), capsule number per plant (CNPP), 1000-seed weight (TSW) and seed number per capsule (SNPC), were applied to estimate the SYS. Both models were trained using the same input data. The whole dataset included 378 field data points, which were randomly divided into two parts: training (302 data points) and testing (76 data points). The division into training and testing data points was based on the simple random sampling approach. In this method, each data point is chosen randomly from the data set such that each individual has the same probability of being chosen at any stage during the sampling process (Yates et al., 2008).

In order to compare the performance of the ANN and MLR models, three statistical criteria were used in this research: the root mean square error (RMSE), the mean absolute error (MAE) and the coefficient of determination (R²) of the linear regression line between the values predicted by the MLR or ANN model and the desired output. These metrics are given by:

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| O_i - P_i \right| \qquad (3)$$

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} \left( O_i - P_i \right)^2}{n}} \qquad (4)$$

$$R^2 = \frac{\left[ \sum_{i=1}^{n} \left( O_i - \bar{O} \right) \left( P_i - \bar{P} \right) \right]^2}{\sum_{i=1}^{n} \left( O_i - \bar{O} \right)^2 \sum_{i=1}^{n} \left( P_i - \bar{P} \right)^2} \qquad (5)$$

where $n$ is the number of data points, $O_i$ are the observed values, $P_i$ are the predicted values and the bar denotes the mean of the variable.

3.1. SYS estimates from MLP/ANN Model

In addition to the selection of input data, which affects the performance of the MLP/ANN model, the number of hidden nodes and the transfer function between nodes are important. The number of neurons (nodes) in the input and output layers is generally simple to choose because it is dictated by the number of model inputs and outputs, and usually the choice of input data is based on the nature of the problem. Determining the optimum structure of the ANN model, especially selecting the number of layers and neurons (nodes) in the hidden layer(s), is one of the most important and difficult tasks (Baziar and Ghorbani, 2011), and these are typically determined by trial and error (Eberhart and Dobbins, 1990). In this study, the optimal structure was specified by changing the number of hidden nodes from 1 to 5 and picking the ANN structure which leads to the best results. The ANN training was stopped when the maximum of 100,000 iterations was reached. A learning rate coefficient of $\eta$ = 0.01 and a momentum factor of $\alpha$ = 0.1 were used in this study. Also, the ANN model was run with different transfer functions, including Sigmoid, Hyperbolic Tangent, Gaussian and Hyperbolic Secant.

A comparison of results with the aforementioned transfer functions is shown in Table 4. As can be seen in this table, the ANN model with the Hyperbolic Tangent and Sigmoid transfer functions gives the best results in the training and testing stages (i.e., the lowest MAE and RMSE and the highest R² values). In contrast, applying the Gaussian and Hyperbolic Secant transfer functions significantly decreased the accuracy of the ANN model in both training and testing. The reason is related to the nature of the transfer functions. For example, the Sigmoid and Hyperbolic Tangent functions act as an output gate that can be opened (1), closed (0) or somewhere in between for a node’s output response, but the Gaussian and Hyperbolic Secant transfer functions significantly alter the learning dynamics of a neural network model and act like a probabilistic output controller (Qnet2000 Manual, 1999).

A comparison of results with different numbers of hidden nodes indicated that the ANN model with one hidden layer gives the best results, with MAE = 0.163 t/ha, RMSE = 0.212 t/ha and R² = 0.911 in the training stage and MAE = 0.166 t/ha, RMSE = 0.214 t/ha and R² = 0.901 in the testing stage. Table 5 shows these results.

Table 4
The performance of the ANN model with different transfer functions.

                    Training                           Testing
Transfer function   R²      RMSE (t/ha)  MAE (t/ha)    R²      RMSE (t/ha)  MAE (t/ha)
Sig.                0.911   0.212        0.163         0.901   0.214        0.166
Gauss.              0.010   0.711        0.582         0.001   0.703        0.598
Tan. H.             0.898   0.227        0.174         0.892   0.223        0.170
Sec. H.             0.114   0.456        0.354         0.010   0.476        0.374
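The three criteria in Eqs. (3)–(5) can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the function names and the sample yields are hypothetical, and Eq. (5) is read here as the squared Pearson correlation between observed and predicted values, which is an assumption.

```python
import math


def mae(observed, predicted):
    """Mean absolute error, Eq. (3)."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def rmse(observed, predicted):
    """Root mean square error, Eq. (4)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination, Eq. (5), taken as the squared Pearson r."""
    n = len(observed)
    o_mean = sum(observed) / n
    p_mean = sum(predicted) / n
    cross = sum((o - o_mean) * (p - p_mean) for o, p in zip(observed, predicted))
    ss_o = sum((o - o_mean) ** 2 for o in observed)
    ss_p = sum((p - p_mean) ** 2 for p in predicted)
    return cross ** 2 / (ss_o * ss_p)


# Hypothetical seed yields in t/ha, for illustration only (not data from the study).
observed = [2.10, 2.80, 3.40, 1.90, 4.20]
predicted = [2.30, 2.60, 3.50, 2.20, 3.90]

print("MAE  =", round(mae(observed, predicted), 3))
print("RMSE =", round(rmse(observed, predicted), 3))
print("R2   =", round(r_squared(observed, predicted), 3))
```

Because the R² defined this way measures only the linear association between observed and predicted values, it is worth reading it together with MAE and RMSE, as the comparison in Tables 4 and 5 does.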
Fig. 2. Measured and predicted seed yield of sesame (SYS) from ANN model for training period.

Fig. 3. Measured and predicted seed yield of sesame (SYS) from ANN model for testing period.

Overall, based on Tables 4 and 5, it is clear that the ANN model with one hidden layer and the Sigmoid transfer function has the minimal MAE and RMSE and the maximal R², and gives more accurate predictions than the other ANN structures. The estimated seed yield of sesame (SYS) values are compared with observations in Figs. 2 and 3 for the training and testing datasets in the form of scatter plots (left panel). Furthermore, box plots (right panel) are used to compare the measured SYS data with the data predicted by the ANN model (Figs. 2 and 3). The box plot is a convenient way of graphically depicting groups of numerical data through their five-number summaries (minimum of sample, lower quartile, median, upper quartile and maximum of sample). As shown in these figures, there is no significant difference between the measured data and the ANN model, and they have the same median and also the same distribution.

Table 5
The performance of the ANN model with different hidden layers.

                             Training                           Testing
Number of hidden layer(s)    R²      RMSE (t/ha)  MAE (t/ha)    R²      RMSE (t/ha)  MAE (t/ha)
1                            0.911   0.212        0.163         0.901   0.214        0.166
2                            0.903   0.222        0.169         0.897   0.215        0.175
3                            0.895   0.231        0.176         0.855   0.239        0.186
4                            0.859   0.238        0.188         0.848   0.259        0.196
5                            0.819   0.258        0.198         0.836   0.287        0.201

3.2. Multiple linear regression analysis (MLR)

The MLR model was used to correlate the measured seed yield of sesame (SYS) with five variables, namely flowering time of 100% (FT100), plant height (PH), capsule number per plant (CNPP), 1000-seed weight (TSW) and seed number per capsule (SNPC). For this purpose, the MLR model was generated using the SPSS software package. The regression model was applied to the same input and output data that were used for the MLP/ANN model. The following equation was obtained for the calculation of SYS based on the MLR model:

$$\mathrm{SYS} = -5.613 + 0.013\,\mathrm{FT100} + 0.004\,\mathrm{PH} + 0.716\,\mathrm{CNPP} + 0.039\,\mathrm{TSW} + 0.048\,\mathrm{SNPC} \qquad (6)$$

where FT100 is the days to 100% flowering, PH is the plant height, CNPP is the capsule number per plant, TSW is the 1000-seed weight, SNPC is the seed number per capsule and SYS is the seed yield of sesame. The MLR-based equation (Eq. (6)) allows us to assess the rate of variation of SYS with changes in each of its influential plant variables (i.e., FT100, PH, CNPP, TSW and SNPC) and thus provides insight into the physics of the problem. This can be done easily by taking the derivative of SYS in Eq. (6) with respect to
Fig. 4. Measured and predicted seed yield of sesame (SYS) from MLR model for training period.

Fig. 5. Measured and predicted seed yield of sesame (SYS) from MLR model for testing period.

each of its independent variables. It helps agronomists understand how SYS varies with FT100, PH, CNPP, TSW and SNPC and to what extent these variables should be altered to reach the desired SYS value.

The observed and estimated SYS values from the MLR model for the training data set are compared in Fig. 4 in the form of a scatter plot (left panel) and a box plot (right panel). To investigate the ability of Eq. (6) to predict SYS, the testing data set was used. As illustrated in Figs. 4 and 5, the proposed model has R² = 0.811, RMSE = 0.309 t/ha and MAE = 0.229 t/ha for the training stage and R² = 0.779, RMSE = 0.346 t/ha and MAE = 0.234 t/ha for the testing stage. It can be concluded from these statistical quantities that the MLR model has a lower R² and higher errors (RMSE and MAE) than the ANN model. Also, the box plot was used for comparison of the measured and predicted SYS data with the MLR method (Figs. 4 and 5). It can be seen that there is variation in the data predicted with this method and that there are outliers (unusual values of the data) in both the training and testing stages. This suggests that there is a difference between the measured and predicted data. In comparison to the ANN model, the performance of the MLR model is not good.

3.3. Comparing MLR with MLP/ANN model

To further assess the capability of the MLR and MLP/ANN models in estimating SYS, their results are compared with each other. Using the MLR approach, Eq. (6) was developed for SYS.

Table 6
Sensitivity analysis of the governing variables on the SYS.

                                                                     ANN                      MLR
Method                                                               MAE     RMSE    R²       MAE     RMSE    R²
The best ANN/MLR (with FT100, PH, CNPP, TSW, SNPC as input)          0.166   0.214   0.901    0.234   0.346   0.779
MLR/ANN without CNPP                                                 0.456   0.577   0.335    0.445   0.563   0.354
MLR/ANN without PH                                                   0.408   0.537   0.455    0.411   0.544   0.444
MLR/ANN without TSW                                                  0.379   0.497   0.489    0.380   0.499   0.477
MLR/ANN without SNPC                                                 0.211   0.378   0.568    0.215   0.381   0.554
MLR/ANN without FT100                                                0.172   0.302   0.735    0.175   0.315   0.717
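The percentage changes used in the sensitivity analysis can be reproduced from the MAE and RMSE values in Table 6. The sketch below assumes that each change is expressed relative to the error of the best model with all five inputs; the dictionary layout and the function names are illustrative only and are not part of the original study.

```python
# (MAE, RMSE) in t/ha, copied from Table 6. "best" is the model with all five
# inputs; every other key names the input variable that was left out.
TABLE_6 = {
    "ANN": {
        "best": (0.166, 0.214),
        "CNPP": (0.456, 0.577),
        "PH": (0.408, 0.537),
        "TSW": (0.379, 0.497),
        "SNPC": (0.211, 0.378),
        "FT100": (0.172, 0.302),
    },
    "MLR": {
        "best": (0.234, 0.346),
        "CNPP": (0.445, 0.563),
        "PH": (0.411, 0.544),
        "TSW": (0.380, 0.499),
        "SNPC": (0.215, 0.381),
        "FT100": (0.175, 0.315),
    },
}


def percent_increase(reference, value):
    """Relative growth of an error measure, in percent of the reference."""
    return 100.0 * (value - reference) / reference


def rmse_increase_by_input(model):
    """RMSE increase (%) caused by dropping each input from the given model."""
    best_rmse = TABLE_6[model]["best"][1]
    return {
        name: percent_increase(best_rmse, rmse)
        for name, (_mae, rmse) in TABLE_6[model].items()
        if name != "best"
    }


def rank_inputs(model):
    """Inputs ordered from most to least influential by RMSE increase."""
    increases = rmse_increase_by_input(model)
    return sorted(increases, key=increases.get, reverse=True)


for model in ("ANN", "MLR"):
    print(model, rank_inputs(model))
```

Ranking by the RMSE increase is one possible choice; ranking by the MAE increase or by the drop in R² would be computed in the same way from the other columns of Table 6.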
The training dataset (with 302 data points) was used to train the MLR and MLP/ANN models in this study. The testing dataset was then used to estimate SYS with the ANN model and to compare its performance with that of the MLR model. The scatter plots of the computed (by MLP/ANN and MLR) and observed SYS values during the testing period for both models (Figs. 3 and 5) give a clear indication of the relative skill of the two models. The MLR approach gave the least accurate SYS estimates, with large scattering in comparison to the ANN model. This suggests that the MLR performance for the estimation of SYS values is poor compared with the ANN model. Using the ANN, the MAE and RMSE were reduced by 40.99% and 61.58% compared to MLR. Overall, the results of this study suggest that the ANN model is a viable alternative to the already popular MLR model, and the comparison clearly suggests that the choice of model has a significant effect on the model predictions.

3.4. Sensitivity analysis

A number of sensitivity tests were performed to determine the relative importance of each input variable on the SYS. Table 6 shows the statistical indices of the MLR and ANN models run without a specific input variable, alongside the best models that use the flowering time of 100% (FT100), plant height (PH), capsule number per plant (CNPP), 1000-seed weight (TSW) and seed number per capsule (SNPC) as input data. As illustrated, the MLR and ANN models without the capsule number per plant have the highest MAE and RMSE. The RMSE and R² of the ANN (MLR) model without the capsule number per plant are 0.577 t/ha (0.563 t/ha) and 0.335 (0.354), respectively. This means that the MAE and RMSE of the ANN (MLR) model without the capsule number per plant are 174.70% (90.17%) and 169.63% (62.72%) larger than those of the best ANN (MLR) model. Thus, the ability of the ANN and MLR models to retrieve the SYS is significantly degraded when they are run without the capsule number per plant (CNPP).

and the corresponding MAE values were 0.166 t/ha and 0.234 t/ha. Overall, the results indicated that the ANN was an effective and reliable method for estimating the SYS. The results also showed that the ANN could predict the SYS more accurately than the MLR and it reduced the RMSE and MAE of SYS estimation by 61.58% and 40.99% compared to the MLR model.

Several sensitivity tests were performed to determine the effect of plant properties (i.e., the flowering time of 100% (FT100), plant height (PH), capsule number per plant (CNPP), 1000-seed weight (TSW) and seed number per capsule (SNPC)) on the SYS. In these tests, the ANN and MLR were run without a specific input variable. It was found that the RMSE of SYS estimates increased by 169.63% and 62.72% when the ANN and MLR models were run without the capsule number per plant. Overall, the results showed that the capsule number per plant (CNPP) and flowering time of 100% (FT100) had the most and the least effect on the SYS.

References

Akbar, F., Rabbani, M.A., Shinwari, Z.B., Khan, S., 2011. Genetic divergence in sesame (Sesamum indicum L.) Landraces based on qualitative and quantitative traits. Pak. J. Bot. 43 (6), 2737–2744.
Alvarez, R., 2007. Predicting average regional yield and production of wheat in the Argentine Pampas by an artificial neural network approach. Eur. J. Agron. 30, 70–77.
ASCE Task Committee, 2000a. Artificial neural networks in hydrology. I: preliminary concepts. J. Hydrol. Eng. 5 (2), 115–123, http://dx.doi.org/10.1061/(ASCE)1084-0699(2000)5:2(115).
ASCE Task Committee, 2000b. Artificial neural networks in hydrology. II: hydrologic applications. J. Hydrol. Eng. 5 (2), 124–137, http://dx.doi.org/10.1061/(ASCE)1084-0699(2000)5:2(124).
Ashri, A., 1998. Sesame Breeding. In: Janick, J. (Ed.), Plant Breeding Review, 160. Wiley, Somerset, pp. 179–228.
Ashraf, A., 2013. Comparison of some statistical techniques in evaluating Sesame yield and its contributing factors. Scientia Agriculturae 1 (1), 8–14.
Ashri, A., 1989. Sesame. In: Robbelen, G., Downey, R.K., Ashri, A. (Eds.), Oil Crops of the World. McGraw-Hill, New York, pp. 375–387.
Azamathulla, H.M., Ghani, A.A., 2011. Genetic programming for predicting
This implies that the capsule number per plant has the most sig- longitudinal dispersion coefficients in streams. Water Res. Manage. vol. 25 (6
nificant effect on the SYS which is in agreement with findings of April), 1537–1544.
Azeez, M.A., Morakinyo, J.A., 2011. Path analysis of the relationships between
Shim et al. (2006), Ganesh and Sakila (1999), Goudappagoudra et al. single seed yield and some morphological traits in sesame (genera Sesamum
(2011), Ibrahim and Khidir (2012). Overall, the effect of input vari- and Ceratotheca). Int. J. Plant Breed. Genet. 5, 358–368.
ables on the SYS can be ranked from higher to lower as the capsule Barbosa, C.D., Viana, A.P., Red Quintal, S.S., Pereira, M.G., 2011. Artificial neural
network analysis of genetic diversity in Carica papaya L. Crop Breed. Appl.
number per plant, plant height, 1000-seed weight, seed number per
Biotechnol. 11, 224–231.
capsule and flowering time of 100% (see Table 6). Also these out- Bastania, D., Hamzehiea, M.E., Davardoosta, F., Mazinanib, S., Poorbashiria, A.,
comes are consistent with the correlation analysis in Table 2. As the 2013. Prediction of CO2 loading capacity of chemical absorbents using a
multi-layer perceptron neural network. Fluid Phase Equilib. 354, 6–11.
seeds of sesame are produced within the structure called the cap-
Baziar, M.H., Ghorbani, A., 2011. Evaluation of lateral spreading using artificial
sule, hence the maximum number of capsules per plant and also the neural networks. Expert Syst. Appl. 38, 5958–5966.
seed number per capsule and 1000-seed weight increased the seed Bhat, V., Babrekar, P.P., Lakhanpaul, S., 1999. Study of genetic diversity in Indian
yield of the SYS. Besides this, sesame is an indeterminate plant and and exotic sesame (Sesamum indicum L.) germplasm using random amplified
polymorphic DNA (RAPD) markers. Euphytica 110, 21–33.
the increase of plant height can increase the number of capsules Eberhart, R.C., Dobbins, R.W., 1990. Neural Network PS Tools: A Practical Guide.
formed on the stem. Moreover, if the plant flowering phase start Academic press, San Diego.
earlier or we have longer flowering period, the plant can produce a Emamgholizadeh, S., Bateni, S.M., Jeng, D.S., 2013a. Artificial intelligence-based
estimation of flushing half-cone geometry. Eng. Appl. Artif. Intell. 26 (10),
greater number of capsules and therefore a greater seed yield can 2551–2558, http://dx.doi.org/10.1016/j.engappai.2013.05.014.
be achieved. Emamgholizadeh, S., Kashi, H., Marofpoor, I., Zalaghi, E., 2013b. Prediction of water
quality parameters of Karoon River (Iran) by artificial intelligence-based
models. Int. J. Environ. Sci. Technol. 11 (3), 645–656.
4. Conclusion Fausett, L., 1994. Fundamentals of Neural Networks: Architectures, Algorithms,
and Applications. Prentice- Hall, Englewood Cliffs, NJ.
Ganesh, S.K., Sakila, M., 1999. Association analysis of single plant yield and its yield
In this study, artificial neural networks (ANN) and multiple lin- contributing characters in sesame (Sesamum indicum L.). Sesame Safflower
ear regression (MLR) approaches have been used to predict the seed Newslett. 14, 16–18.
Goudappagoudra, R., Lokesha, R., Ranganatha, A.R.G., 2011. Trait association and
yield of sesame (SYS). Based on the correlation analysis and exist-
path coefficient analysis attributing traits in sesame (Sesamum indicum L.).
ing studies, five plant variables namely, flowering time of 100% Electron. J. Plant Breed. 2 (3), 448–452.
(FT100), plant height (PH), capsule number per plant (CNPP), 1000- Haykin, S., 1994. Neural networks. In: A comprehensive foundation. IEEE press,
MacMillan, New York.
seed weight (TSW) and seed number per capsule (SNPC) were used
Ibrahim, S.E., Khidir, M.O., 2012. Genotypic correlation and path coefficient
in the ANN and MLR to estimate the SYS. Both models were tested analysis of yield and some yield components in sesame (Sesamum indicum L.).
using collected data from the research farm of Isfahan University Int. J. AgriSci. 2 (8), 664–670.
of Technology in Iran. The results showed that the SYS estimates Iquebal, M.A., Ansari, M.S., Sarika, S.P., Dixit, N.K., Aggarwal, Verma R.A.K.,
Jayakumar, S., Rai, A., Kumar, D., 2014. Locus minimization in breed prediction
from the ANN were better than those from the MLR. The estimated using artificial neural network approach. Stichting Int. Found. Anim. Gen. 45,
SYS by the ANN and MLR had a RMSE of 0.214 t/ha and 0.346 t/ha 898–902.
McCulloch, W.S., Pitts, W.H., 1943. A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 5, 115–133.
Mothilal, A., 2005. Correlation and path analysis in sesame (Sesamum indicum L.). Environ. Ecol. 233, 478–480.
Mozejko, J., Gniot, R., 2008. Application of Neural Networks for the prediction of total phosphorus concentrations in surface waters. Pol. J. Environ. Stud. 17 (3), 363–368.
Mugnai, S., Pandolfi, C., Azzarello, E., Masi, E., Mancuso, S., 2008. Camellia japonica L. genotypes identified by an artificial neural network based on phyllometric and fractal parameters. Plant Syst. Evol. 270, 95–108.
Khan, M.A., YasinMirza, M., Akmal, M., Naazar, A., Khan, I., 2007. Genetic parameters and their implications for yield improvement in sesame. Sarhad J. Agric. 23 (3), 623–627.
Kolay, E., Baser, T., 2014. Estimating of the dry unit weight of compacted soils using general linear model and multi-layer perceptron neural networks. Appl. Soft Comput. 18, 223–231.
Kurdistani, R., Tohidinejad, E., Mohammadi-Nejad, G., Zareie, S., 2011. Yield potential evaluation and path analysis of different sesame genotypes under various levels of iron. Afr. J. Plant Sci. 5, 862–866.
Lashkarblooki, M., Zeinolabedini Hezaveb, A., Al-Ajmic, A.M., Ayatollahib, S., 2012. Viscosity prediction of ternary mixtures containing ILs using multi-layer perceptron artificial neural network. Fluid Phase Equilib. 326, 15–20.
Pandolfi, C., Mugnai, S., Azzarello, E., Bergamasco, S., Masi, E., Mancuso, S., 2009. Artificial neural networks as a tool for plant identification: a case study on Vietnamese tea accessions. Euphytica 166, 411–421.
Parimala, K., Mathur, R.K., 2006. Yield component analysis through multiple regression analysis in sesame. Int. J. Agric. Sci. 2 (2), 338–340.
Qnet2000 Manual, 1999. Qnet2000 neural network modelling for windows 95/98/NT. In: Qnet Toll User’s Guide and Datapro User’s Guide. Vesta Services Inc., USA.
Rumelhart, D.E., Hinton, G.E., Williams, R.J., 1986. Learning internal representation by error back propagation. In: Rumelhart, D.E., McClelland, J.L. (Eds.), Parallel Distributed Processing. MIT Press, Cambridge, MA, pp. 318–362.
Samadianfard, S., Nazemi, A.H., Ashraf Sadraddini, A., 2014. M5 model tree and gene expression programming based modeling of sandy soil water movement under surface drip irrigation. Agric. Sci. Dev. 3 (5), 178–190.
SAS, 2010. Statistical Analysis Software. Institute Inc. and World Programming Limited, England and Wales High Court (Chancery Division).
Singh, T.N., Kanchan, R., Verma, A.K., Singh, S., 2003. An intelligent approach for prediction of triaxial properties using unconfined uniaxial strength. Mining Eng. J. 5, 12–16.
Silva, G.N., Tomaz, R.S., Sant’Anna, I.C., Nascimento, M., Bhering, L.L., Cruz, C.D., 2014. Neural networks for predicting breeding values and genetic gains. Sci. Agricola 71 (6), 494–498.
Shahinfar, S., Mehrabani-Yeganeh, H., Lucas, C., Kalhor, A., Kazemian, M., Weigel, K.A., 2012. Prediction of breeding values for dairy cattle using artificial neural networks and neuro-fuzzy systems. Hindawi Publishing Corporation Comput. Math. Methods Med., 1–9, http://dx.doi.org/10.1155/2012/127130.
Shim, K.B., Kang, C.W., Seong, J.D., Hwang, C.D., Suh, D.Y., 2006. Interpretation of relationship between sesame yield and it’s components under early sowing cropping condition. Kor. J. Crop Sci. 51 (4), 269–273.
Shim, K.B., Kang, C.W., Lee, S.W., Kim, D.H., Lee, B.H., 2001. Heritabilities, genetic correlations and path coefficients of some agronomic traits in different environments in sesame. Sesame Safflower Newslett. 16, 23–27.
Solanki, Z.S., Gupta, D., 2001. Combining ability and heterosis studies for seed yield and its components in sesame. Sesame Safflower Newslett. 16, 9–12.
Tufail, M., Ormsbee, L.E., Teegavarapu, R., 2008. Artificial intelligence-based inductive models for prediction and classification of fecal coliform in surface waters. J. Environ. Eng. 134 (9), 789–799.
Wasserman, P.D., 1989. Neural Computing: Theory and Practice. Van Nostrand Reinhold, New York.
Yates, D.S., David, S.M., Daren, S.S., 2008. The Practice of Statistics, 3rd ed. Freeman, ISBN 978-0-7167-7309-2.
Yingzhong, Z., Yishou, W., 2002. Genotypic correlations and path coefficient analysis in sesame. Sesame and Safflower News 17, 10–12.
Yilmaz, I., Kaynar, O., 2011. Multiple regression, ANN (RBF, MLP) and ANFIS models for prediction of swell potential of clayey soils. Expert Syst. Appl. 38, 5958–5966.
Yol, E., Karaman, E., Furat, S., Uzun, B., 2010. Assessment of selection criteria in sesame by using correlation coefficients, path and factor analyses. Aust. J. Crop Sci. 4, 598–602.
Yong-Jun, S., Lei, W., Wang, D., Chang-Hong, G., 2011. Application of artificial neural network in genomic selection for crop improvement. Acta Agronom. Sin. 37 (12), 1–8.
Zaefizadeh, M., Khayatnezhad, M., Gholamin, R., 2011. Comparison of multiple linear regressions (MLR) and artificial neural network (ANN) in predicting the yield using its components in the Hulless Barley. Am. Eur. J. Agric. Environ. Sci. 10 (1), 60–64.
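
Supplementary sketch. The leave-one-input-out sensitivity test described in Section 3.4 can be illustrated with a short Python example. This is a minimal sketch under stated assumptions, not the authors' implementation: the data are synthetic, the weights are made up so that CNPP dominates, and an ordinary least-squares fit (NumPy `lstsq`) stands in for the MLR; the ANN is not reproduced. The function names (`rmse`, `pct_increase`, `fit_predict_linear`, `leave_one_input_out`) are hypothetical. Only the percentage-increase arithmetic is checked against RMSE values reported in the paper (0.214 and 0.577 t/ha for the ANN, 0.346 and 0.563 t/ha for the MLR).

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    diff = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.sqrt(np.mean(diff ** 2)))


def pct_increase(baseline, value):
    """Percentage by which `value` exceeds `baseline`."""
    return 100.0 * (value - baseline) / baseline


def fit_predict_linear(X_train, y_train, X_test):
    """Ordinary least squares with an intercept (a stand-in for the MLR)."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return A_test @ coef


def leave_one_input_out(X_train, y_train, X_test, y_test, names):
    """Refit the model without each input and report the RMSE increase (%)."""
    base = rmse(y_test, fit_predict_linear(X_train, y_train, X_test))
    increase = {}
    for j, name in enumerate(names):
        keep = [k for k in range(X_train.shape[1]) if k != j]
        pred = fit_predict_linear(X_train[:, keep], y_train, X_test[:, keep])
        increase[name] = pct_increase(base, rmse(y_test, pred))
    return base, increase


# Synthetic, standardized inputs (assumption: NOT the field data of the paper).
rng = np.random.default_rng(0)
names = ["FT100", "PH", "CNPP", "TSW", "SNPC"]
X = rng.normal(size=(200, 5))
weights = np.array([0.05, 0.6, 1.5, 0.4, 0.3])  # made-up effect sizes
y = X @ weights + rng.normal(scale=0.2, size=200)

base_rmse, increase = leave_one_input_out(X[:150], y[:150], X[150:], y[150:], names)
ranking = sorted(increase, key=increase.get, reverse=True)

print(f"RMSE with all inputs: {base_rmse:.3f}")
for name in ranking:
    print(f"without {name}: RMSE changes by {increase[name]:.1f}%")

# Arithmetic check against the RMSE values reported in the paper.
ann_increase = pct_increase(0.214, 0.577)
mlr_increase = pct_increase(0.346, 0.563)
```

The input whose removal causes the largest RMSE increase is ranked as the most influential, which is the same criterion used to rank the plant characters in Table 6.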
