
Scientia Horticulturae 181 (2015) 108–112

Contents lists available at ScienceDirect

Scientia Horticulturae
journal homepage: www.elsevier.com/locate/scihorti

Application of Artificial Neural Networks to predict the final fruit weight and random forest to select important variables in native population of melon (Cucumis melo. Pahlavan)
Mohammad Reza Naroui Rad ∗, Shirali Koohkan, Hamid Reza Fanaei,
Mohammad Reza Pahlavan Rad
Agriculture and Natural Resources Research Center of Sistan, Iran

Article info

Article history:
Received 28 June 2014
Received in revised form 11 October 2014
Accepted 14 October 2014

Keywords:
Melon
Artificial Neural Network (ANN)
Fruit weight
Sistan
Random forest

Abstract

Estimating the relationships among variable factors can help quantify how much a particular trait varies with respect to others. This paper studies the effects of different agronomic and phenologic factors on the total mass of melon fruit produced. The factors considered were plant length, fruit weight, fruit length, fruit width, number of fruits per plant, number of days to flowering, number of days to maturity, number of days to fruit formation, fruit cavity diameter and flesh diameter. Each plant was treated as a self-sustaining unit. The study describes a procedure for predicting melon yield using an Artificial Neural Network (ANN) model as the modeling tool. For the Firoozi accession, fruit weight was predicted with high accuracy and efficacy (R2 = 0.87, MPE = 2.21% and MSD = 1.66). Random forest (RF) yields a variable importance measure for each candidate predictor; in this study, flesh diameter was identified as an effective variable among the candidate predictors.
© 2014 Elsevier B.V. All rights reserved.

∗ Corresponding author. Tel.: +98 542 2225026; fax: +98 542 2226328.
E-mail address: narouirad@gmail.com (M.R. Naroui Rad).

http://dx.doi.org/10.1016/j.scienta.2014.10.025
0304-4238/© 2014 Elsevier B.V. All rights reserved.

1. Introduction

Melon production in Sistan faces a number of problems. First, there is a scarcity of commercially available melon varieties that meet consumer requirements, such as suitable fruit size; low resistance to pests, diseases and drought, as well as variation across ecosystems, are further problems. Moreover, in Sistan melon is more popular than any other fruit. To solve these issues, plant breeders can apply genetic improvement techniques to melon and evaluate new varieties against older ones. According to 2013 reports, Sistan, located in south-eastern Iran, is a major melon producer, with 6000 ha of land allotted to melon cultivation alone. After genetic improvement programs were initiated in Iran, the new genotypes were compared with melon genotypes from other areas of Iran (Naroui Rad et al., 2010). Based on the outcomes of that study, the native Firoozi melon population of Sistan was recommended because of its superior quality and taste (Naroui Rad et al., 2010). The evaluation of the newly formed genotypes considered several traits during the production cycle. Traits such as plant height, fruit weight, length and diameter, number of fruits per plant and early ripening were used to select superior melon plants, and the plants with superior traits were recommended for cultivation in the area (Nasrabadi et al., 2012). Proper information about these traits and how they relate to the final quality of the fruit can help farmers make crucial decisions, such as financing, in advance (Naroui Rad et al., 2010). For managing production and making statistical estimates, modeling techniques that forecast the expected harvest are essential, and accurate forecasts are possible only with detailed calculations of the production process and careful evaluation of the factors involved. In recent years, Artificial Neural Network (ANN) tools have been introduced for solving such critical problems. ANNs rely on direct interconnections between elements that can identify the relationship between input and output signals from the patterns provided; these elements are called the neurons of the ANN. The advantage of this method is that it supports
uncomplicated simulations and does not require any analytical formulation. A number of researchers have used this technique in different aspects of yield management and cultivation, particularly for predicting total yield (Bala et al., 2005; Movagharnejad and Nikzad, 2007; Zhang et al., 2007). ANNs are considered one of the most efficient, precise and widely used techniques for data mining and prediction. Studies have shown that a network can approximate any function to the desired level of precision (Solaimany-Aminabad et al., 2013). The recent attention paid to neural networks can be attributed to their non-linear models, which can capture past and future values in an input–output relationship (Zhang et al., 2012). As noted by Hoogenboom (2000), these models can learn relationships in the data independently, which makes them well suited to cases where mathematical modeling cannot be performed. Jiang et al. (2004) reported that ANN models trained with the back-propagation algorithm can be very appropriate for predicting winter wheat yield from remote sensing information. Uno et al. (2005) reported using ANNs and a number of statistical methods that considered different plant factors to predict maize yield, and Higgins et al. (2010) found that the ANN model was more accurate. Random forests, a method for classification or regression based on repeatedly growing trees under a random perturbation, handles such situations by averaging the outcomes of a large number of models fitted to the same dataset. As a by-product of this technique, identifying the variables that are important across many models indicates which predictors matter (Breiman, 2001). Several studies have applied correlation and regression analysis to melon. Feyzian et al. (2009) reported that increasing crown diameter appears to increase mineral uptake and, therefore, fruit weight in melon, and their regression analysis showed that total yield was affected by fruit weight. In correlation studies with melon, yield per plant has been reported to be positively correlated with the number of fruits, average fruit weight, number of nodes on the main stem, stem length, internode length and fruit shape index, and most plant breeders are interested in knowing which trait is the best predictor of plant yield (Taha et al., 2003). Hence, the objective of this paper is not only to develop a method that applies ANNs to forecast the fruit weight of melon but also to use the random forest approach to identify the best predictor.

2. Materials and methods

2.1. Experimental site and seeding

The study was conducted at the Agriculture and Natural Resources Research Station of Zahak, in Sistan and Baluchistan, south-eastern Iran. Located at 30°54′ N latitude and 61°41′ E longitude, the station receives a yearly precipitation of 59 mm and has an average temperature of 23 °C. The area lies 482 m above sea level. The study was an evaluation aimed at improving melon production and was conducted on the native melon accession (Firoozi), which farmers select for sale over other varieties because of its superior quality. The study included a total of 116 melon plants distributed in 5 rows with 33 plants in each row, planted at a spacing of 2 m × 0.5 m. Each plant was treated as a separate unit, and the analyses covered two production cycles for each plant. The plants were irrigated through a traditional channel system every 10 days.

2.2. Evaluated traits

The evaluations were carried out during fruit maturation. The traits measured in the study were:

(a) plant length,
(b) fruit weight,
(c) fruit length,
(d) fruit width,
(e) number of fruits per plant,
(f) number of days to flowering,
(g) number of days to maturity,
(h) number of days to fruit formation,
(i) fruit cavity diameter,
(j) flesh diameter.

2.3. Artificial Neural Networks (ANNs)

Artificial Neural Networks are built from direct connections between elements, known as neurons, which can identify the relationship between input and output signals in particular forms, much like the functioning of the nervous system in the human body (Haykin, 2008). Several multi-layer ANN models were run in Statistica version 10 and Excel to find the one giving the most accurate predictions of melon fruit weight. The model had three layers: the first (input) layer included all the traits except fruit weight; the second was a hidden layer with 5 neurons; and the output layer had a single neuron giving the fruit weight.

The number of hidden layers and hidden units best suited to an ANN depends on a number of factors, such as the complexity of its structure, the total number of input and output units, the number of training samples, the extent of noise in the sample set, and the training algorithm (Movagharnejad and Nikzad, 2007; Erzin et al., 2008; Panda et al., 2010; Zhang et al., 2011). In most cases, one or two hidden layers are sufficient (Erzin et al., 2008).

2.4. Running of the model

The back-propagation algorithm was used to train the ANN models in this research. The algorithm operates on a multi-layer network and tunes the weights of sigmoid units with a rule similar to the delta rule. Back-propagation is essentially a procedure for training multi-layer feed-forward ANNs. The training steps are: (a) input vectors are presented to the network, (b) the network output is computed, (c) the error, calculated as the difference between the target value and the network output, is fed back through the network, and (d) finally, the weight vectors are adjusted according to the algorithm. The algorithm requires a network architecture organized in a number of layers, each fully connected to the next. A basic network can include a single input layer, one hidden layer and an output layer. The primary objective of the method is to adjust the weights according to the estimated errors for the given input patterns. The errors are propagated in a backward direction, that is, from the output layer to the input layer. The first 70 percent of the data set was used as a training set and the remaining 30 percent as a testing set for the ANNs. The inputs were first scaled to the range 0.00–1.00, and as training advanced, the weights were fine-tuned to minimize the error in the output. To select the most accurate network, three parameters were considered: the coefficient of determination (R2), the mean prediction error (MPE) and the mean square deviation (MSD).
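The scaling and data split described in Section 2.4 can be sketched in a few lines of Python. This is a minimal illustration, assuming that "scaled to the range 0.00–1.00" means min-max scaling, (x − min)/(max − min), and that the 70/30 split takes records in their original order; the helper names are hypothetical and are not part of Statistica or Excel, which the study actually used.

```python
from typing import List, Sequence, Tuple


def min_max_scale(column: Sequence[float]) -> List[float]:
    """Rescale one trait to the 0.00-1.00 range via (x - min) / (max - min)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant trait carries no information; map it to 0.0 to avoid division by zero.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def ordered_split(rows: Sequence, train_fraction: float = 0.7) -> Tuple[list, list]:
    """Hypothetical helper: first 70% of records for training, the rest for testing (no shuffling)."""
    cut = int(round(len(rows) * train_fraction))
    return list(rows[:cut]), list(rows[cut:])


if __name__ == "__main__":
    # Illustrative values only: 319 g and 980 g are the Table 1 minimum and maximum fruit weights.
    print(min_max_scale([319.0, 608.0, 980.0]))  # first and last values become 0.0 and 1.0
    train, test = ordered_split(list(range(116)))  # 116 plants, as in Section 2.1
    print(len(train), len(test))  # 81 35
```

In a stricter setup, the minimum and maximum would be computed on the training portion only and then reused for the test portion, so that no information from the test plants leaks into training.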
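Training steps (a)–(d) can be illustrated with a small, dependency-free Python sketch of a one-hidden-layer network such as the 9-5-1 structure, using sigmoid units and a momentum term. It is not the Statistica implementation used in the study: the class and function names are hypothetical, the weight initialization range is arbitrary, and reading Table 2's "Error tolerance" as a mean-squared-error stopping threshold and "Number of times" as training epochs is an assumption.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class OneHiddenLayerNet:
    """Feed-forward network n_in-n_hidden-1 with sigmoid hidden and output units."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # The last weight in every row multiplies a constant bias input of 1.0.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
        # Previous weight changes, reused by the momentum term.
        self.dw_hidden = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
        self.dw_out = [0.0] * (n_hidden + 1)

    def forward(self, x: Sequence[float]) -> Tuple[List[float], List[float], float]:
        xb = list(x) + [1.0]  # step (a): present the input vector (plus bias)
        hb = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w_hidden] + [1.0]
        y = sigmoid(sum(w * v for w, v in zip(self.w_out, hb)))  # step (b): network output
        return xb, hb, y

    def predict(self, x: Sequence[float]) -> float:
        return self.forward(x)[2]

    def train_step(self, x: Sequence[float], target: float, lr: float, momentum: float) -> float:
        xb, hb, y = self.forward(x)
        err = target - y  # step (c): error between target and output
        delta_out = err * y * (1.0 - y)  # sigmoid derivative is y(1 - y)
        # Propagate the error backwards through the (not yet updated) output weights.
        delta_h = [delta_out * self.w_out[j] * hb[j] * (1.0 - hb[j]) for j in range(len(self.w_hidden))]
        # Step (d): generalized delta rule with momentum, output layer first.
        for j in range(len(self.w_out)):
            change = lr * delta_out * hb[j] + momentum * self.dw_out[j]
            self.w_out[j] += change
            self.dw_out[j] = change
        for j, row in enumerate(self.w_hidden):
            for i in range(len(row)):
                change = lr * delta_h[j] * xb[i] + momentum * self.dw_hidden[j][i]
                row[i] += change
                self.dw_hidden[j][i] = change
        return err * err


def train(net, X, y, epochs: int = 300, lr: float = 0.6, momentum: float = 0.6, tol: float = 0.01) -> float:
    """Online training; stops early once the epoch mean squared error drops below tol."""
    mse = float("inf")
    for _ in range(epochs):
        mse = sum(net.train_step(x, t, lr, momentum) for x, t in zip(X, y)) / len(X)
        if mse < tol:
            break
    return mse


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic, already scaled data standing in for 9 traits and fruit weight (not the study's data).
    X = [[rng.random() for _ in range(9)] for _ in range(81)]
    y = [0.2 + 0.6 * sum(row[:3]) / 3 for row in X]
    net = OneHiddenLayerNet(n_in=9, n_hidden=5)  # the 9-5-1 structure
    print("final training MSE:", train(net, X, y))
```

The learning rate and momentum defaults of 0.6 mirror Table 2; weights are updated after every sample (online learning), which is one common way to run back-propagation, not necessarily the one Statistica uses.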
The three selection parameters, R2, MPE and MSD, are defined as follows:

$$\mathrm{MPE}\,(\%) = \frac{100}{n}\sum_{i=1}^{n}\frac{X_{\mathrm{obs},i}-X_{\mathrm{predict},i}}{X_{\mathrm{obs},i}}$$

$$\mathrm{MSD} = \frac{1}{n}\sum_{i=1}^{n}\left(X_{\mathrm{obs},i}-X_{\mathrm{predict},i}\right)^{2}$$

$$R^{2} = 1-\frac{SS_{\mathrm{res}}}{SS_{T}}$$

where the sums run over the n observations (i = 1, …, n); X_obs,i is the fruit weight recorded in the field for plant i; X_predict,i is the fruit weight estimated by the network; R2 is the coefficient of determination; SS_res = Σ(X_obs,i − X_predict,i)² is the sum of squared residuals; and SS_T is the total sum of squares of the observed values about their mean.

2.5. Random forest (variable importance)

Random forests is a highly efficient algorithm based on model aggregation, applicable to both classification and regression problems, introduced by Breiman (2001). Quantifying variable importance is crucial not only for ranking the variables before building a stepwise estimation model but also for interpreting the data and understanding the underlying phenomena in many applied problems. Variable importance was measured with the randomForest package (Liaw and Wiener, 2002) in R. Random forest estimates covariate importance by permuting the values of each covariate in the out-of-bag (OOB) sample (thereby destroying the information content of that covariate) and reclassifying the OOB samples (or, for regression, re-predicting them) using the permuted variable. The resulting change in OOB error indicates the importance of that covariate: covariates whose permutation causes a comparatively large increase in OOB error are more important (Pierce et al., 2012).

3. Results and discussion

The phenotypic statistics of the population, including the mean, minimum, maximum and standard error of the mean, are shown in Table 1. Fig. 1 shows the fruit weights forecast by the ANN trained with the back-propagation algorithm, compared with the observed values. The study was conducted over two flowering cycles of the plants; data collected in the first cycle were used to build the model, whereas the second set of data was reserved for evaluating the forecasting accuracy of the model. Each training run was started from different initial weights until the lowest average error was reached, and the model with the minimum average standard error was selected.

Table 1
Descriptive statistics on 10 agronomic traits for the native melon population.

Trait                               N     Mean       Min    Max    Standard error
Number of days to flowering         116   48.0976    33     58     0.59141
Number of days to fruit formation   116   54.0244    33     58     0.69711
Number of fruits per plant          116   2.3171     1      4      0.12316
Fruit weight (g)                    116   608.8537   319    980    24.70881
Fruit length                        116   21.9024    15     38     0.78383
Fruit width                         116   14.5366    5      21     0.43491
Fruit cavity diameter               116   7.7561     1      15     0.76571
Flesh diameter                      116   2.0244     1.2    3.7    0.09733
Plant length                        116   105.0244   72     119    1.49613
Number of days to maturity          116   75.2439    71     80     0.32718
Min: minimum; Max: maximum.

Table 2
Statistical parameters for ANNs.

Network structure   9-1-1   9-3-1   9-5-1   9-7-1   9-9-1
Error tolerance     0.01    0.01    0.01    0.01    0.01
Number of times     300     300     300     300     300
Learning rate       0.6     0.6     0.6     0.6     0.6
Momentum rate       0.6     0.6     0.6     0.6     0.6
R2                  0.34    0.48    0.87    0.69    0.73
MPE (%)             3.34    2.54    2.21    2.67    2.56
MSD                 1.98    1.86    1.66    1.91    1.95
R2: coefficient of determination. MPE: mean prediction error. MSD: mean square deviation.

For the yield data presented in Table 1, the network with the 9-5-1 structure was found to be optimal. Table 2 presents the parameters of the ANNs examined, including the number of nodes in the hidden layer of each network structure, the error tolerance, the number of times the minimum error was reached, the mean prediction error, the coefficient of determination and the mean square deviation. To estimate R2, MPE and MSD, the values predicted by the ANN models were compared with the observed data. As presented in Table 2, the best statistical parameters were achieved by the 9-5-1 network structure. This network has 9 inputs in the first layer, 5 neurons in the hidden layer and a single output neuron representing fruit weight. The independent variables in this model explain 87% of the variation in fruit weight, and the model also shows the lowest error. Changing the number of neurons in the hidden layer lowers R2 and increases the error, reducing the accuracy of the forecast. It is essentially the hidden layer that gives the model its predictive capacity. The relationship between the observed and predicted values is shown in Fig. 1. The curves generated by the ANN model and those formed by the observed values follow the same pattern and shape, which confirms that the ANN modeled the data successfully using a dependable structure with 9 input nodes. This result suggests that, with this ANN model, farmers could enter the data into a computer and predict the profitability of the harvest in advance, allowing them to apply for credit with greater confidence of timely repayment (Soares et al., 2013). In agriculture, maximizing the harvest while minimizing expenses is the prime goal, and the early detection and proper management of problems affecting the crop can improve both the yield and the grower's profit. Extreme weather conditions in the area can seriously affect the yield, and the model can also be used to minimize losses under unfavorable weather conditions. At the same time, the same prediction approach can be applied to estimate the maximum yield capacity of the crop under favorable conditions (Snehal and Sandeep, 2014).

Most studies on melon have focused on classifying and estimating the genotypic behavior of different varieties, including hybrids, using phenotypic traits to document and select superior plants (Chamnan and Kasem, 2006; Emeka et al., 2013; Snehal and Sandeep, 2014). Correlation, regression and path analysis studies have identified the factors directly related to melon fruit weight. However, because these studies cannot forecast melon yield, the simple relationships they report between individual traits and fruit weight may or may not be significant in practical application.
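The R2, MPE and MSD values in Table 2 follow the definitions given in Section 2.4. A minimal Python sketch of those three formulas is shown below; the function names are hypothetical. MPE is implemented exactly as written, with signed relative errors that can cancel each other out, so an optional absolute-value variant (a common alternative, not stated in the study) is included as an assumption.

```python
from typing import Sequence


def mpe_percent(obs: Sequence[float], pred: Sequence[float], absolute: bool = False) -> float:
    """MPE (%) = (100 / n) * sum((X_obs - X_predict) / X_obs)."""
    terms = [(o - p) / o for o, p in zip(obs, pred)]
    if absolute:
        terms = [abs(t) for t in terms]  # optional variant: relative errors cannot cancel
    return 100.0 * sum(terms) / len(obs)


def msd(obs: Sequence[float], pred: Sequence[float]) -> float:
    """MSD = (1 / n) * sum((X_obs - X_predict) ** 2)."""
    return sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs)


def r_squared(obs: Sequence[float], pred: Sequence[float]) -> float:
    """R2 = 1 - SS_res / SS_T, with SS_T taken about the mean of the observations."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_t = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_t


if __name__ == "__main__":
    observed = [500.0, 600.0, 700.0]   # hypothetical fruit weights (g)
    predicted = [510.0, 590.0, 700.0]  # hypothetical network outputs (g)
    print(round(msd(observed, predicted), 2))          # 66.67
    print(round(r_squared(observed, predicted), 4))    # 0.99
    print(round(mpe_percent(observed, predicted), 4))  # -0.1111
```

In the toy example the two non-zero relative errors (−2% and +1.67%) largely cancel, which is why the signed MPE is close to zero even though each individual prediction is off by 10 g.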
ANNs, however, offer a more advanced approach than traditional correlation methods, because they enable a cultivator to enter inputs into a computer or calculating device and estimate the expected fruit weight even before the fruit ripens (Soares et al., 2013). The information gathered in previous research can be used to regulate the factors that are crucial to melon production. By forecasting the yield of a melon crop, the cultivator can calculate the cost-effectiveness of growing it, and this may inspire further studies to identify the factors that most affect the final yield. The time required to produce a new cultivar is often a hurdle for crop improvement, so any process that shortens it can be highly helpful.

Fig. 1. Graphical representation of the observed values (blue) and the values predicted by the neural network (red) for melon fruit weight, in two panels: "Train data: Fruit weight calculated by MLP" and "Test data: Fruit weight calculated by MLP" (x-axis: plant; y-axis: weight, g). MLP: multilayer perceptron. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 2. The variable importance obtained by RF (importance plot; variables shown: fruit cavity diameter, flesh diameter, number of days to flowering, fruit length, fruit width, number of days to fruit formation, plant length, number of fruits per plant and number of days to maturity). RF: random forest.

This research paper suggests a process for forecasting the fruit weight of melon plants using ANN models. A major benefit of forecasting the total yield in advance is that the cultivator can estimate it at an early stage of cultivation rather than waiting for the fruits to grow and ripen. The ANN model applied in this case predicted the final fruit weight of melon with high accuracy, and the network with the most precise outcome had a 9-5-1 structure. The variable importance measure of the RF model explains the effects of the traits on the estimation of fruit weight (Fig. 2).
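The permutation idea behind the RF importance measure in Fig. 2 (Section 2.5) can be sketched without the randomForest package. The version below is a simplified, model-agnostic variant: it shuffles one trait at a time in a single evaluation set and reports the average increase in mean squared error, whereas randomForest does this per tree on that tree's out-of-bag samples and aggregates the results. The function names and the toy model are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], float]


def mse(model: Model, X: Sequence[Sequence[float]], y: Sequence[float]) -> float:
    return sum((model(x) - t) ** 2 for x, t in zip(X, y)) / len(X)


def permutation_importance(model: Model, X, y, n_repeats: int = 10, seed: int = 0) -> List[float]:
    """Mean increase in MSE when each column of X is shuffled, destroying its information."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only column j replaced by its shuffled values.
            X_perm = [list(row[:j]) + [v] + list(row[j + 1:]) for row, v in zip(X, column)]
            increases.append(mse(model, X_perm, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


if __name__ == "__main__":
    # Toy model that uses only the first trait; the second trait should score exactly zero.
    X = [[float(i), float((i * 3) % 7)] for i in range(10)]
    y = [2.0 * row[0] for row in X]
    print(permutation_importance(lambda x: 2.0 * x[0], X, y))
```

A trait the model ignores leaves the error unchanged when shuffled, while a trait the model depends on raises it; ranking traits by this increase is the logic behind an importance plot such as Fig. 2.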


Regarding the total weight of melon fruit, the importance of flesh diameter could be related to the fact that final fruit weight was strongly affected by this trait in this study.

Acknowledgements

We would like to thank the Jahad-e-Agriculture organization for its helpful collaboration, and the authors express their gratitude to the Agriculture and Natural Resources Research Center of Sistan for providing the site and facilities.

References

Bala, B., Ashraf, M., Uddin, M., Janjai, S., 2005. Experimental and neural network prediction of the performance of a solar tunnel drier for a solar drying jack fruit bulbs and leather. J. Food Proc. Eng. 28, 552–566.
Breiman, L., 2001. Random forests. Mach. Learn. 45, 5–32.
Chamnan, L., Kasem, P., 2006. Heritability, heterosis and correlations of fruit characters and yield in Thai Slicing Melon (Cucumis melo L. var. conomon Makino). Kasetsart J. (Nat. Sci.) 40, 20–25.
Emeka, N., Justin, O., Agbo, C., 2013. Assessment of phenotypic variability, heritable components and character association in egusi melon. American-Eurasian J. Agric. Environ. Sci. 13, 961–966.
Erzin, Y., Rao, H., Singh, D., 2008. Artificial neural network models for predicting soil thermal resistivity. Int. J. Ther. Sci. Algor. 47, 1347–1358.
Feyzian, E., Dehghani, H., Rezai, A., Jalali, M., 2009. Correlation and sequential path model for some yield-related traits in melon (Cucumis melo L.). J. Agric. Sci. Technol. 11, 341–353.
Haykin, S., 2008. Neural Computing, 2nd ed. Prentice Hall, Princeton, 842 pp.
Higgins, A., Prestwidge Di Tirling, S.D., Yost, J., 2010. Forecasting maturity of green peas: an application of neural networks. Comp. Elect. Agric. 70, 151–156.
Hoogenboom, G., 2000. Contribution of agrometeorology to the simulation of crop production and its applications. Agric. For. Meteorol. 103, 137–157.
Jiang, S., Jiang, D., Yang, X., Clinton, N., Wang, N., 2004. An artificial neural network model for estimating crop yields using remotely sensed information. J. Remote Sens. 25, 1723–1732.
Liaw, A., Wiener, M., 2002. Classification and regression by random forest. R News 2, 18–22.
Movagharnejad, K., Nikzad, M., 2007. Modelling of tomato drying using artificial neural network. Comp. Elect. Agric. 59, 78–85.
Naroui Rad, M., Allahdo, M., Fanaei, H., 2010. Evaluation of agronomic traits in native population of melon. Iran. J. Hortic. 40, 53–59.
Nasrabadi, N., Nemati, H., Sobhani, A., Sharif, M., 2012. Study on morphologic variation of different Iranian melon cultivars. Afr. J. Agric. Res. 7, 2764–2769.
Panda, S., Ames, D., Panigrahi, S., 2010. Application of vegetation indices for agricultural crop yield prediction using neural network techniques. Remote Sens. 2, 673–696.
Pierce, A., Faris, C., Taylor, A., 2012. Use of random forests for modeling and mapping forest canopy fuels for fire behavior analysis in Lassen Volcanic National Park, California, USA. Forest Ecol. Manage. 279, 77–89.
Snehal, D., Sandeep, V., 2014. Agricultural crop yield prediction using artificial neural network approach. Int. J. Innov. Res. Elect. Electr. Instrum. Cont. Eng. 2, 683–686.
Soares, J., Pasqual, M., Lacerda, W., Silva, S., Donato, S., 2013. Utilization of artificial neural networks in the prediction of the bunches' weight in banana plants. Sci. Hortic. 155, 24–29.
Solaimany-Aminabad, M., Maleki, A., Mahdi, H., 2013. Application of artificial neural network (ANN) for the prediction of water treatment plant influent characteristics. J. Adv. Environ. Health Res. 1, 1–12.
Taha, M., Omara, K., El Jack, A., 2003. Correlation among growth, yield and quality characters in Cucumis melo L. Cucurbit Genet. Coop. Rep. 26, 9–11.
Uno, Y., Prasher, S., Lacroix, R., Goel, P., Karimi, Y., Viau, A., Patel, R., 2005. Artificial neural networks to predict corn yield from compact airborne spectrographic imager data. Comp. Elect. Agric. 47, 149–161.
Zhang, H., Hu, H., Zhang, X., Zhu, L., Zheng, K., Jin, Q., Zeng, F., 2011. Estimation of rice neck blasts severity using spectral reflectance based on BP-neural network. Acta Physiol. Plant. 33, 2461–2466.
Zhang, H., Song, T., Wang, K., Wang, G., Hu, H., Zeng, F., 2012. Prediction of crude protein content in rice grain with canopy spectral reflectance. Plant Soil Environ. 58, 514–520.
Zhang, W., Bai, X., Liu, G., 2007. Neural network modeling of ecosystems: a case study on cabbage growth system. Ecol. Model 201, 3–4.
