Co-authors' names: Violeta Bartolome, Alexander Cañeda, Alaine Gulles, Rose Imee
Zhella Morantte, Leilani Nora, Angel Manica Raquel, Christoffer
Edd Relente, Darwin Talay and Guoyou Ye
Designation: Senior Associate Scientist - Biometrics, Senior Specialist -
Software Engineering, Assistant Scientist, Specialist, Assistant
Scientist, Programmer, Programmer, Programmer and Senior
Scientist - Breeding Informatics Specialist
Affiliation: International Rice Research Institute
Address: College, Los Baños, Laguna
Tel. no.: 5362701 loc 2238
E-mail: v.bartolome@irri.org, a.caneda@irri.org, a.gulles@irri.org,
r.morantte@irri.org, l.nora@irri.org, a.raquel@irri.org,
c.relente@irri.org, d.talay@irri.org and g.ye@irri.org
PBTools: Software for Plant Breeders
Nellwyn Sales, Violeta Bartolome, Alexander Cañeda, Darwin Talay, Alaine Gulles,
Rose Imee Zhella Morantte, Leilani Nora, Angel Manica Raquel,
Christoffer Edd Relente and Guoyou Ye
Abstract
Data from plant breeding trials need to be analyzed properly, with greater speed
to support selection decision making. Genetic information should also be derived from
breeding trials to determine more efficient breeding strategies. Although general
statistical software can be used to analyze breeding trials, many practical breeders are
seeking easy-to-use analytical tools.
Introduction
Plant Breeding Tools (PBTools) is a free statistical application built with the
Eclipse Rich Client Platform (RCP), a platform for building and deploying rich client
applications, and the R language. It was developed to assist plant breeders in
experimental design and data analysis. Its easy-to-navigate GUI does not require users
to have programming skills to perform data manipulation and analysis.
Its current version provides modules for data management in spreadsheet view,
randomization for commonly used experimental designs, single- and multiple-
environment analysis, QTL analysis, selection index, commonly used mating designs,
and generation mean analysis.
This paper introduces the features of PBTools.
PBTools Environment
Main Window
The PBTools main window (Figure 1) has a menu bar with five items:
Project, Data, Analysis, Randomization, and Help. The Project menu contains functions
for creating and managing projects. The Data menu contains functions for reading,
managing, and manipulating datasets. The Analysis menu contains functions for performing
statistical analysis. The Randomization menu contains functions for generating random
assignments of factor levels for experimental designs commonly used in plant breeding.
Finally, the Help menu gives access to the PBTools user's manual and to information
about the software.
Figure 2 presents the submenu items under the Data, Analysis, and
Randomization menus.
Figure 2. Submenu items for the Data, Analysis, and Randomization menus.
The PBTools main window is divided into two panels: the Project Explorer panel
and the Editor panel. The Project Explorer panel functions as a file manager for the active
project, where the names of data files and analysis result files are displayed in tree form,
while the Editor panel serves as a viewer for selected data (through the Data Viewer
tab) and/or results of analysis (through the Results Viewer tab).
Figure 4. Results Viewer (Output and Graph Tabs) in the Editor Panel of PBTools.
Handling Data
PBTools uses comma-separated values (csv) format for data files. Data files are
created outside of PBTools and imported into the Data folder of the active project. Data
files (.rda or .txt) may also be imported into PBTools but they will be automatically
converted into .csv format. To represent missing observations, the user can use “NA”,
period, blank or space.
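The missing-value convention above can be illustrated outside PBTools. The following is a minimal sketch in Python, not PBTools code: the function name `read_trial_csv`, the column names, and the sample values are all hypothetical. It treats "NA", a period, a blank, and a space as missing when reading a csv file.

```python
import csv
import io

# Markers listed in the text for missing observations: "NA", a period, a blank, or a space.
MISSING_MARKERS = {"NA", ".", ""}

def read_trial_csv(text):
    """Parse CSV text into a list of dicts, mapping missing markers to None."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        cleaned = {}
        for name, value in record.items():
            value = (value or "").strip()  # a lone space becomes an empty string
            cleaned[name] = None if value in MISSING_MARKERS else value
        rows.append(cleaned)
    return rows

# Hypothetical four-plot data set; column names and values are made up.
sample = "ENTRY,BLOCK,YIELD\nG1,1,4200\nG2,1,NA\nG3,1,.\nG4,1, \n"
rows = read_trial_csv(sample)
missing = [r["ENTRY"] for r in rows if r["YIELD"] is None]
print(missing)
```

Stripping whitespace before the comparison lets a single set of markers cover both the blank and the space cases.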
A selected file is displayed in a Data Viewer tab in the Editor panel. Several Data
Viewers can be open simultaneously inside the Editor panel.
Data values can be edited in the Data Viewer. Data manipulation can also be
performed using the options in the Data Viewer toolbar (Figure 5) or the
submenu items under the Data menu. These options are: inserting row(s)/column(s)
into the data, deleting row(s)/column(s) from the data, creating a new variable, editing
variable information, sorting, aggregating, reshaping, merging, and appending data sets,
and creating a data subset.
Randomization for Some Experimental Designs
A field book in csv format is created (Figure 7) and the layout (saved in a text file)
is displayed in the Data Viewer (Figure 8).
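As an illustration of what a randomization module produces, the sketch below generates a field book for a Randomized Complete Block design and writes it as csv text. It is only a sketch under stated assumptions: the function names, the PLOT/BLOCK/ENTRY column names, and the plot-numbering scheme are hypothetical and are not taken from PBTools.

```python
import csv
import io
import random

def randomize_rcb(entries, n_blocks, seed=None):
    """Return field-book rows for an RCB design: every entry once per block, in random order."""
    rng = random.Random(seed)
    book = []
    for block in range(1, n_blocks + 1):
        order = list(entries)
        rng.shuffle(order)  # independent randomization within each block
        for position, entry in enumerate(order, start=1):
            plot_no = block * 100 + position  # hypothetical plot-numbering scheme
            book.append({"PLOT": plot_no, "BLOCK": block, "ENTRY": entry})
    return book

def to_csv(book):
    """Serialize the field book to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["PLOT", "BLOCK", "ENTRY"])
    writer.writeheader()
    writer.writerows(book)
    return buffer.getvalue()

entries = ["IRRI102", "IRRI103", "IRRI104", "IRRI105"]
field_book = randomize_rcb(entries, n_blocks=3, seed=1)
print(to_csv(field_book))
```

Passing a seed makes the layout reproducible, which is convenient when a field book has to be regenerated.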
Figure 7. Sample Field Book after the Randomization.
Analysis
Prior to doing analysis in PBTools, a data set must first be selected and opened
in the Data Viewer. When the needed data set is active, the desired submenu option can
be selected from the Analysis menu.
Single-environment analysis
Analyses using mixed models are available in PBTools for the following designs:
Randomized Complete Block (RCB), Augmented RCB, Augmented Latin Square,
Alpha-Lattice, and Row-Column. In the dialog box, the user specifies the required
information and the preferred output or graphs to be generated. If the Environment field is
specified, the analysis is done separately for each environment level. Otherwise, the data are
treated as coming from a single environment. The user can also choose to regard
genotype as a fixed or a random factor; the remaining terms in the model are regarded
as random. As an illustration, Figures 9 to 11 show the completed tabs for a sample
RCB analysis.
Figure 10. Options Tab of Single-Environment Analysis Dialog Box.
After the user provides the necessary information and selects the desired options in the
dialog box, the text and graph outputs are displayed in separate tabs in the Editor panel.
Additional csv files containing the computed residuals and either summary statistics (if
genotype is fixed) or predicted means (if genotype is random) are saved in the results folder.
These generated files can be accessed through the Project Explorer.
SINGLE-ENVIRONMENT ANALYSIS
==============================
GENOTYPE AS: Fixed
==============================
------------------------------
RESPONSE VARIABLE: YIELD
------------------------------
DESCRIPTIVE STATISTICS:
------------------------------
ANALYSIS FOR: ENV = 1
------------------------------
DATA SUMMARY:
VARIANCE COMPONENTS TABLE:
45 IRRI161 4840.0 348.1689
46 IRRI162 4863.5 348.1689
47 IRRI163 4328.5 348.1689
48 IRRI164 4943.0 348.1689
49 IRRI165 4058.5 348.1689
50 IRRI168 5542.5 348.1689
Estimate
Minimum 489.3030
Average 489.3030
Maximum 489.3030
==============================
GENOTYPE AS: Random
==============================
------------------------------
RESPONSE VARIABLE: YIELD
------------------------------
DESCRIPTIVE STATISTICS:
------------------------------
ANALYSIS FOR: ENV = 1
------------------------------
DATA SUMMARY:
PREDICTED MEANS:
ENTRY Means
1 IRRI102 4177.854
2 IRRI103 3964.718
3 IRRI104 4348.815
4 IRRI105 4240.740
5 IRRI106 4164.297
6 IRRI108 4449.734
7 IRRI109 4620.695
8 IRRI112 4429.023
9 IRRI113 4703.539
10 IRRI115 4293.083
11 IRRI116 5012.700
12 IRRI117 4525.800
13 IRRI118 4891.069
14 IRRI119 5248.053
15 IRRI120 4860.944
16 IRRI122 4682.452
17 IRRI123 4994.248
18 IRRI124 3837.062
19 IRRI125 3514.722
20 IRRI127 3577.985
21 IRRI128 3314.012
22 IRRI133 3633.340
23 IRRI134 3918.777
24 IRRI135 4654.209
25 IRRI136 4585.298
26 IRRI139 3694.720
27 IRRI140 4225.301
28 IRRI141 4850.023
29 IRRI143 4398.898
30 IRRI145 4025.345
31 IRRI146 5504.871
32 IRRI147 4372.915
33 IRRI148 4332.622
34 IRRI149 4396.638
35 IRRI150 4906.132
36 IRRI151 4126.264
37 IRRI152 3585.516
38 IRRI154 4938.893
39 IRRI155 3711.289
40 IRRI156 4985.587
41 IRRI157 3443.174
42 IRRI161 4728.016
43 IRRI162 4745.715
44 IRRI163 4342.789
45 IRRI164 4805.589
46 IRRI165 4139.444
47 IRRI168 5257.091
CHECK/CONTROL LSMEANS:
HERITABILITY:
0.75
==============================
GENOTYPIC CORRELATIONS:
Site: 1
YIELD PLTHGT
YIELD 0.4687
PLTHGT 0.4687
PHENOTYPIC CORRELATIONS:
Site: 1
YIELD PLTHGT
YIELD 0.4342
PLTHGT 0.4342
==============================
Generated graphs can be viewed by clicking the Graph tab of the displayed
results folder. Sample generated graphs are shown in Figure 13. The distribution of the
values of the response variable can be assessed with the boxplot and
histogram, while diagnostic plots and a heatmap are available for assessing the
distribution of the residuals.
Multi-environment analysis
Figure 15. Options Tab of Multi-Environment Analysis (One-Stage)
Dialog Box.
After the analysis is processed, a text output is displayed in the Editor panel.
Figure 17 shows a sample partial result of the analysis. If genotype is regarded as
fixed, the output of the analysis includes the following: a data summary; descriptive
statistics of the response variable; estimates of the variance components of the model;
a test for the significance of the genotypic effect, in which the denominator degrees of freedom
of the F test are computed using a general Satterthwaite approximation; tests for the
significance of the environment and genotype × environment effects using the -2 log-likelihood
ratio test; genotype × environment means; least-squares means of the genotypes;
summary statistics of the standard errors of the difference; pairwise mean comparisons
using Dunnett's procedure when comparing treatments with a control, and HSD when performing
all pairwise mean comparisons; stability analysis using the Finlay-Wilkinson model and
Shukla's model; and additive main effects and multiplicative interaction (AMMI) analysis.
If genotype is regarded as random, the output of the analysis includes the following: a data
summary; descriptive statistics of the response variable; estimates of the variance
components of the model; tests for the significance of the genotypic, environment, and
genotype × environment effects using the -2 log-likelihood ratio test; genotype ×
environment means; predicted genotype means derived using Best Linear Unbiased
Prediction (BLUP); and an estimate of heritability.
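The -2 log-likelihood ratio test mentioned above compares a model that contains a random term with a reduced model that omits it. The following is a minimal sketch of that computation for one tested term, assuming a chi-square reference distribution with one degree of freedom. The function name and the log-likelihood values are hypothetical, and the paper does not state whether PBTools adjusts the p-value for testing a variance on the boundary of its parameter space.

```python
import math

def lrt_one_df(loglik_full, loglik_reduced):
    """Return the -2 log-likelihood ratio statistic and its chi-square (1 df) p-value."""
    statistic = -2.0 * (loglik_reduced - loglik_full)
    statistic = max(statistic, 0.0)  # guard against tiny negative values from rounding
    # For 1 df, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value

# Hypothetical log-likelihoods for models with and without the G x E term.
stat, p = lrt_one_df(loglik_full=-512.3, loglik_reduced=-515.9)
print(round(stat, 2), round(p, 4))
```

A small p-value indicates that dropping the term worsens the fit noticeably, so the term is retained as significant.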
==============================
GENOTYPE AS: Fixed
==============================
------------------------------
RESPONSE VARIABLE: Yield
------------------------------
DATA SUMMARY:
DESCRIPTIVE STATISTICS:
TESTING FOR THE SIGNIFICANCE OF GENOTYPIC EFFECT:
11 GEN5 1246.419 334.2864
12 GEN6 1606.112 334.2864
13 GEN7 1523.806 334.2864
14 GEN8 1871.903 334.2864
15 GEN9 1352.776 334.2864
Estimate
Minimum 355.8134
Average 355.8134
Maximum 355.8134
AMMI ANALYSIS:
Percentage of Total Variation Accounted for by the Principal Components:
==============================
GENOTYPE AS: Random
==============================
------------------------------
RESPONSE VARIABLE: Yield
------------------------------
DATA SUMMARY:
DESCRIPTIVE STATISTICS:
TESTING FOR THE SIGNIFICANCE OF ENVIRONMENT EFFECT USING -2 LOGLIKELIHOOD RATIO
TEST:
Genotype Mean
1 GEN1 1709.135
2 GEN10 1722.349
3 GEN11 1726.594
4 GEN12 1731.851
5 GEN13 1728.811
6 GEN14 1737.956
7 GEN15 1723.421
8 GEN2 1723.271
9 GEN3 1714.106
10 GEN4 1711.594
11 GEN5 1698.069
12 GEN6 1713.706
13 GEN7 1710.128
14 GEN8 1725.261
15 GEN9 1702.693
HERITABILITY:
0.05
Sample generated graphs are shown in Figure 18. A boxplot and a histogram are
available for evaluating the distribution of the values of the response variable, and
diagnostic plots are available for the distribution of the residuals. If AMMI analysis is
requested, biplots are generated to aid in assessing the genotype × environment
interaction.
Quantitative Trait Locus (QTL) Analysis
Figure 21. Options Tab of QTL Analysis Dialog Box.
A text file is generated after the analysis. Figure 22 shows the sample partial text
output which contains the following: results of the single-environment analysis; LOD
scores of all the markers; and statistics on the selected/significant markers.
SINGLE-ENVIRONMENT ANALYSIS
------------------------------
RESPONSE VARIABLE: HEIGHT
------------------------------
DESCRIPTIVE STATISTICS:
------------------------------
ANALYSIS FOR: ENV = 1
------------------------------
DATA SUMMARY:
VARIANCE COMPONENTS TABLE:
Estimate
Minimum 14.2064
Average 14.2064
Maximum 14.2064
==============================
QTL ANALYSIS
METHOD: CIM
------------------------------
RESPONSE VARIABLE: HEIGHT
------------------------------
------------------------------
ANALYSIS FOR: ENV = 1
------------------------------
8 1_loc30 1 30 0.126238502
9 M_0032 1 31 0.124952527
10 1_loc40 1 40 0.675666296
11 M_0042 1 41 7.910819471
12 1_loc50 1 50 9.507940316
13 M_0053 1 52 8.808176807
14 M_0056 1 55 7.075115150
15 M_0058 1 57 6.531686512
16 1_loc60 1 60 0.023835142
17 M_0062 1 61 0.010078575
18 M_0063 1 62 0.497989434
19 M_0066 1 65 0.511330577
20 M_0069 1 68 1.022791170
21 1_loc70 1 70 0.849382305
22 M_0076 1 75 0.286639058
23 M_0081 1 80 0.157442187
24 M_0083 1 82 1.915089343
25 M_0085 1 84 2.384399583
.
.
.
284 M_1087 7 150 0.154691409
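To show how the LOD listing above might be screened for significant markers, the sketch below filters a few of the printed rows against a cutoff and reports the peak. Two things are assumptions: the column meaning (marker, chromosome, position, LOD), because the header row did not survive in the listing, and the cutoff of 3.0, which the paper does not state.

```python
# Rows copied from the listing above; the column meaning (marker, chromosome,
# position, LOD) is an assumption, because the header row did not survive.
rows = [
    ("1_loc40", 1, 40, 0.675666296),
    ("M_0042", 1, 41, 7.910819471),
    ("1_loc50", 1, 50, 9.507940316),
    ("M_0053", 1, 52, 8.808176807),
    ("M_0056", 1, 55, 7.075115150),
    ("M_0058", 1, 57, 6.531686512),
    ("1_loc60", 1, 60, 0.023835142),
]

LOD_THRESHOLD = 3.0  # assumed cutoff; not stated in the paper

def significant(rows, threshold):
    """Keep the rows whose LOD score reaches the threshold."""
    return [r for r in rows if r[3] >= threshold]

def peak(rows):
    """Return the row with the highest LOD score."""
    return max(rows, key=lambda r: r[3])

hits = significant(rows, LOD_THRESHOLD)
best = peak(hits)
print([r[0] for r in hits])
print(best[0], best[2])
```

In these rows the scores rise from position 41, peak at position 50, and fall away after position 57, which is the pattern a QTL map displays graphically.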
Sample generated graphs are shown in Figure 23. These include a heatmap of
LOD scores and recombination fractions, a plot of pairwise genotypic differences, a marker
map, a visualization of genotypes, a plot of missing genotypes, and QTL maps.
Figure 23. Sample graph outputs of QTL analysis.
Figure 25 shows the sample output of the analysis. The output includes genetic
and phenotypic correlation matrices, molecular covariance matrix, statistics on the
values of the selection index and breeding values, characteristics of the selected
individuals, and values of the selection index for all individuals.
DESIGN: Lattice
MFL1 FFL1 EHT1 PHT1 GY1 MFL.2 FFL2 EHT2 PHT2 GY2
MFL1 1.00 0.88 0.19 -0.33 -0.62 0.96 0.88 0.23 -0.25 -0.36
FFL1 0.88 1.00 0.18 -0.20 -0.77 0.83 0.68 0.15 -0.28 -0.41
EHT1 0.19 0.18 1.00 0.75 -0.03 0.18 0.04 1.09 0.96 0.13
PHT1 -0.33 -0.20 0.75 1.00 0.32 -0.23 -0.28 0.75 1.09 0.28
GY1 -0.62 -0.77 -0.03 0.32 1.00 -0.59 -0.70 0.09 0.40 0.98
MFL.2 0.96 0.83 0.18 -0.23 -0.59 1.00 0.91 0.31 -0.23 -0.52
FFL2 0.88 0.68 0.04 -0.28 -0.70 0.91 1.00 0.27 -0.23 -0.61
EHT2 0.23 0.15 1.09 0.75 0.09 0.31 0.27 1.00 0.77 -0.08
PHT2 -0.25 -0.28 0.96 1.09 0.40 -0.23 -0.23 0.77 1.00 0.20
GY2 -0.36 -0.41 0.13 0.28 0.98 -0.52 -0.61 -0.08 0.20 1.00
MFL1 FFL1 EHT1 PHT1 GY1 MFL.2 FFL2 EHT2 PHT2 GY2
MFL1 1.00 0.71 0.01 -0.37 -0.47 0.63 0.56 0.10 -0.17 -0.29
FFL1 0.71 1.00 0.04 -0.26 -0.48 0.54 0.51 0.03 -0.20 -0.31
EHT1 0.01 0.04 1.00 0.80 0.07 0.13 0.06 0.71 0.51 0.06
PHT1 -0.37 -0.26 0.80 1.00 0.32 -0.14 -0.15 0.49 0.53 0.17
GY1 -0.47 -0.48 0.07 0.32 1.00 -0.35 -0.39 0.04 0.15 0.44
MFL.2 0.63 0.54 0.13 -0.14 -0.35 1.00 0.72 0.05 -0.31 -0.40
FFL2 0.56 0.51 0.06 -0.15 -0.39 0.72 1.00 0.05 -0.21 -0.45
EHT2 0.10 0.03 0.71 0.49 0.04 0.05 0.05 1.00 0.81 0.15
PHT2 -0.17 -0.20 0.51 0.53 0.15 -0.31 -0.21 0.81 1.00 0.34
GY2 -0.29 -0.31 0.06 0.17 0.44 -0.40 -0.45 0.15 0.34 1.00
MFL1 FFL1 EHT1 PHT1 GY1 MFL.2 FFL2 EHT2 PHT2 GY2
MFL1 1.00 0.33 0.69 0.63 0.11 0.43 -0.23 0.48 0.79 0.37
FFL1 0.33 1.00 0.46 0.37 0.02 0.24 0.22 0.16 0.23 -0.12
EHT1 0.69 0.46 1.00 0.48 0.23 0.51 -0.41 0.58 0.41 0.13
PHT1 0.63 0.37 0.48 1.00 -0.21 0.47 0.25 0.48 0.50 -0.10
GY1 0.11 0.02 0.23 -0.21 1.00 -0.04 -0.34 -0.01 0.27 -0.18
MFL.2 0.43 0.24 0.51 0.47 -0.04 1.00 0.15 0.73 0.31 0.04
FFL2 -0.23 0.22 -0.41 0.25 -0.34 0.15 1.00 -0.14 -0.04 -0.16
EHT2 0.48 0.16 0.58 0.48 -0.01 0.73 -0.14 1.00 0.32 0.01
PHT2 0.79 0.23 0.41 0.50 0.27 0.31 -0.04 0.32 1.00 -0.02
GY2 0.37 -0.12 0.13 -0.10 -0.18 0.04 -0.16 0.01 -0.02 1.00
COVARIANCE BETWEEN SELECTION INDEX AND BREEDING VALUE: 3.205508
VALUES OF THE TRAITS, SELECTION INDEX, MEANS, GAINS FOR THE 5% SELECTED
INDIVIDUALS
VALUES OF THE TRAITS AND THE SELECTION INDEX FOR ALL INDIVIDUALS
MFL1 FFL1 EHT1 PHT1 GY1 MFL.2 FFL2 EHT2 PHT2 GY2
Entry 1 102.21 100.25 71.45 123.75 42.45 99.29 98.95 68.51 117.47 16.87
Entry 2 104.88 104.42 100.22 148.82 28.72 99.60 100.36 106.75 161.75 140.00
Entry 3 98.97 100.44 80.17 154.36 77.78 97.93 96.82 66.75 133.75 116.00
.
.
.
(some rows are deleted)
Mating Designs
For North Carolina II, analysis can be done per environment level or across
environments. To illustrate analysis per environment level, a sample completed dialog
box is shown in Figure 26.
The results of the analysis, shown in Figure 27, include a data summary, an
ANOVA table (assuming a fixed model), estimates of the variance components, and
estimates of the genetic variance components.
-----------------------------
RESPONSE VARIABLE: Y
-----------------------------
-----------------------------
ANALYSIS FOR: Env = A
-----------------------------
DATA SUMMARY:
Fixed Effects:
Estimate Std. Error t value
(Intercept) 55.6076 1.1379 48.8702
Random Effects:
Groups Variance Std. Deviation
Male:Female 2.3418 1.5303
Female 3.6767 1.9175
Male 5.9229 2.4337
Block 0.0219 0.1481
Residual 9.7736 3.1263
ESTIMATES OF GENETIC VARIANCE COMPONENTS:
Estimate
VA 9.599670
VD 2.341800
Narrow sense heritability(plot-mean based) 0.442075
Broad sense heritability(plot-mean based) 0.549917
Dominance Ratio 0.698492
Figure 27. Sample Text Output of North Carolina II Per Environment Analysis
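The genetic variance estimates in Figure 27 can be traced back to the random-effects variances printed above them. The sketch below uses the standard North Carolina II relations between the male, female, and male × female variances and the additive (VA) and dominance (VD) variances. The printed values are reproduced when the inbreeding coefficient of the parents is set to f = 1. That setting is inferred from the numbers, not stated in the paper, and the function name is hypothetical.

```python
import math

# Variance components printed in the Random Effects table above (Env = A).
VAR_MALE = 5.9229
VAR_FEMALE = 3.6767
VAR_MALE_FEMALE = 2.3418
VAR_RESIDUAL = 9.7736

def nc2_genetic_components(v_m, v_f, v_mf, v_e, f=1.0):
    """Genetic variance estimates for North Carolina II.

    f is the inbreeding coefficient of the parents (assumed, not given in the paper).
    """
    va = 2.0 * (v_m + v_f) / (1.0 + f)   # additive variance
    vd = 4.0 * v_mf / (1.0 + f) ** 2     # dominance variance
    vp = va + vd + v_e                   # plot-based phenotypic variance
    return {
        "VA": va,
        "VD": vd,
        "h2_narrow": va / vp,
        "H2_broad": (va + vd) / vp,
        "dominance_ratio": math.sqrt(2.0 * vd / va),
    }

est = nc2_genetic_components(VAR_MALE, VAR_FEMALE, VAR_MALE_FEMALE, VAR_RESIDUAL)
for name, value in est.items():
    print(f"{name}: {value:.4f}")
```

Because the inputs are the rounded values from the table, the results agree with the figure only to about four decimal places.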
-----------------------------
RESPONSE VARIABLE: Y
-----------------------------
DATA SUMMARY:
ANOVA TABLE:
LINEAR MIXED MODEL FIT BY RESTRICTED MAXIMUM LIKELIHOOD:
Fixed Effects:
Estimate Std. Error t value
(Intercept) 55.7587 0.7634 73.042
Random Effects:
Groups Variance Std. Deviation
Env:Male:Female 3.1506 1.7750
Male:Female 0.8681 0.9317
Env:Female 0.3985 0.6313
Env:Male 1.8441 1.3580
Female 1.7219 1.3122
Male 1.0554 1.0273
Env:Block 0.1792 0.4233
Env 0.0000 0.0000
Residual 10.5157 3.2428
Estimate
VA 2.777330
VAxE 2.242660
VD 0.868100
VDxE 3.150620
h2-narrow sense 0.142031
H2-broad sense 0.186425
Dominance Ratio 0.790653
Figure 29. Dialog Box for Diallel Analysis (Griffing Method 2).
Partial results are shown in Figure 30. They consist of a data summary, a test for the
significance of the crosses, a test for the significance of GCA and SCA effects, estimates of
the GCA, SCA, and residual variances, estimates of genetic variance components, and
estimates of the GCA and SCA effects.
DATA FILE: E:/NSALES/pbtools workspace/SampleProject/Data/Diallel_M2.csv
-----------------------------
RESPONSE VARIABLE: Plant_height
-----------------------------
-----------------------------
ANALYSIS FOR: Env = Normal
-----------------------------
DATA SUMMARY:
MATRIX OF MEANS:
1 2 3 4 5 6 7
1 142.9000 148.3333 163.9000 152.9000 142.3667 160.6667 191.3333
2 129.6667 142.9000 143.9000 131.5667 143.5333 186.6667
3 131.5667 163.5667 136.9000 166.4333 189.7667
4 159.3333 149.8000 164.8000 200.2333
5 122.4333 138.9000 175.7667
6 146.1333 195.5667
7 157.5333
ANALYSIS OF VARIANCE:
ESTIMATES OF GENETIC VARIANCE COMPONENTS:
Estimate
VA 481.369471
VD 798.389175
h2-narrow sense 0.371865
H2-broad sense 0.988632
Dominance Ratio 1.821307
1 2 3 4 5 6 7
1 -0.6608 3.1454 10.8935 -7.5806 1.1861 3.7083 13.0157
2 -10.5571 -0.2102 -6.6843 0.2824 -3.5287 18.2454
3 -2.7386 5.1639 -2.2028 11.5528 13.5269
4 4.7354 3.2231 2.4454 16.5194
5 -14.5646 -4.1546 11.3528
6 1.2132 15.3750
7 22.5725
DATA FILE: E:/NSALES/pbtools workspace/SampleProject/Data/Diallel_M2.csv
-----------------------------
RESPONSE VARIABLE: Plant_height
-----------------------------
DATA SUMMARY:
ANOVA TABLE:
ANOVA TABLE:
MATRIX OF MEANS:
1 2 3 4 5 6 7
1 131.0000 133.8333 153.5500 140.7333 128.9000 154.4333 182.2833
2 116.8333 130.5500 133.8500 122.7167 129.0500 175.9000
3 122.0000 154.5000 131.2167 153.2167 177.7833
4 137.3333 138.5667 155.7833 195.2333
5 111.8833 132.5667 166.3333
6 143.0167 188.6167
7 155.1000
-------
REMARK: Raw dataset is balanced.
GENERAL COMBINING ABILITY EFFECTS, SPECIFIC COMBINING ABILITY EFFECTS (above
diagonal)
1 2 3 4 5 6 7
1 1.4884 11.7329 -6.4745 -1.7227 6.5181 12.5181
2 -0.5819 -2.6727 2.7792 -8.1801 16.8199
3 8.5051 1.8069 6.5144 9.2310
4 3.7662 3.6903 21.2903
5 -2.9412 8.9755
6 13.9662
7
GCA -1.6418 -12.3270 -2.8548 2.5360 -14.0492 3.2434 25.0934
Estimate
GCA 143.1889
SCA 191.9809
GCAxE 3.9448
SCAxE 9.5364
Error 42.0727
Estimate
VA 572.755727
VAxE 15.779056
VD 767.923608
VDxE 38.145415
h2-narrow sense 0.398667
H2-broad sense 0.933181
Dominance Ratio 1.637530
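The across-environment genetic variance estimates above follow from the GCA, SCA, GCA × E, SCA × E, and error variance estimates printed just before them. The sketch below applies the usual diallel relations between combining-ability variances and additive and dominance variances. The printed values are reproduced with an inbreeding coefficient of f = 0, which is inferred from the numbers and not stated in the paper. The function name is hypothetical.

```python
import math

# Variance estimates printed above for the across-environment diallel analysis.
GCA, SCA, GCA_E, SCA_E, ERROR = 143.1889, 191.9809, 3.9448, 9.5364, 42.0727

def diallel_genetic_components(gca, sca, gca_e, sca_e, error, f=0.0):
    """Genetic variance estimates from combining-ability variances.

    f is the inbreeding coefficient of the parents (assumed, not given in the paper).
    """
    va = 4.0 * gca / (1.0 + f)            # additive variance
    vd = 4.0 * sca / (1.0 + f) ** 2       # dominance variance
    va_e = 4.0 * gca_e / (1.0 + f)        # additive x environment variance
    vd_e = 4.0 * sca_e / (1.0 + f) ** 2   # dominance x environment variance
    vp = va + va_e + vd + vd_e + error    # phenotypic variance
    return {
        "VA": va,
        "VAxE": va_e,
        "VD": vd,
        "VDxE": vd_e,
        "h2_narrow": va / vp,
        "H2_broad": (va + vd) / vp,
        "dominance_ratio": math.sqrt(2.0 * vd / va),
    }

diallel = diallel_genetic_components(GCA, SCA, GCA_E, SCA_E, ERROR)
for name, value in diallel.items():
    print(f"{name}: {value:.4f}")
```

The dominance ratio is computed here as the square root of 2VD/VA, which matches the printed 1.637530.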
Future Direction