User Guide
Version 4.5
+1 510-741-1000
+1 510-741-5811
lsg.techserv.us@bio-rad.com
www.bio-rad.com
Limitations on Use
InfoQuestFP software and this accompanying guide are
subject to the terms and conditions outlined in the license
agreement. The support, entitlement to upgrades, and the
right to use the software automatically terminate if the
user fails to comply with any of the statements of the
license agreement.
No part of this guide may be reproduced by any means
without prior written permission of Bio-Rad Laboratories,
Inc.
Acrobat is a trademark of Adobe, Inc. Access, Excel, FoxPro, Microsoft SQL Server, Windows, Windows 2000, Windows
NT, and Windows XP are trademarks of Microsoft, Inc. dBase is a registered trademark of Borland International, Inc.
GenBank is a trademark of the United States Department of Health and Human Services. GeneScan is a trademark of
Applied BioSystems, Inc. MegaBace is a trademark of GE Healthcare Biosciences. Netkey is a trademark of Netkey, Inc.
Oracle, Oracle SQL Server, and Oracle 9i are trademarks of Oracle Corporation. SWISS-PROT is a trademark of Institut
Suisse de Bioinformatique (SIB).
All other product names or trademarks are the property of their respective owners.
InfoQuestFP includes a library for XML input and output from Apache Software Foundation (http://www.apache.org).
The BLAST sequence search tool is based on the NCBI toolkit version 2.2.10 (http://www.ncbi.nlm.nih.gov/BLAST/).
© 2006 Bio-Rad Laboratories, Inc. All rights reserved.
Table of Contents
Support by Bio-Rad Laboratories, Inc.
Limitations on Use
6. Database Functions
7. Setting Up Experiments
19. Multiple Alignment and Cluster Analysis of Sequences
19.1 Calculating a Cluster Analysis Based on Pairwise Alignment (Steps 1 and 2)
19.2 Calculating a Multiple Alignment (Steps 3 and 4)
19.3 Multiple Alignment Display Options
19.4 Editing a Multiple Alignment
19.5 Drag-and-Drop Manual Alignment
19.6 Inserting and Deleting Gaps
19.7 Removing Common Gaps in a Multiple Alignment
19.8 Changing Sequences in a Multiple Alignment
19.9 Finding a Subsequence
19.10 Calculating a Clustering Based on the Multiple Alignment (Steps 5 and 6)
19.11 Adding Entries to and Deleting Entries From an Existing Global Alignment
19.12 Automatically Realigning Selected Sequences
19.13 Sequence Display and Analysis Settings
19.14 Exporting a Multiple Alignment
19.15 Converting Sequences Data to Categorical Sets
19.16 Excluding Regions From the Sequence Comparisons
19.17 Writing Comments in the Alignment
26.5 Bar Graph
Index
About InfoQuestFP
(Figure labels: Electrophoresis types, Character types, and Sequence types; Linkage of experiments to database entry; Generation of unlimited relational databases; Generation of libraries for identification; example organisms X, Y, Z, R, and S.)
Fig. 1-1. Flow chart of main steps in the acquisition and analysis of data in InfoQuestFP software.
1.4 Windows NT (Windows 2000, Windows XP) Operating System
InfoQuestFP users are associated with Windows NT login users. Each Windows NT user can specify his/her own InfoQuestFP directory for databases, and InfoQuestFP saves this information in the user's system registry. For example, suppose that user X logs in on a Windows NT machine with InfoQuestFP installed. This user can create a directory and specify it as the Home directory in the Startup program. InfoQuestFP saves this information in the user's system registry, so that each time the user logs in, InfoQuestFP automatically uses the same directory as the home directory. In this way, each Windows NT user can define his/her own InfoQuestFP home directory without interfering with other users. Within this home directory, the user can create as many databases as desired.
InfoQuestFP allows two types of databases to be created: a local file-based database using a dedicated database mechanism, and a database management system (DBMS)-based type, which relies on an external open database connectivity (ODBC)-compatible relational database management engine. For the first type, protection of InfoQuestFP databases depends on how the Windows NT user protects the specified directory. If a user protects the directory containing InfoQuestFP databases, other users will not be able to change or read the databases, depending on whether the directory is write- or read-protected. For the second type, protection also relies on the protection and security measures provided by the DBMS.
Fig. 1-2 is an example of the database structure of one Windows NT user. The figure illustrates that one user can have many databases, and within each database can define various experiment types. If you intend to compare entries across different databases, the experiment types shared by those databases should have the same name and definition.
Within the specified home directory, the InfoQuestFP
Startup program automatically creates the necessary
subdirectories for each database. The directory structure
shown in Fig. 1-3 corresponds to the database structure of
user X discussed above. The files shown are the
configuration files for the experiment types defined
under this user. If you want to create exactly the same
experiment types under another database or another user,
you can copy these .CNF files to the corresponding
directories.
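Copying .CNF files can also be scripted. The following Python is a minimal sketch that copies every .CNF file from one directory to another. It assumes the files sit directly in the source directory; the actual layout is shown in Fig. 1-3. The function name is hypothetical.

```python
import shutil
from pathlib import Path

def copy_experiment_configs(src_dir, dst_dir):
    """Copy all .CNF experiment-type configuration files from src_dir to dst_dir.

    Returns the sorted list of copied file names. The flat directory layout
    is an assumption for illustration; adapt the paths to your home directory.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for cnf in src.iterdir():
        # Match the extension case-insensitively (.CNF or .cnf)
        if cnf.is_file() and cnf.suffix.upper() == ".CNF":
            shutil.copy2(cnf, dst / cnf.name)  # copy2 also preserves timestamps
            copied.append(cnf.name)
    return sorted(copied)
```

A call could be `copy_experiment_configs(r"C:\InfoQuestFP\DatabaseX1", r"C:\InfoQuestFP\DatabaseX2")`; both paths are hypothetical.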
1.5 Multidatabase Setup
Windows NT USER X
  Database X1
    Fingerprint type X1A, Fingerprint type X1B
    Character type X1A, Character type X1B
    Sequence type I1A, Sequence type I1B
  Database X2
    Fingerprint type X2A
    Character type X2A, Character type X2B
  Database X3
    Fingerprint type X3A, Fingerprint type X3B
    Character type X3A
    Matrix type X3A
Fig. 1-2. Structure of databases and experiment types within one user.
The line after the tag [DIR] indicates the full path where
the database is located. The line after [BACKCOL]
contains the RGB values for the window background
color, and the line after [SAVELOGFILE] indicates
whether log files are saved or not.
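The following Python is a minimal sketch of reading these settings. It parses tag/value pairs from a *Database*.DBS file, assuming the file is plain text with each tag on its own line and its value on the next line. The RGB format (three integers separated by spaces or commas) is an assumption, as are the function names.

```python
def parse_dbs(text):
    """Parse *Database*.DBS-style text into a {tag: value} dict.

    Assumes each tag such as [DIR] is on its own line and its value is on the
    following line. Blank lines are ignored.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    settings = {}
    i = 0
    while i < len(lines):
        line = lines[i]
        if line.startswith("[") and line.endswith("]") and i + 1 < len(lines):
            settings[line[1:-1]] = lines[i + 1]  # value is on the next line
            i += 2
        else:
            i += 1  # skip lines that are not part of a tag/value pair
    return settings

def background_rgb(settings):
    """Return the [BACKCOL] value as an (R, G, B) tuple (format assumed)."""
    parts = settings["BACKCOL"].replace(",", " ").split()
    return tuple(int(p) for p in parts[:3])
```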
attempts are made to open or edit such a removed
database, InfoQuestFP will produce an error. The only
remedy is to delete the *Database*.DBS file.
Conventions
2.3
Floating Menus
2.2
Fig. 2-3. Floating menu that opens after right-clicking on a database entry.
The floating menus make InfoQuestFP easier and more intuitive for beginners, and much faster for experienced users. When describing menu commands, this guide does not usually mention the corresponding floating menu command. Try right-clicking in all window panels to find out which option is more convenient in each case: calling the command from the window's menu or toolbar button, or from the context-specific floating menu.
3.2 Example Database
Fingerprint Types
RFLP: Two different restriction fragment length
polymorphism (RFLP) techniques, called RFLP1 and
RFLP2, resulting in two patterns for each bacterial strain.
Character Types
FAME: Fatty acid methyl ester (FAME) profiles obtained on a Hewlett Packard 5890A gas-liquid chromatography instrument. This is a typical example of an open data set: the number of fatty acids found depends on the group of entries analyzed. If more entries are added, more fatty acids will probably be found. FAME profiles are also an example of a continuous character type: the percentage occurrence of a fatty acid in a bacterium can have any real value between 0 and 100%.
PhenoTest: This is a fictitious phenotypic test that reveals the metabolic or enzyme activities of bacteria on 19 different compounds. The first cup of the test is a blank control. This is an example of a closed data set: the 20 characters are well defined, and regardless of the number of entries examined, the number of characters in the experiment always remains 20. Real examples of such assays exist as commercial test panels available on microplates or galleries. They can be interpreted in two ways. The reactions can be read by eye and scored as positive or negative; in this case the character type is binary.
character type is binary. If the microplates are read
automatically using a microplate reader, the reactions in
Sequence Types
16S rDNA: For all of the strains, and a number of
additional strains, the nearly complete 16S ribosomal
RNA gene has been sequenced. The sequences are
approximately 1,500 bases long, but not all of them have been sequenced completely.
Matrix Types
A partial homology matrix based upon hybridization of
total genomic DNA has been generated for the genera.
Introduction
TCP/IP
4.2 Setup
Initial Settings
On each computer, including the server computer,
InfoQuestFP has created a settings file NETKEY.INI,
which needs to be completed for the network. Run the
InfoQuestFP Startup program on the server, and click the
small network settings button
Configuring a Client
On each client computer, configure the file NETKEY.INI
in the same way as described above.
Defining a Client
On the server computer, add the client computer to the
list of InfoQuestFP clients as follows: Click <Add>. Enter
the DNS host name of the client computer in the dialog
box. In non-DHCP configurations (i.e., for permanent IP
addresses), also enter the IP address. Press <OK>. The
client is now shown in the upper panel, with its name
only (DHCP) or with its name and IP address
(permanent). From this point on, the client has access to
InfoQuestFP network software.
Note: If you do not want to restrict licenses to specific computers, you can enter an asterisk (*) in place of the computer name, without specifying an IP address. Every computer in the LAN will then be able to obtain an InfoQuestFP license.
Running InfoQuestFP
On the client computer, start InfoQuestFP and press the
<Analyze> button. The program should load if the
network is configured correctly and if the server name,
the IP addresses, and domain host names are filled in
correctly.
On the server computer, the client that uses InfoQuestFP
is now listed in the lower panel, showing its IP address,
DNS host name, total usage time (elapsed), and idle time
(Fig. 4-2).
More client computers can be added to the network by adding their computer name (and, for permanent addresses, their IP address) as described under Defining a Client.
4.3 License Granting
Each computer in the network can be granted or refused access to the application software by the server program. To refuse access to a particular computer, select it in the upper panel and click <Change access>. The blue screen icon changes to a red screen icon. To grant access again, click <Change access> a second time. To permanently remove a computer from the user list, select the computer in the upper panel and click <Delete>.
Disconnect Users
The server can disconnect a client if needed. Select a user
in the lower panel, and disconnect it (withdraw its
license) with <Disconnect>.
Time-Out
The idle time for each user is recorded by the NetKey server program. A time-out for inactive licenses can be specified: if there is a waiting list, a client whose idle time exceeds the time-out value loses its license to the first client in the waiting list. To specify a maximum idle time, click <Settings> and enter the number of minutes of allowed idle time.
Note: A user who has exceeded the idle time limit
will not be disconnected by the server as long as
there is no waiting list.
Messaging
The license server can send messages to any or all connected clients, for example if the server computer will be shut down or if a client will be disconnected. To send a message to one user, select the user in the lower panel and click <Send message>. Enter a message string and press <OK>. The user receives the message in a dialog box. To send a message to all users, click <Send message to all>, enter a message string, and press <OK>. All active users receive the message in a dialog box.
Usage Statistics
The NetKey server program records every usage of each client. Graphical statistics can be displayed about the usage history over longer periods, and the relative usage of each client computer can be shown for any time interval. To view the usage history of the InfoQuestFP network version, click <Statistics>. The panel shows a detailed view of the number of computers that have used the software on a time scale divided into hours. You can scroll in this panel to view earlier usage. The license
limit is shown as a red line; computers in a waiting list are
4.4 Waiting Lists
If the maximum number of licenses has been exceeded, the server program manages a waiting list. The client receives a message with its position in the waiting queue, and the InfoQuestFP software opens as soon as a license becomes available for that client.
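The interaction between the waiting list and the idle time-out can be modeled with a toy Python class. This is not the NetKey implementation; the class, the method names and the 1-based queue position are assumptions for illustration.

```python
from collections import deque

class LicensePool:
    """Toy model of a license server with a waiting list and an idle time-out."""

    def __init__(self, max_licenses, timeout_minutes):
        self.max_licenses = max_licenses
        self.timeout = timeout_minutes
        self.active = {}        # client name -> idle minutes
        self.waiting = deque()  # clients waiting for a license, in arrival order

    def request(self, client):
        """Return 0 if a license is granted, else the 1-based queue position."""
        if client in self.active:
            return 0
        if len(self.active) < self.max_licenses:
            self.active[client] = 0
            return 0
        if client not in self.waiting:
            self.waiting.append(client)
        return list(self.waiting).index(client) + 1

    def set_idle(self, client, minutes):
        self.active[client] = minutes

    def enforce_timeout(self):
        """Hand licenses of clients idle beyond the time-out to waiting clients.

        Only happens while someone is waiting; returns the clients that lost
        their license.
        """
        revoked = []
        for client, idle in list(self.active.items()):
            if not self.waiting:
                break  # no waiting list: idle clients keep their license
            if idle > self.timeout:
                del self.active[client]
                revoked.append(client)
                self.active[self.waiting.popleft()] = 0  # first in line gets it
        return revoked
```

In this model, an idle client is never disconnected while the waiting list is empty. This matches the note under Time-Out.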
The user can request an overview of the computers
currently using an InfoQuestFP license by clicking the
network settings button
in the InfoQuestFP
5.2 Creating a Database
5.3 Database Settings
5.5 Log Files
5.5.3 In the data file windows, the experiment file windows, the Main window, or the Library window, select File > View log file or click the corresponding toolbar button.
6. Database Functions
6.1
in the
toolbar.
A dialog box appears, asking for the number of new
entries to create, and the database where they should be
created. When connected databases are associated with
the database (see chapter 31), you can add the new entries
either in the local database or in the connected database.
6.2
(Figure: main program window, with labels for the menu, toolbar, database field names, database panel, status bar, experiment presence panel, experiment types panel, experiment files panel, comparisons panel, and libraries panel.)
6.3
6.4
button to the
6.4.8 Press the Enter key or <OK> to close the Entry edit
window and store the information, or press the Escape
key or <Cancel> to close the window without changing
any information.
To quickly enter the same information for many entries, use the keyboard: the Up Arrow and Down Arrow keys to move through the entries in the database, the Enter key to edit an entry, the F7 and F8 keys to copy and paste information, and the Enter key again to close the Entry edit window.
6.5
Fig. 6-2. The Entry edit window.
6.4.2 Enter information in each of the fields.
6.4.3 If a number of entries have mostly the same fields,
you can copy the complete entry information to the
clipboard using the F7 key or
, and
delete an attachment. The same commands are
available from the menu as Attachment > Add new,
Attachment > Open, and Attachment > Delete,
respectively.
6.5.2 Press
6.6.3 Click on one of the database info fields in the
database panel header (see Fig. 6-1).
6.6.4 Select Edit > Set database field length.
6.6.5 Enter a number between 0 and 80, and press <OK>.
6.6
7. Setting Up Experiments
In InfoQuestFP, experiments are divided into the following classes: fingerprint types, character types, sequence types, trend data types, and matrix types.
Fingerprint types: Any densitometric record seen as a
profile of peaks or bands. Examples include
electrophoresis patterns, gas chromatography or
HPLC profiles, spectrophotometric curves, etc. For
example, within the fingerprint types, you can create a
pulsed field gel electrophoresis (PFGE) experiment
type with specific settings such as reference marker,
molecular weight (MW) regression, stain, band
matching tolerance, similarity coefficient, clustering
method, etc. Fingerprint types can also be derived from TIFF or bitmap files, which are two-dimensional bitmaps, provided that the patterns can be translated into densitometric curves. This functionality is part of a standard module that comes with the InfoQuestFP software.
Character types: Used to define any array of named
characters, binary or continuous, with fixed or
undefined length. The main difference between
character types and electrophoresis types is that in the
character types, each character has a well-determined
name, whereas in the electrophoresis types, the bands,
peaks or densitometric values are unnamed (a
molecular size is NOT a well-determined name).
Examples of character types include antibiotic resistance profiles, fatty acid profiles (if the fatty acids are known), and metabolic assimilation or enzyme activity test panels such as API, Biolog, and Vitek. Single
characters such as Gram stain, length, etc. also fall
within this category. This functionality comes with the
optional character types module for InfoQuestFP.
Sequence types: Used to enter sequences of nucleic acids (DNA and RNA) and amino acids. InfoQuestFP
8.1.14 The image can be reset to its original state with Image > Load from original or the corresponding toolbar button.
The image can also be rotated 90° left, 90° right, or 180° from the Image menu.
8.1.12 The editor also allows you to crop the image to a selected area, for which the following functions are available:
Crop > Add new crop or the corresponding toolbar button
8.2 Processing Gels
Fig. 8-2. The Fingerprint data editor window. Step 1: Defining pattern strips.
Horizontal mirror of TIFF image. Using the same
command twice restores the original TIFF file.
The whole process of lane finding, normalization, band finding, and band quantification is contained in a wizard, which allows you to move back and forth through the process and easily make changes at any step.
8.3
and
respectively.
8.3.2 When a large image is loaded, a Navigator window can be opened to focus on a region of the image. To open the Navigator, double-click the image, or right-click and select Navigator from the floating menu.
8.3.3 You can change the brightness and contrast of the image with Edit > Change brightness & contrast or the corresponding toolbar button. This opens the Image brightness & contrast dialog box (Fig. 8-3).
8.3.4 In the Image brightness & contrast dialog box, click
Dynamical preview to have the image directly updated
with changes you make.
8.3.5 Use the Minimum value slide bar to reduce
background if the background of the whole image is too
high.
The Rainbow palette checkbox can be used to reveal even
more visual information in areas of poor contrast (weak
and oversaturated areas) by using a palette composed of
multiple color transitions.
8.3.7 If you press <OK>, the changes made to the image appearance are saved along with the fingerprint type.
Note: The brightness and contrast settings are saved
along with the fingerprint type, but are not specific
for a particular gel. The Tone curve editor is a more
powerful image enhancement tool for which the
settings are saved for each particular gel.
8.3.8 With File > Show 3D view or the corresponding toolbar button, a three-
8.3.9 In the 3D view window, you can use the Left, Right, Up, and Down Arrow keys to rotate the image in all directions. The image can also be rotated horizontally and vertically by dragging it left/right or up/down with the mouse.
8.3.10 You can change the zoom factor using View >
Zoom in (PgDn) or View > Zoom out (PgUp).
to let the
will always be shown with background subtracted and
with spots removed. In addition, when two-dimensional
quantification is done, the gelstrips with background
subtracted and spots removed are used. Hence, we recommend NOT using these options unless (1) the image has a strong irregular background, caused, for example, by nonhomogeneous illumination of the gel, so that the gelstrips would not look appropriate for presentation or publication; or (2) the gel contains numerous spots that would influence the densitometric curves extracted from the gelstrips (spots on the image appear as peaks on a densitometric curve, and hence have a strong impact on correlation coefficients, band searching, etc.).
Background subtraction is based on the rolling ball
principle, where the size of the ball in pixels of the image
can be entered. The larger the size of the ball, the less
background will be subtracted.
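The guide does not spell out how InfoQuestFP implements the rolling ball. The following Python is therefore only a minimal one-dimensional sketch of the general technique, a grayscale morphological opening with a semicircular structuring element, applied to a densitometric curve. The function names are hypothetical. The sketch reproduces the behavior described above: a larger radius subtracts less background.

```python
import math

def rolling_ball_background(curve, radius):
    """Estimate the background of a 1-D densitometric curve with a rolling ball.

    The ball (radius in pixels) is rolled underneath the curve; the background
    is the highest point the ball surface reaches over each pixel.
    """
    n = len(curve)
    offsets = range(-radius, radius + 1)
    cap = {k: math.sqrt(radius * radius - k * k) for k in offsets}  # ball profile
    # Erosion: highest offset at which the ball cap centred on pixel i
    # still fits completely under the curve
    eroded = [min(curve[i + k] - cap[k] for k in offsets if 0 <= i + k < n)
              for i in range(n)]
    # Dilation: background = highest point of any fitted ball cap covering pixel i
    return [max(eroded[i - k] + cap[k] for k in offsets if 0 <= i - k < n)
            for i in range(n)]

def subtract_background(curve, radius):
    """Return the curve minus its rolling-ball background (never below zero)."""
    background = rolling_ball_background(curve, radius)
    return [y - b for y, b in zip(curve, background)]
```

Consider a single sharp peak on a flat baseline. A radius of 3 leaves more of the peak standing than a radius of 1, because the larger ball cannot follow the curve as closely.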
Spot removal is a mechanism similar to the rolling ball,
except that an ellipse is used to separate bands from
spots. The size of the ellipse can be entered in pixels.
Unlike the background subtraction, the size of the ellipse
should be kept as small as possible to avoid erasing
bands.
8.3.26 Add lanes with Lanes > Add new lane, the Enter key, or the corresponding toolbar button, if necessary.
and
), respectively.
bit TIFF file, the tone curve settings are applied to the full 16-bit grayscale range (65,536 levels), which allows much more detail to be brought out in dark areas. The advantages are:
Weak bands are enhanced much better, resulting in a smoother and more reliable picture.
The tone curve acts at a level below the brightness and
contrast settings and can be saved along with a
particular gel. In all further imaging tools of the
program, the tone curve for the particular gel is
applied. Brightness and contrast settings are not
specific to a particular gel.
You can fine-tune the tone curve to obtain optimal
results. This will be explained below.
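The following Python sketch illustrates why applying the tone curve to the full 16-bit range helps weak bands. It maps 16-bit gray values to 8-bit display values through a piecewise-linear tone curve. The control points and function names are hypothetical, and the guide does not state how InfoQuestFP interpolates between tone curve points.

```python
from bisect import bisect_right

def make_tone_curve(points):
    """Build a tone-curve function from (input, output) control points.

    Inputs are 16-bit gray values (0-65535); outputs are 8-bit display values
    (0-255). Values between control points are linearly interpolated (assumed).
    """
    pts = sorted(points)
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]

    def curve(value):
        if value <= xs[0]:
            return ys[0]
        if value >= xs[-1]:
            return ys[-1]
        j = bisect_right(xs, value)  # xs[j-1] <= value < xs[j]
        x0, x1, y0, y1 = xs[j - 1], xs[j], ys[j - 1], ys[j]
        return round(y0 + (value - x0) * (y1 - y0) / (x1 - x0))

    return curve

# Spend half of the display range on the darkest ~10% of the 16-bit levels
tone = make_tone_curve([(0, 0), (6553, 128), (65535, 255)])
linear = make_tone_curve([(0, 0), (65535, 255)])
```

With these hypothetical settings, a weak signal at gray value 3276 is displayed at level 64 instead of 13 on a plain linear scale.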
8.3.30 First open the Image brightness & contrast dialog box with Edit > Change brightness & contrast or the corresponding toolbar button, and
densitometric curves.
8.4
In this step, the window is divided into two panels (Fig. 8-9): the left panel shows the strips extracted from the image file, and the right panel shows the densitometric curve of the selected pattern, extracted from the image file.
When setting up a new database, the normalization
process of the first gel involves the following steps. The
underlined steps are the ones that will be followed for all
subsequent gels.
Marking the reference patterns (reference patterns are
identical samples loaded at different positions on the
gel for normalization purposes)
Showing the gel in normalized view
Identifying a suitable reference pattern on which we
will define bands as reference positions. Reference
positions are bands that will be used to align the
corresponding bands on all reference patterns from the
same and from other gels
Defining the reference positions
of the patterns.
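Conceptually, normalization maps each lane onto the standard pattern by matching reference positions. The following Python is a minimal sketch that uses piecewise-linear interpolation between matched reference positions. This is an assumption for illustration, not necessarily the alignment method InfoQuestFP uses. The function name and the positions in the test are made up.

```python
def normalize_position(pos, lane_refs, standard_refs):
    """Map a raw band position in a lane onto the normalized scale.

    lane_refs: positions of the reference bands as found in this lane.
    standard_refs: positions of the same bands on the standard pattern.
    Both lists must be in the same order and strictly increasing. Positions
    between two references are linearly interpolated; positions outside the
    references are extrapolated from the nearest segment.
    """
    if len(lane_refs) != len(standard_refs) or len(lane_refs) < 2:
        raise ValueError("need at least two matching reference positions")
    # Find the segment that contains pos (or the outermost one)
    k = 0
    while k < len(lane_refs) - 2 and pos > lane_refs[k + 1]:
        k += 1
    x0, x1 = lane_refs[k], lane_refs[k + 1]
    y0, y1 = standard_refs[k], standard_refs[k + 1]
    return y0 + (pos - x0) * (y1 - y0) / (x1 - x0)
```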
8.5 Normalizing a Gel
Repeat this action for all other reference lanes (lanes 9 and 18 in the example).
8.5.2 Select Normalization > Show normalized view or the corresponding toolbar button.
8.5.3 Choose the most suitable reference pattern to serve
as standard: lane 9.
8.5.4 Select a suitable band on the chosen standard pattern and select References > Add external reference position.
You are prompted to enter a name for the band. You can
enter any name, or if possible, the molecular weight of the
band. In the latter case, the program will be able to
determine the molecular weight regression from the sizes
entered at this stage.
8.5.5 Use the following scheme to enter all reference
positions on the example gel (Fig. 8-15).
Within a fingerprint type, the set of reference positions as
defined, and their names, together form a reference system.
Once a gel is normalized using the defined reference
positions and has been saved, the reference system is
saved as well. As soon as you change anything in the
reference system, including a position or a name, a new
button again.
8.6
(Figure: band search parameters, with "Gray zone" = 5% and Min. profiling = 5% indicated.)
max. value of lane. Specify a Shoulder sensitivity only if
you want to allow the program to find band doublets and
bands on shoulders (sensitivity of 5 should be fine for
most gels).
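The following Python is a rough sketch of such a filter. It assumes that the minimum profiling is the height a peak must rise above its surroundings, expressed as a percentage of the lane's maximum value. Shoulder detection and the gray zone are not modeled, and the function names are hypothetical.

```python
def _walk(curve, i, step):
    """Walk from peak i in direction step until a higher value or the edge."""
    j = i
    while 0 <= j + step < len(curve) and curve[j + step] <= curve[i]:
        j += step
    return j

def find_bands(curve, min_profiling_pct):
    """Return indices of peaks whose prominence is at least min_profiling_pct
    percent of the lane maximum (a simplified, hypothetical criterion)."""
    threshold = max(curve) * min_profiling_pct / 100.0
    bands = []
    for i in range(1, len(curve) - 1):
        if not (curve[i] > curve[i - 1] and curve[i] >= curve[i + 1]):
            continue  # not a local maximum
        # Lowest point between this peak and the next higher point (or edge)
        left = min(curve[j] for j in range(_walk(curve, i, -1), i + 1))
        right = min(curve[j] for j in range(i, _walk(curve, i, 1) + 1))
        if curve[i] - max(left, right) >= threshold:
            bands.append(i)
    return bands
```

Raising the percentage discards small peaks. For example, in the curve `[0, 10, 2, 3, 2, 50, 0]`, the small peak at index 3 is kept at 1% but rejected at 5%.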
8.6.3 Press <OK> to accept the settings.
8.6.4 Select Bands > Auto search bands or the corresponding toolbar button to find
8.7
This opens the Entry search dialog box (see chapter 14.3
for detailed explanation on search and select functions).
8.7.3 In the Entry search dialog box, check RFLP1 and press <Search>. All entries that have an RFLP1 pattern associated are now selected in the database, which is
visible as a blue arrow to the left of the entry fields (see
chapter 14.3).
8.7.4 Under Fingerprint types (Experiment type panel),
double-click on <RFLP1> to open the Fingerprint type
window.
8.8 Quantification of Bands
The file lists all the bands defined for each pattern with their normalized relative positions, the metrics (e.g., molecular weight), the height, and the relative one-dimensional surface, as calculated by Gaussian fit.
repeatedly.
Fig. 8-21. The Peak intensity profile window with peak intensity regression curve.
If you have added a band later, you can search the surface
of that band alone with Quantification > Search surface
of band.
When the contours are found, the program shows for
each selected band its volume in the status bar: the sum of
the densitometric values within the contour.
8.8.8 To change the contour of a band manually, first
select the band and zoom in closely (8.8.1 and 8.8.3).
8.8.9 Hold down the Ctrl key and drag with the left mouse button to correct the upper and lower contours.
8.8.10 For known reference bands, you can enter a concentration value by selecting the band and choosing Quantification > Assign value (or the floating menu, by right-clicking, or by double-clicking). Known reference bands are marked with an icon.
8.8.11 Once multiple reference bands are assigned their
concentrations, a regression to determine each unknown
band concentration is calculated by selecting
Quantification > Calculate concentrations.
The Band quantification window (Fig. 8-23) shows the real
concentration as a function of the band volumes, using
cubic spline regression functions.
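The guide states that concentrations are estimated with cubic spline regression but does not give the exact formulation. The following Python is a minimal sketch that assumes an interpolating natural cubic spline through the (band volume, concentration) pairs of the reference bands. The function name and the data in the test are hypothetical.

```python
def natural_cubic_spline(xs, ys):
    """Return a function that evaluates the natural cubic spline through (xs, ys).

    xs must be strictly increasing. "Natural" means the second derivative is
    zero at both ends (an assumption; the boundary conditions InfoQuestFP
    uses are not stated).
    """
    n = len(xs)
    if n < 2 or any(b <= a for a, b in zip(xs, xs[1:])):
        raise ValueError("need at least two strictly increasing x values")
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    m = [0.0] * n  # second derivatives at the knots
    if n > 2:
        # Tridiagonal system for m[1..n-2], solved with the Thomas algorithm
        a = [h[i - 1] for i in range(1, n - 1)]
        b = [2 * (h[i - 1] + h[i]) for i in range(1, n - 1)]
        c = [h[i] for i in range(1, n - 1)]
        d = [6 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
             for i in range(1, n - 1)]
        for i in range(1, n - 2):  # forward elimination
            w = a[i] / b[i - 1]
            b[i] -= w * c[i - 1]
            d[i] -= w * d[i - 1]
        sol = [0.0] * (n - 2)
        sol[-1] = d[-1] / b[-1]
        for i in range(n - 4, -1, -1):  # back substitution
            sol[i] = (d[i] - c[i] * sol[i + 1]) / b[i]
        m[1:n - 1] = sol

    def spline(x):
        # Locate the interval [xs[i], xs[i+1]] containing x (end intervals extrapolate)
        i = 0
        while i < n - 2 and x > xs[i + 1]:
            i += 1
        t0, t1 = xs[i + 1] - x, x - xs[i]
        return ((m[i] * t0 ** 3 + m[i + 1] * t1 ** 3) / (6 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6) * t0
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6) * t1)

    return spline
```

Once the reference bands have been fitted, calling the returned function with the volume of an unknown band gives its estimated concentration.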
to store
8.9
. Further settings
button to the
instead of gray
in the
toolbar.
8.10.2 Enter the number of entries you want to create,
such as 1, and press <OK>.
The database now lists one or more entries with a unique
key automatically assigned by the software.
8.10.3 Select the gel file in the experiment files panel (Fig.
6-1) and choose File > Open experiment file (entries) from
the main menu.
This opens the Fingerprint entry file window, listing the
lanes defined for the example gel (Fig. 8-25).
setup, however, where different fingerprint types are
defined, it is not possible to have unique information
fields for each fingerprint lane separately. The same holds
true for commenting on a fingerprint file as a whole, such
as a gel file or a batch of electrophoresis runs.
In a connected database setup, it is possible to assign
information fields to the fingerprint lanes in particular
and to a fingerprint file as a whole.
8.11.1 To define one or more fingerprint lane information
fields, open the Fingerprint type window, and select
Settings > Fingerprint information fields.
The Fingerprint information fields dialog box (Fig. 8-28)
allows information fields to be defined for fingerprint
lanes. Note that the defined fields are common to all
fingerprint types in the database. The strings filled in,
however, are unique per lane and per fingerprint type. As
an example, if you define a fingerprint lane field as
Comment, all fingerprint types will have a fingerprint
field called Comment. The information you fill in will
be unique per fingerprint type and lane.
. This shows
8.12.21 Once the data gel is linked, you can close the Fingerprint data editor window of the reference gel. The tracking info, curve settings, and alignment of the reference gel are now automatically applied to the data gel. You can run through the different steps until you reach the normalization step, where the alignments obtained in the reference gel are shown. If you wish, you can show the normalized view before you move to the last step, i.e., defining bands.
8.12.22 Whenever needed, you can open the reference gel to which a data gel is linked by clicking the corresponding toolbar button (File > Open reference gel).
8.12.31 Once the data gel is linked, you can close the
Fingerprint data editor window of the reference gel.
button
Sample & band no.   Running time   Size in bp   Height   Volume   (unlabeled)   (unlabeled)
17B,1               33.00          60.47        228      929      330           90.9
17B,2               34.60          67.53        201      815      346           86.7
17B,3               43.30          106.02       113      855      433           69.3
17B,4               52.90          146.14       381      1908     529           56.7
17B,5               88.20          298.95       131      690      882           34.0
17B,6               89.00          302.68       1425     7821     890           33.7
17B,7               155.40         709.46       304      1800     1554          19.3
17B,8               158.50         736.00       182      966      1585          18.9
17B,9               165.10         796.02       121      713      1651          18.2
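The calibration points of lane 17B in the table above can be used to estimate the size of a band from its running time. The following Python sketch uses simple piecewise-linear interpolation between neighboring reference bands. This is an illustrative assumption; InfoQuestFP's own regression methods (for example, cubic spline) may give somewhat different values. The function name is hypothetical.

```python
from bisect import bisect_left

# Running time and size in bp of the reference bands of lane 17B
RUNNING_TIME = [33.00, 34.60, 43.30, 52.90, 88.20, 89.00, 155.40, 158.50, 165.10]
SIZE_BP = [60.47, 67.53, 106.02, 146.14, 298.95, 302.68, 709.46, 736.00, 796.02]

def size_from_running_time(t):
    """Estimate a fragment size by linear interpolation between calibration bands."""
    if not RUNNING_TIME[0] <= t <= RUNNING_TIME[-1]:
        raise ValueError("running time outside the calibrated range")
    j = bisect_left(RUNNING_TIME, t)
    if RUNNING_TIME[j] == t:
        return SIZE_BP[j]  # exactly on a calibration band
    t0, t1 = RUNNING_TIME[j - 1], RUNNING_TIME[j]
    s0, s1 = SIZE_BP[j - 1], SIZE_BP[j]
    return s0 + (t - t0) * (s1 - s0) / (t1 - t0)
```

For example, a band with a running time of 100.00 falls between bands 17B,6 and 17B,7. It is estimated at about 370 bp.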
8.13.7 Select Settings > Edit reference system or double-click to define the molecular weight regression.
8.13.8 In the Reference system window, copy the entered
molecular weights with Metrics > Copy markers from
reference system.
InfoQuestFP is now configured to import the Genescan
tables.
8.13.9 Exit the Reference system window and the
Fingerprint type window.
To import ABI Genescan files, scripts are available by contacting Bio-Rad's technical support. These scripts can be launched from the InfoQuestFP Main window, using the menu option Scripts > Browse Internet or the corresponding toolbar button. The script to import Genescan data can be found under Import tools and is called Import ABI Genescan tables. A description of how to use this script is also available from Bio-Rad's technical support.
8.13.10 When running the script, you can use the example
Genescan file in the Examples folder (on the CD-ROM or
in the InfoQuestFP folder):
Examples\TXTfiles\Genescan.txt.
Option 2: Importing Band Sizes by Using a Synthetic
Regression Curve
As an exercise, we will now import the same file using the
second option described above, i.e., allowing the program
to create its own regression curve.
8.13.11 In the Main window, open the Fingerprint type window for ABI-Genescan.
8.13.12 In the ABI-Genescan Fingerprint type window,
select Settings > New reference system (curve).
8.13.14 Press the <Add> button to add the sizes for all
reference bands available in the fingerprint type (see lane
17B, Fig. 8-30).
not be automatically compatible with the original, and
compatibility can only be obtained by creating a
molecular weight regression curve for both reference
systems. Both reference systems can then be remapped
onto each other, which inevitably causes some loss in
accuracy. The degree of compatibility depends on the
number of reference positions in both systems, the
amount of overlap between regression curves, the
predictability of the regression curve using one of the
available methods, the spread of calibration points
(reference positions), the definition of the reference
bands, etc.
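The remapping of two reference systems through their molecular weight regressions can be sketched as follows. A position in system A is converted to a molecular size with A's calibration, and that size is converted back to a position with B's calibration. This Python sketch assumes piecewise-linear interpolation of log(size) against position. The function names and the calibration values in the test are hypothetical.

```python
import math

def _interp(x, xs, ys):
    """Linear interpolation of y at x; xs strictly increasing, clamped at the ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = next(k for k in range(len(xs) - 1) if xs[k] <= x <= xs[k + 1])
    return ys[i] + (x - xs[i]) * (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])

def remap_position(pos, system_a, system_b):
    """Map a position in reference system A to reference system B via molecular size.

    Each system is a list of (position, size) calibration points. Positions
    outside a system's calibrated range are clamped, which is one source of
    the accuracy loss mentioned above.
    """
    pos_a, logs_a = zip(*[(p, math.log(s)) for p, s in sorted(system_a)])
    log_size = _interp(pos, pos_a, logs_a)  # position in A -> log(size)
    # Inverse calibration of B: sort by log(size) so x values increase
    logs_b, pos_b = zip(*sorted((math.log(s), p) for p, s in system_b))
    return _interp(log_size, logs_b, pos_b)  # log(size) -> position in B
```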
type.
9.1.2 The New character type wizard prompts you to
enter a name for the new type. Enter a name, such as
Pheno.
9.1.3 Press <Next> and check the type of the character
data files. Check Numerical values if the tests are not just
positive or negative but can also differ in intensity
(choose Numerical values in this example).
9.1.4 For numerical values, enter the number of decimal
digits you want to use. If you only want to use integer
values, for example between 0 and 10, enter 0 (zero, as in
this example).
9.1.5 After pressing <Next> again, the wizard asks whether
the character type has an open or closed character set.
In an open character set, the number of characters is not
fixed. For example, studying 10 bacterial strains by
means of fatty acids can result in a total of 20 fatty acids
found, but if more strains are added, more fatty acids
may appear in the list. In such cases, the Consider
absent values as zero option should be checked: if a
fatty acid is not found in a strain, it is not listed in its
fatty acid profile, and should therefore be considered 0
(zero).
In a closed character set, the same number of characters
is present for all entries studied. This is the case with
commercially available test kits. In such cases, the Consider
absent values as zero option should usually not be checked.
9.1.6 Answer <No> to the open character set and leave
the absent values checkbox unchecked.
If the character set is closed, i.e., when all the tests are
predefined, you can specify the Layout of the test panel.
This layout involves a Number of rows and Number of
columns to be specified, as well as the Maximum value
for all the tests. By default, the number of rows and
columns is set to 0 (zero), which means that the character
set will be empty initially. In this case, you can still add all
the tests one by one or by columns and rows, once the
character type is defined. If you are defining a test panel
based upon a microplate system (96 wells), you can now
9.2
conversion settings or
9.3.4 Select Data01 in the Files panel, and File > Open
experiment file (data).
This opens the Character data file window (Fig. 9-2), which
is empty initially. The 10 test names that were entered as
an example are shown in the column header. You can
add characters here with Characters > Add new character
or
9.3
. The
Before you can enter data, you must add new entries to
the file. Suppose that we want to add character data for
all entries of the database except the standard (17).
9.3.5
) or the
instead of gray
9.4
[Figure: entries (Entry 1 to Entry 4) by characters (Character 1 to Character 6), stored in three layers (Layer 1, active; Layer 2; Layer 3).]
9.5
Settings or
The Image tab offers two choices for the Image type:
Densitometric and Color scale.
the cursor
. Similarly, it is possible to
. This
Fig. 9-7. Color scale editor in the Cell layout step of the BNIMAGE program.
With the Max. value field you can enter the maximum
value to which all characters will be rescaled.
9.5.30 Enter 100 as Max. value and press <OK> to
confirm the color settings.
9.5.31 Move to the next step using Edit > Next step or the
corresponding button.
. The mouse
Quantification tab.
Cell integration methods include Average, Median, and
Sum. If the image contains spots that could distort the
quantified values, the Median option gives more
reliable results than the Average (arithmetic mean) option.
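The difference between the integration methods can be sketched in a few lines of Python. This is an illustration only, not InfoQuestFP's internal code; the function name `integrate_cell` and the pixel values are hypothetical. A single bright dust spot pulls the average up, while the median stays at the background level of the cell:

```python
from statistics import mean, median


def integrate_cell(pixels, method="median"):
    """Collapse the pixel values of one cell into a single value.

    The method names mirror the Average, Median, and Sum options;
    the function itself is a hypothetical illustration.
    """
    if method == "average":
        return mean(pixels)
    if method == "median":
        return median(pixels)
    if method == "sum":
        return sum(pixels)
    raise ValueError(f"unknown integration method: {method!r}")


# A cell with a uniform intensity of about 40 and one bright spot (250).
cell = [40, 41, 39, 40, 250, 40, 42, 38, 40]
print(integrate_cell(cell, "average"))  # about 63.3, inflated by the spot
print(integrate_cell(cell, "median"))   # 40
```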
9.5.36 Select Median integration and press <OK>.
To illustrate the calibration feature, we will define one of
the cells as negative control (minimum value) and
another cell as positive control (maximum value).
9.5.37 Select cell A1 (negative control) and
Quantification > Define calibration point. Enter 0 as the
value and press <OK>.
9.5.38 Select cell A12 (positive control) and
Quantification > Define calibration point. Enter 100 as
the value and press <OK>.
Because only two calibration points are defined, the
program needs to calculate a linear regression through
them in order to requantify the other cells according to
the negative and positive controls.
Quantification tab.
9.5.40 Under Calibration, enter 1 as Polynomial degree.
This will result in a first degree regression.
9.5.41 Press <OK> to close the Settings dialog box.
9.5.42 Select Quantification > View calibration curve.
This shows a linear regression between the two
calibration points, zero and 100.
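The calibration can be sketched as a least-squares polynomial fit that maps raw cell values onto the calibration values. This is a minimal sketch that assumes a plain least-squares fit; InfoQuestFP's exact procedure is not described here, and the raw intensities of cells A1 and A12 below are hypothetical. With two calibration points and a Polynomial degree of 1, the fit is simply the straight line through both points:

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares fit; returns coefficients c0..c_degree of y = sum(c_k * x**k)."""
    n = degree + 1
    if len(xs) < n:
        raise ValueError("need at least degree + 1 calibration points")
    # Normal equations (X^T X) c = X^T y, with X[i][k] = xs[i] ** k.
    ata = [[sum(x ** (j + k) for x in xs) for k in range(n)] for j in range(n)]
    aty = [sum(y * x ** j for x, y in zip(xs, ys)) for j in range(n)]
    m = [row + [rhs] for row, rhs in zip(ata, aty)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("calibration points do not determine the curve")
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return [m[k][n] / m[k][k] for k in range(n)]


def apply_polynomial(coeffs, x):
    """Requantify a raw value with the fitted calibration curve."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


# Hypothetical raw intensities of the negative (A1) and positive (A12) controls.
raw_negative, raw_positive = 12.0, 212.0
coeffs = fit_polynomial([raw_negative, raw_positive], [0.0, 100.0], degree=1)
print(apply_polynomial(coeffs, 112.0))  # approximately 50, halfway between the controls
```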
Finally, one more step remains: copying the character
values into the microplate opened in InfoQuestFP.
The Image tab offers two choices for the Image type:
Densitometric and Color scale.
Unlike the first microplate image, the color reaction of
this gene array can be interpreted as a simple change in
intensity, such as from light to dark; hence, you should
select Densitometric.
9.5.60 Select Densitometric under Image type.
The Densitometric values panel offers some additional
tools to edit the TIFF file: Inverted values, for inverting the
densitometric values, and Background subtraction, for
two-dimensional subtraction of the background from the
TIFF file using the rolling ball principle. The Ball size can
be entered in pixels. Background subtraction is only
necessary if the illumination of the image is not uniform,
which is not the case in the example image. Spot removal
removes all spots and irregularities below a certain size
from the image, while larger structures are preserved.
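Spot removal and rolling-ball background subtraction both belong to the family of morphological filters. The sketch below approximates them with a flat square window instead of a true ball, which is a simplification; it is not the BNIMA implementation, the function names are hypothetical, and how the Spot size setting maps onto a window width is an assumption. An opening (minimum filter followed by maximum filter) removes bright details that do not fit in the window, and subtracting a wide opening from the image removes a slowly varying background:

```python
def _window_filter(img, size, pick):
    """Apply pick (min or max) over a size x size window around each pixel.

    Near the border, only the pixels inside the image are used.
    """
    h, w = len(img), len(img[0])
    r = size // 2
    return [
        [
            pick(
                img[j][i]
                for j in range(max(0, y - r), min(h, y + r + 1))
                for i in range(max(0, x - r), min(w, x + r + 1))
            )
            for x in range(w)
        ]
        for y in range(h)
    ]


def opening(img, size):
    """Erosion (min) then dilation (max): removes bright details smaller than the window."""
    return _window_filter(_window_filter(img, size, min), size, max)


def closing(img, size):
    """Dilation (max) then erosion (min): removes dark details smaller than the window."""
    return _window_filter(_window_filter(img, size, max), size, min)


def remove_spots(img, window=3, dark_spots=False):
    """Hypothetical spot removal; larger structures survive unchanged."""
    return closing(img, window) if dark_spots else opening(img, window)


def subtract_background(img, window):
    """Flat-window approximation of rolling-ball background subtraction."""
    background = opening(img, window)
    return [[p - b for p, b in zip(row, bg)] for row, bg in zip(img, background)]


# 7 x 7 image: background 10, a one-pixel bright speck, and a 3 x 3 bright block.
img = [[10] * 7 for _ in range(7)]
img[1][1] = 200
for y in range(3, 6):
    for x in range(3, 6):
        img[y][x] = 120
cleaned = remove_spots(img, window=3)  # the speck disappears, the block is kept
```

The pure-Python loops are slow on real images; they are written out only to make the min/max logic visible.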
9.5.61 Leave Background subtraction disabled, and
enable Spot removal, with a maximum Spot size of 3
pixels.
9.5.62 Press <OK> to quit the settings panel.
Fig. 9-11. The BNIMA program with a gene array image (fragment) loaded.
9.5.85 Select all but the last three rows and
Quantification > Add cells to character set.
The cells to be used in the character set are now
numbered 1 to 196.
9.5.86 Copy the quantified cells to the clipboard with
Quantification > Export to clipboard or the corresponding button.
9.5.78 If this is the case, move to the next step using Edit >
Next step or the corresponding button.
9.5.81 Select the fourth cell in the second-to-last row and
Quantification > Define calibration point.
9.5.82 Enter 0 (zero).
9.5.83 Select the fifth cell in the second-to-last row and
Quantification > Define calibration point.
9.5.84 Enter 100.
All cells are now quantified between 0 and 100%
hybridization control, and we now need to specify which
cells to add to the character set. Since the calibration cells
(second-to-last row) are not part of the character set, they
should not be included.
type.
) or
the F2 shortcut.
10.1.22 File > Exit when you are finished editing the
sequences.
10.1.23 In the Main window, double-click on the file
Seq01 (or select File > Open experiment file (entries)).
The Sequence entry file window (cf. the Fingerprint entry file
window, Fig. 8-25) contains unlinked entries, which you
can now link to the corresponding database entries.
A link arrow for each entry allows you to link an
entry to a database entry: click on the arrow, drag it
onto a database entry, and then release the mouse
button. When the experiment is linked, its link arrow is
purple.
10.1.25 Release the mouse button on the database entry;
the entry is now linked to this database entry, and its
arrow in the Sequence entry file window has turned
purple instead of gray.
to
to the
10.2.11 To zoom out on the curve view, use View > Zoom
out (trace) or
panel.
button
1. Trimming the sequences, i.e., physically removing the
unusable ends. This level of cleaning is based on the
percentage of unresolved positions at both ends of the
sequence. Trimmed ends are neither used nor shown
in the Assembly view of the Assembler main window.
2. Inactivating doubtful parts of the sequence. This level
of cleaning is based on both the quality of the
densitometric curves and the proportion of unresolved
positions. Inactivated parts are still shown, but do not
actively contribute to the consensus. However,
they are aligned to the consensus. If there is no
consensus base at a position, the inactivated regions
will not be considered by the program. The user can
still compare the consensus position with the base in an
inactivated sequence region. Inactive regions can be
set as active at any time, and active regions can likewise
be set as inactive. If an inactivated region is the
only information available in a part of the consensus
sequence, it is used to fill in the consensus
sequence. If a position in an inactivated region conflicts
with other sequences, it is ignored.
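The way active and inactivated regions feed into the consensus can be sketched as follows. This is a simplified illustration under stated assumptions, not the Assembler algorithm: the reads are taken as already aligned, each position carries an active/inactive flag, a simple majority vote decides the consensus base, and ties give N. Inactivated bases are used only where no active base is available; otherwise they are ignored, even when they conflict:

```python
from collections import Counter

BASES = set("ACGT")


def _vote(bases):
    """Majority base, or N when there is no clear winner (the tie rule is an assumption)."""
    ranked = Counter(bases).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "N"
    return ranked[0][0]


def consensus(reads):
    """reads: list of (aligned_bases, active_flags) tuples of equal length.

    '-' is a gap and 'N' an unresolved position.
    """
    result = []
    for pos in range(len(reads[0][0])):
        active = [s[pos] for s, flags in reads if flags[pos] and s[pos] in BASES]
        inactive = [s[pos] for s, flags in reads if not flags[pos] and s[pos] in BASES]
        votes = active or inactive  # inactivated bases only fill otherwise empty positions
        if votes:
            result.append(_vote(votes))
        elif all(s[pos] == "-" for s, _ in reads):
            result.append("-")
        else:
            result.append("N")
    return "".join(result)


read1 = ("ACGTAC", [True, True, True, True, False, False])  # last two bases inactivated
read2 = ("ACGTG-", [True] * 6)
print(consensus([read1, read2]))  # ACGTGC
```

At position 5 the inactivated A of the first read is ignored because the second read has an active G; at position 6 the inactivated C is the only information and fills the consensus.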
10.2.17 Cleaning of the sequences happens automatically
and is based on the quality assignment settings. The quality
of the sequence is shown on the graphical overview in the
Trimming view (Fig. 10-3). A color scale ranges from green
(acceptable quality) through yellow and orange to red
(unacceptable quality). The trimmed ends are indicated
by a black bar underlining the sequence. Inactivated
zones are indicated by a gray bar. Unresolved positions
(N) are indicated with a small flag on top of the
sequence.
10.2.18 The quality assignment can be changed by
modifying the settings in the Quality assignment dialog
box (Fig. 10-4). This dialog box can be opened with File >
Quality assignment or the corresponding button.
keyboard or the
(lower panel).
keyboard. The
button to the left of the curve view
(lower panel) can also be used.
. This opens
clipboard is automatically pasted into the editor, which
you can still edit. An input field Name allows a name to
be entered for the vector.
10.2.32 Vectors can be deleted from the list using the
<Delete selected> button.
Vectors entered are automatically saved along with the
project.
The Remove vectors dialog box (Fig. 10-5) contains a
number of alignment parameters:
10.2.33 Minimum score: The minimum number of
matching bases the sequence and the vector should have
in order for the vector sequence to be removed. This
number is the result of the total number of matching
bases minus the total penalty resulting from mismatches
and gaps.
10.2.34 Unit penalty per gap: The penalty, as a factor of
the match score, assigned to a gap in either the sequence
or the vector after the alignment.
10.2.35 Unit penalty per mismatch: The penalty, as a
factor of the match score, for a single mismatch between
the vector and the sequence after the alignment.
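The Minimum score test can be sketched on an alignment that has already been computed. The sketch assumes a match score of 1 and counts every gap position as one gap (the section does not say whether a gap is counted per position or per opening); the function names and penalty values are hypothetical:

```python
def vector_match_score(seq_aln, vec_aln, gap_penalty, mismatch_penalty):
    """Score = matches - gap_penalty * gaps - mismatch_penalty * mismatches.

    Penalties are factors of the match score (assumed to be 1).
    Each gap position counts as one gap, which is an assumption.
    """
    if len(seq_aln) != len(vec_aln):
        raise ValueError("aligned sequences must have the same length")
    matches = mismatches = gaps = 0
    for s, v in zip(seq_aln, vec_aln):
        if s == "-" and v == "-":
            continue  # gap in both rows: not scored
        if s == "-" or v == "-":
            gaps += 1
        elif s == v:
            matches += 1
        else:
            mismatches += 1
    return matches - gap_penalty * gaps - mismatch_penalty * mismatches


def should_remove(seq_aln, vec_aln, min_score, gap_penalty, mismatch_penalty):
    """True if the aligned vector reaches the Minimum score."""
    return vector_match_score(seq_aln, vec_aln, gap_penalty, mismatch_penalty) >= min_score


seq = "ACGTAC-GTA"
vec = "ACGTTCAGTA"
# 8 matches, 1 gap, 1 mismatch: 8 - 2.0 * 1 - 1.0 * 1
print(vector_match_score(seq, vec, gap_penalty=2.0, mismatch_penalty=1.0))  # 5.0
```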
button.
. The
box opened from the Entry edit card. Such projects can be
changed at any time and are updated automatically in the
InfoQuestFP database.
Finding Subsequences
10.2.88 With Edit > Find or CTRL+F you can open a Find
sequence tool in Assembler (Fig. 10-13) to find
subsequences. The subsequence you enter can include
unresolved positions according to the IUPAC code (e.g.,
N, R, Y).
10.2.89 Under Search in, you can choose between Current
sequence (the selected one), All sequences, and Consensus.
10.2.90 Using Mismatches allowed, it is possible to find
subsequences that differ in a defined number of bases
from the entered string.
10.2.93 With Search in both directions enabled, the invert-complemented sequence is searched as well.
10.2.94 Press <Search> to execute the search command.
The Result set displays all the instances that were found
(Fig. 10-13), with arrows indicating whether each was
found on the sequence as is or after invert-complementing. The positions are also indicated.
10.2.95 If you click on an item in the list under the Result
set, the matching subsequence is selected in the sequence
panel (central panel). The bottom panel of the Find
sequence window displays the alignment of the search
sequence and the target sequence, indicating mismatches
and gaps introduced (if allowed).
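The search options above (IUPAC codes, Mismatches allowed, and Search in both directions) can be sketched as a brute-force scan. This is an illustration, not Assembler's implementation: it uses the standard IUPAC nucleotide codes, treats ambiguity codes only in the search string, reports 1-based start positions, and does not allow gaps. The function names are hypothetical:

```python
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def invert_complement(seq):
    """Reverse complement, including IUPAC ambiguity codes."""
    return seq.translate(COMPLEMENT)[::-1]


def _mismatches(target, query, start, limit):
    """Count mismatches of query placed at start; None once limit is exceeded."""
    count = 0
    for q, t in zip(query, target[start:start + len(query)]):
        if t not in IUPAC[q]:
            count += 1
            if count > limit:
                return None
    return count


def find_subsequence(target, query, mismatches_allowed=0, both_directions=False):
    """Return (1-based position, strand, mismatches) for every hit."""
    target, query = target.upper(), query.upper()
    unknown = set(query) - set(IUPAC)
    if unknown:
        raise ValueError(f"not IUPAC nucleotide codes: {sorted(unknown)}")
    searches = [("+", query)]
    if both_directions:
        searches.append(("-", invert_complement(query)))
    hits = []
    for strand, q in searches:
        for start in range(len(target) - len(q) + 1):
            n = _mismatches(target, q, start, mismatches_allowed)
            if n is not None:
                hits.append((start + 1, strand, n))
    return hits


target = "GGACTAGGACTTAC"
print(find_subsequence(target, "GAYTT"))                        # [(8, '+', 0)]
print(find_subsequence(target, "AAGTC", both_directions=True))  # [(8, '-', 0)]
```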
will automatically be selected in the Assembler editor.
Such projects can be changed at any time and are updated
automatically in the InfoQuestFP database.
10.2.104 From within an InfoQuestFP comparison, you
can double-click on a base of a sequence, which opens the
sequence experiment card with that base selected.
Pressing the corresponding button in the sequence experiment card
in turn launches Assembler with the same base selected.
10.2.105 When finished, exit the window with File > Exit.
Fig. 10-16. The Matrix file window is used to enter and edit similarity values.
[Figure: curve of OD values with the parameters MAX, Smax, MIN, T05, T50, and T95 indicated.]
Fig. 11-2. Example of the processing of kinetic readings of a phenotypic test panel. (1) Readings are done at
different times T0...T4; (2) a curve model is fit through the values obtained for each well in the test panel (in
the example, logistic growth); (3) one or more specific parameters are derived from the curves (in the example,
the final value MAX and the maximum slope Smax); and (4) a data matrix is constructed from a curve parameter
obtained for each well, including all the samples analyzed. In the example, two data matrices are generated as
two parameters were chosen.
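Steps (3) and (4) of Fig. 11-2 can be sketched as follows. To keep the example short, it skips the curve fit of step (2) and reads MAX (largest value) and Smax (steepest rise between consecutive readings) directly from the raw readings, whereas InfoQuestFP derives these parameters from the fitted model. The sample names, wells, and readings are hypothetical:

```python
def curve_parameters(times, values):
    """Model-free stand-ins for the MAX and Smax curve parameters."""
    slopes = [
        (v2 - v1) / (t2 - t1)
        for t1, t2, v1, v2 in zip(times, times[1:], values, values[1:])
    ]
    return {"MAX": max(values), "Smax": max(slopes)}


def parameter_matrices(times, samples):
    """Build one data matrix per parameter: {parameter: {sample: {well: value}}}."""
    matrices = {}
    for sample, wells in samples.items():
        for well, values in wells.items():
            for name, value in curve_parameters(times, values).items():
                matrices.setdefault(name, {}).setdefault(sample, {})[well] = value
    return matrices


times = [0, 1, 2, 3, 4]  # readings at T0 ... T4
samples = {
    "Sample 1": {"A1": [1, 2, 6, 9, 10], "A2": [1, 1, 1, 2, 2]},
    "Sample 2": {"A1": [1, 3, 8, 11, 12], "A2": [1, 1, 2, 2, 3]},
}
matrices = parameter_matrices(times, samples)  # one matrix for MAX, one for Smax
```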
type.
11.2.2 The New trend data type wizard prompts you to
enter a name for the new data type. Enter a name and
press the <Finish> button to complete the setup of the
new trend data type. It is now listed under Trend data
types in the Experiment types panel.
chosen to be included in the analysis. Additionally, one or
more choices (Model choices) can be specified for each
model.
Below is a list of the models that are available and their
parameters and model choices:
Linear function
y = A + Bx
Available parameters are the Intercept A and the Slope
B. The function can be forced to pass through 0 (zero),
in which case the intercept A is always 0 (zero).
Logarithmic function
y = A + B log x
Similar to a linear function, the available parameters
are the Intercept A and the Slope B. The function can
be forced to pass through 0 (zero), in which case the
intercept A is always 0 (zero).
Exponential function
y = O + Ae^{Bx}
Power function
y = O + Ax^{B}
Fig. 11-4. Trend curve parameters dialog box for
selecting the models to use and the associated
parameters to include.
Fig. 11-3. The Trend data type window, with 6 trend curves defined.
Hyperbolic function
y = A+
B
xC
Gaussian function
y = O + Ae
xM 2
[1 + e
Q B (x M )
1
Q
y = A + Ce
Logistic growth
y = A+
B ( xM )
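For the linear and logarithmic functions, the effect of forcing the curve through zero can be sketched with ordinary least squares. This is an illustration under assumptions: the section does not state which fitting procedure or logarithm base InfoQuestFP uses, so plain least squares and the natural logarithm are assumed here, and the function names are hypothetical:

```python
import math


def fit_linear(xs, ys, through_zero=False):
    """Least-squares fit of y = A + Bx; returns (A, B).

    With through_zero=True the intercept A is fixed at 0.
    """
    if through_zero:
        b = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
        return 0.0, b
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def fit_logarithmic(xs, ys, through_zero=False):
    """Least-squares fit of y = A + B log x (natural log assumed); all x must be > 0."""
    return fit_linear([math.log(x) for x in xs], ys, through_zero)


xs, ys = [1, 2, 3, 4], [3, 5, 7, 9]
print(fit_linear(xs, ys))                     # (1.0, 2.0)
print(fit_linear(xs, ys, through_zero=True))  # (0.0, 2.3333333333333335)
```

Forcing the line through zero changes the slope as well, because the intercept can no longer absorb the offset in the data.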
The trend data import script reads text files that contain a
table of tab-delimited strings and values. The table should
contain the trend data in the following format:
button.
<tab>Curve 1<tab>Curve 2<tab>...
X value 1<tab>Y value<tab>Y value<tab>...
X value 2<tab>Y value<tab>Y value<tab>...
...<tab>...<tab>...<tab>...
The table can occur anywhere in the file; you can select it
in the script. Furthermore, certain rows and/or columns
can be selected or deselected from the table. However, the
first selected row (green) should be the header row
describing the curve (character) names, and the first
selected column (yellow) should contain the X values. Each
subsequent column should contain the Y values for the curve
named in the column header.
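Once the table has been isolated from the file, reading it amounts to parsing a tab-delimited grid whose first row holds the curve names and whose first column holds the X values. The sketch below is a hypothetical illustration of that layout, not the import script itself; it skips empty cells and does not handle the selection or deselection of rows and columns:

```python
import csv
import io


def parse_trend_table(text):
    """Return {curve name: [(x, y), ...]} from a tab-delimited trend table."""
    rows = [r for r in csv.reader(io.StringIO(text), delimiter="\t") if any(c.strip() for c in r)]
    header, body = rows[0], rows[1:]
    names = [h.strip() for h in header[1:]]  # the first header cell sits above the X column
    curves = {name: [] for name in names}
    for row in body:
        x = float(row[0])
        for name, cell in zip(names, row[1:]):
            if cell.strip():  # a curve may lack a value at some X
                curves[name].append((x, float(cell)))
    return curves


example = "\tCurve 1\tCurve 2\n0\t0.1\t0.2\n1\t0.5\t\n2\t0.9\t0.7\n"
curves = parse_trend_table(example)
```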
The script can create a new trend data experiment type if
required, and will automatically add all trend curves
found in the data files.
Six example data files (artificial data) are provided in the
InfoQuestFP program folder or on the installation CD-ROM and can be found under Examples\Trend data.
11.3.4 Launch the script after installation of the plug-in
tools, by selecting File > Import > Trend type data in the
InfoQuestFP main window.
Fig. 11-5. The Trend data type window for the example data set with three model parameters defined.
11.4.11 Select a number of entries in the database, for
which trend curves are present.
11.4.12 Open the Trend data type window (Fig. 11-3) and
select File > Create trend data window.
The resulting Trend data window (Fig. 11-9) displays the
curves for all the selected entries in a single plot.
curve view as depicted in Fig. 11-8 and the Info view, which
contains detailed information about:
The fit model chosen for visualization (see 11.4.2): the
standard deviation and the parameters derived from
the formulas (see 11.2) are indicated.
The curve parameters selected for comparison (see
11.2).
11.4.8 With the
12.1.2 Enter a name for the new type, such as
DNA-homol.
Press the <OK> button to complete the setup of the new
matrix type. It is now listed under Matrix types in the
Experiment types panel.
Unlike other experiments, a matrix type does not provide
an experiment for each entry. Instead, it contains
similarities between entries. Hence, the data file, which
contains the experiment data, and the entry file, which
links the experiments to database entries, are the same
here. There are two ways to enter similarity values: by
importing a matrix as a whole, and by entering the values
from the keyboard.
To import a matrix, it must have the following format:
ENTRY KEY<tab>VALUE<eol>
ENTRY KEY<tab>VALUE<tab>VALUE<eol>
ENTRY KEY<tab>VALUE<tab>VALUE<tab>VALUE<eol>
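A file in this lower-triangular format can be turned into a full symmetric matrix with a few lines of code. This is a hypothetical reader for illustration (the entry keys and values are made up); it assumes that the n-th row holds n values, the last of which is the diagonal:

```python
def read_similarity_matrix(text):
    """Parse 'KEY<tab>v1<tab>...<tab>vn' lines into (keys, full symmetric matrix)."""
    keys, rows = [], []
    for line in text.splitlines():
        if not line.strip():
            continue
        key, *values = line.split("\t")
        keys.append(key)
        rows.append([float(v) for v in values])
    n = len(keys)
    full = [[0.0] * n for _ in range(n)]
    for i, values in enumerate(rows):
        if len(values) != i + 1:
            raise ValueError(f"row {i + 1} ({keys[i]}) should have {i + 1} values")
        for j, value in enumerate(values):
            full[i][j] = full[j][i] = value  # mirror the lower triangle
    return keys, full


keys, matrix = read_similarity_matrix("K1\t100\nK2\t75\t100\nK3\t60\t85\t100\n")
```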
Fig. 12-1. The Matrix file window allows you to enter and edit similarity values.
Fig. 13-2. Experiment cards of fingerprint type (1), character type with fixed number of characters (closed type)
(2), character type with an unfixed number of characters (open type) (3), and sequence type (4).
connected database (see chapter 31), an additional option,
Remove this experiment, makes it possible to delete the
character set from the database.
Warning: This is an irreversible operation.
13.2.5 You can move the cursor using the left and right
arrow keys.
Character Types
button of
Sequence Types
13.2.14 Press the
data, or File > Edit sequence data, you can edit the
experiment data for the entry directly in the file.
button. When
.
to open
Database Field
Use the <Database field> button to enter a (sub)string to
find in any database field (<Any field>) or in any specific
field that exists in the database (Fig. 14-4).
Note: The wildcards * and ? are not used in the
advanced query tool.
Experiment Presence
Use the <Experiment presence> button to specify an
experiment that must be present for entries to be
selected.
Fingerprint Bands
The <Fingerprint bands> search component allows
specific combinations of bands to be found in the
database entries. The dialog box that opens (Fig. 14-6)
allows you to enter a Fingerprint experiment type, and
specify an Intensity filter, Target range, and Number of bands
present.
Character Value
Under Target range, you can search for bands with specific
sizes, either entered as Normalized run length (%) or as
Metric values. A target range should always be entered
with the lower value first. Note that when only one of the
two limits is entered, the program will consider all bands
above or below that limit, depending on which limit was
entered. For example, when only the first (lower) limit is
entered and the upper limit is left blank, all bands above
the specified size will be accepted. When both fields are
left blank, no size restriction is applied, i.e., all bands
are considered.
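The Target range rules can be expressed as a small filter. This sketch assumes that both limits are inclusive, which the section does not state, and uses hypothetical function names and band sizes:

```python
def in_target_range(size, lower=None, upper=None):
    """True if size lies within [lower, upper]; a missing limit leaves that side open."""
    if lower is not None and upper is not None and lower > upper:
        raise ValueError("enter the lower value first")
    if lower is not None and size < lower:
        return False
    if upper is not None and size > upper:
        return False
    return True


def bands_in_range(bands, lower=None, upper=None):
    return [b for b in bands if in_target_range(b, lower, upper)]


bands = [12.5, 30.0, 47.2, 81.0]  # normalized run lengths (%) of one lane
print(bands_in_range(bands, lower=30))            # [30.0, 47.2, 81.0]
print(bands_in_range(bands, upper=40))            # [12.5, 30.0]
print(bands_in_range(bands, lower=20, upper=50))  # [30.0, 47.2]
print(bands_in_range(bands))                      # all four bands
```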
Subsequence
Fingerprint Field
This option is only available in a connected database with
defined fingerprint lane information fields (see 8.11). It is
possible to search for a string in a particular field and
within a particular fingerprint type (Fig. 14-7).
Furthermore, Case sensitive can be specified, and the
string can be entered as a Regular expression (see 33.2).
Attachment
With the <Attachment> search component, you can
perform a search in attachments that are linked to
database entries (see 6.5). With the pull-down list you can
choose the type of attachments to search in. One of the
possibilities is All, i.e., to search within all attachment
types. For all types of attachments it is possible to search
in the Description field, and for text type attachments, it
is also possible to search within the Text. The Text option
does not apply to the other attachment types.
Logical Operators
NOT operates on one component. When a
component is combined with NOT, the condition of the
component will be inverted.
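Combining search components with logical operators can be pictured as composing predicates. The sketch below is an illustration with hypothetical helper names, field names, and entries; only NOT is described in this section, so the AND and OR helpers simply follow the usual meaning of those operators, and case-insensitive matching is an assumption:

```python
from typing import Callable, Dict

Entry = Dict[str, str]
Component = Callable[[Entry], bool]


def field_contains(field: str, text: str) -> Component:
    """Search component: the field contains text (case-insensitive)."""
    return lambda entry: text.lower() in entry.get(field, "").lower()


def and_(*components: Component) -> Component:
    return lambda entry: all(c(entry) for c in components)


def or_(*components: Component) -> Component:
    return lambda entry: any(c(entry) for c in components)


def not_(component: Component) -> Component:
    """NOT inverts the condition of one component."""
    return lambda entry: not component(entry)


entries = [
    {"Genus": "Perdrix", "Strain number": "53175"},
    {"Genus": "Perdrix", "Strain number": "25693"},
    {"Genus": "Vercingetorix", "Strain number": "10021"},
]
query = and_(field_contains("Genus", "perdrix"), not_(field_contains("Strain number", "53")))
print([e["Strain number"] for e in entries if query(e)])  # ['25693']
```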
button to combine
Fig. 14-10. Combined query constructed in the Advanced query tool (see text for explanation).
14.5 Subsets
A selection of entries from the database can be saved as a
subset. Subsets can include a certain target group in a
database, for example, a single species in a database
containing many species, or any selection of relevant
strains for a certain purpose. Selecting the defined subset
displays a view of the database containing only the
entries of the subset. Search functions as well as copy and
select functions are restricted to the displayed
subset, and new comparisons, when created, only
contain the selected entries from the subset.
14.5.1 In database DemoBase, make sure no entries are
selected using Edit > Clear selection list (F4 key) or
selection.
14.5.11 If you want to copy entries from one subset to
another subset without removing them from the first
subset, you can use the command Edit > Copy selection
or the corresponding button.
Note: (1) A selection that is copied or cut from a
subset or copied from the database is placed on the
Windows clipboard as the keys of the selected
entries, separated by line breaks. You can paste
them in other software when desired.
[Figure callouts: toolbar, dendrogram panel, image panel, matrix panel.]
; character experiments as
; sequence
experiment types as
, and matrix types as
. This
button shows the image of the experiment type.
14.7.3 Press the
and
hide the band positions in the image panel. You can show
band positions without showing the image.
14.7.10 When bands are shown on the image, they can be
exported as a tab-delimited file with File > Export bands.
The export file, opened as RESULT.TXT in Notepad,
contains the key of the entry, and a list of band positions
as relative run lengths (in percent) and molecular weight
(if determined).
14.7.11 Press
, the selected
, the same
14.8.3 You can open the existing comparison by double-clicking on MyComp in the Comparisons panel.
Paste selection or
15.1.4 Close the Composite data set window with File >
Exit. The new composite data set is shown in the
experiment types panel of the Main window.
zoom functions
and
button.
All bands that are closer to the new band class are
automatically reassigned to that new class. To reassign
bands to the other class, follow the procedure explained
in 15.3.2 to 15.3.3.
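The reassignment rule (each band goes to the nearest band class) can be sketched as below. Positions are taken as normalized run lengths in percent, the sketch ignores any position tolerance, and the values and function name are hypothetical:

```python
def assign_to_classes(band_positions, class_positions):
    """Return, for each band, the index of the nearest band class."""
    return [
        min(range(len(class_positions)), key=lambda k: abs(pos - class_positions[k]))
        for pos in band_positions
    ]


bands = [19.5, 21.0, 30.5, 34.0, 36.0]
classes = [20.0, 35.0]
print(assign_to_classes(bands, classes))  # [0, 0, 1, 1, 1]
classes.append(30.0)                      # add a new band class
print(assign_to_classes(bands, classes))  # [0, 0, 2, 1, 1]
```

Only the band at 30.5 moves: it is now closer to the new class at 30.0 than to the class at 35.0.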
If bands are incorrectly assigned to different classes, you
can merge the classes as follows:
15.3.9 Choose a band that occurs in the middle of the two
classes.
15.3.21 A band matching report can be exported as a tab-delimited table using Bandmatching > Export
bandmatching (Fig. 15-11).
in the
Main window.
15.5.4 With Edit > Paste selection or
in the other
[Figure content: a HEADER section listing the band classes, followed by a TABLE section with the entries as rows.]
Fig. 15-13. Numerical band matching character table exported from InfoQuestFP (tab-delimited).
button of RFLP1-table.
Fig. 15-14. Discriminative bands for selected entries: positive discrimination left, negative discrimination right.
Introduction
16.2
Calculating a Dendrogram
button
16.3
Fig. 16-5. Comparison window with dendrogram, image, entry names, and similarity matrix.
button.
, the selected
, the same
16.4
16.4.1 You can drag the separator lines between the four
panels to the left or to the right, in order to divide the
space among the panels optimally.
16.4.2 Similarly, you can drag the separator lines
between the information field columns to the left or to the
right, in order to divide the space among the information
fields optimally.
Note that in this way, you can add new database entries
to an existing dendrogram: select the new entries in the
database, open an existing comparison with dendrogram,
and paste the selection into the comparison. Both the
similarity matrix and the dendrogram will be updated,
which uses considerably less time than recalculating the
whole cluster analysis. The entries first need to be copied
to the clipboard from the Main window or from another
comparison.
16.5.6 To copy entries to the clipboard, use the Edit >
Copy selection command or the corresponding button.
16.4.4 You can also move the cursor with the arrow keys.
16.5
in the other
comparison.
16.5.8 To save the comparison with the dendrogram,
16.6
16.7
If you scroll through the tree, you will notice that two
entries, i.e., Perdrix sp. strain numbers 53175 and 25693
protrude on a very long branch. These two entries are
ideally suited as an outgroup.
16.7.3 Hold the CTRL key and click on the node that
connects all the entries belonging to the Vercingetorix
cluster. The entries of this cluster are now selected.
16.6.8 Select Clustering > Reroot tree, and the new root
connects the outgroup with the rest of the entries.
16.6.9 The software automatically limits the similarity
range to the depth of the dendrogram. If you want to
change this range, select Clustering > Set minimum
similarity value.
16.6.10 The similarity scale can be displayed in similarity
(default for most clustering types) or in distance. To
toggle between similarity and distance modes, select
Layout > Show distances.
16.8
is usually calculated for a whole dendrogram to create an
estimate of the faithfulness of a cluster analysis. In
InfoQuestFP, the value is calculated for each cluster
(branch), thus estimating the faithfulness of each
subcluster of the dendrogram. Obviously, you can obtain
the cophenetic correlation for the whole dendrogram by
looking at the cophenetic correlation at the root.
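The whole-tree cophenetic correlation can be sketched as the Pearson correlation between the original similarities and the cophenetic similarities, i.e., the similarity level at which each pair of entries is joined in the dendrogram. For illustration, the sketch builds a UPGMA tree; the matrix values are borrowed from the averaged matrix in Fig. 20-4, the function names are hypothetical, and the per-branch values that InfoQuestFP displays are not reproduced:

```python
import math
from itertools import combinations


def upgma_cophenetic(sim):
    """UPGMA on a full similarity matrix; returns the cophenetic similarity matrix."""
    n = len(sim)
    clusters = {i: [i] for i in range(n)}
    coph = [[sim[i][i] if i == j else 0.0 for j in range(n)] for i in range(n)]

    def average(a, b):
        pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
        return sum(sim[i][j] for i, j in pairs) / len(pairs)

    next_id = n
    while len(clusters) > 1:
        a, b = max(combinations(clusters, 2), key=lambda p: average(*p))
        level = average(a, b)  # similarity at which the two clusters are joined
        for i in clusters[a]:
            for j in clusters[b]:
                coph[i][j] = coph[j][i] = level
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return coph


def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def cophenetic_correlation(sim):
    coph = upgma_cophenetic(sim)
    pairs = list(combinations(range(len(sim)), 2))
    return pearson([sim[i][j] for i, j in pairs], [coph[i][j] for i, j in pairs])


averaged = [
    [100, 80, 65, 45],
    [80, 100, 90, 85],
    [65, 90, 100, 85],
    [45, 85, 85, 100],
]
print(round(cophenetic_correlation(averaged), 3))  # 0.758
```

A matrix that a dendrogram can represent exactly gives a correlation of 1; the lower the value, the more the tree distorts the original similarities.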
16.8.3 Select Clustering > Calculate error flags again to
remove the error flags.
16.8.4 Select Clustering > Calculate cophenetic
correlations.
The cophenetic correlation is shown at each branch (Fig.
16-9), together with a colored dot whose color ranges
from green through yellow and orange to red as the
cophenetic correlation decreases. This makes it easy to
detect reliable and unreliable clusters at a glance.
16.9
K-Means Partitioning
K-means partitioning lets the software automatically
determine Groups with a mathematical function. You
first create Groups based on one or more strains, such
as type strains. The program then automatically
calculates, for each entry of the cluster analysis, the
Group in which it fits best. This fitting can be based on
the average similarity with the Group, the highest
similarity (nearest neighbor), or the lowest
similarity (furthest neighbor). The partitioning
process must be executed iteratively, since adding an
entry to a Group can change the average similarity of the
Group as well as its highest and lowest similarities with
other entries.
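The iterative partitioning can be sketched as follows. This is an illustration under assumptions, not InfoQuestFP's code: the seed entries (for example, type strains) stay fixed in their Groups, every other entry is reassigned to the Group with the best average, highest, or lowest similarity, and the loop stops when no entry changes Group. The function name and similarity values are hypothetical:

```python
def partition(sim, seeds, linkage="average", max_iter=100):
    """Assign every entry to one of the Groups started by the seed entries.

    sim: full similarity matrix; seeds: one list of entry indices per Group.
    linkage: "average", "nearest" (highest similarity) or "furthest" (lowest).
    Returns {entry index: Group index}.
    """
    criteria = {"average": lambda v: sum(v) / len(v), "nearest": max, "furthest": min}
    criterion = criteria[linkage]
    seed_group = {i: g for g, members in enumerate(seeds) for i in members}
    assignment = dict(seed_group)
    for _ in range(max_iter):
        groups = [[i for i, g in assignment.items() if g == k] for k in range(len(seeds))]
        updated = dict(seed_group)  # seeds never move
        for i in range(len(sim)):
            if i in seed_group:
                continue
            scores = [criterion([sim[i][j] for j in members if j != i]) for members in groups]
            updated[i] = scores.index(max(scores))
        if updated == assignment:  # stable: no entry changed Group
            break
        assignment = updated
    return assignment


sim = [
    [100, 90, 60, 30, 20],
    [90, 100, 55, 35, 25],
    [60, 55, 100, 50, 45],
    [30, 35, 50, 100, 85],
    [20, 25, 45, 85, 100],
]
print(partition(sim, seeds=[[0], [4]]))  # entries 0-2 in Group 0, entries 3-4 in Group 1
```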
16.10.1 To illustrate the partitioning method, we select
RFLP1.
, or the
+ and - keys.
16.11.4 When zoomed, the horizontal and vertical scroll
bars allow you to scroll through the page.
16.11.5 The whole image can be enlarged or reduced with
Layout > Enlarge image size
image size
16.11.9 You can preview and print the image in full color
with Layout > Use colors or
setup or
, the current
page is printed.
to print all
pages at once.
16.11.14 If you want to export the image to another
software package for further editing, use File > Copy page
to clipboard or the corresponding button.
16.11.15 Select File > Exit to close the Comparison print
preview window.
Fig. 17-1. Fingerprint type window with excluded regions defined (see arrows).
The position tolerance value is shown (bottom) and is
automatically saved in the settings for the experiment
type.
17.2.8 Close the window with File > Exit.
Unaligned sequences
Seq1  ACTAGTGACTTA
Seq2  ACAAGGACTTT
Seq3  GACTAGGACTTA

Homology matrix (pairwise alignment)
      Seq1  Seq2  Seq3
Seq1  100
Seq2   79   100
Seq3   91    79   100

Dendrogram: Seq1 and Seq3 cluster first; Seq2 joins them.

Global alignment (Seq1 and Seq3 are aligned into Cons(1,3), which is then aligned with Seq2 into Cons((1,3),2))
Seq1  -ACTAGTGACTTA
Seq2  -ACAAG-GACTTT
Seq3  GACTAG-GACTTA

Homology matrix (global alignment)
      Seq1  Seq2  Seq3
Seq1  100
Seq2   81   100
Seq3   91    81   100
Fig. 19-1. Steps in a cluster analysis of sequences: dendrogram based on pairwise alignment (steps 1-2), and
dendrogram based on multiple alignment (steps 1-6).
19.1
19.2
19.3
button.
19.4
19.5
alignment.
To rearrange the multiple alignment as desired, any
sequence can be moved up or down.
19.4.2 Left-click on the entry you want to move up or
down.
19.4.3 Press the up button to move the entry up, or
the down button to move it down.
19.4.4 To move a sequence to the top or the bottom of the
alignment, hold the CTRL key and press the up or down
button, respectively.
Note that, as soon as an entry is moved up or down, the
dendrogram disappears: a dendrogram imposes a certain
order to the entries, which is not compatible with freely
moving sequences up or down. You can display the
dendrogram again using Layout > Show dendrogram;
however, this reorders the entries again, so any
manual changes you made to the sequence order are lost.
19.4.5 A number of manual alignment editing tools are
described below. For these editing tools, the multiple
alignment editor contains a multilevel undo and redo
function. The undo function can be accessed with
19.6
Deletes all gaps to the right of the cursor, including
the cursor position, by shifting the block to the right of
the gap to the left. Keyboard: SHIFT+DEL.
Inserts gaps at the position of the cursor by shifting
the block to the right of the cursor position to the right,
until it closes up with the next block. Keyboard:
SHIFT+INSERT.
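The two gap tools can be sketched on a single aligned row. This is a simplified illustration, not the editor's code: it assumes that deleting gaps moves the gap run at the cursor to the end of the following block, and that inserting gaps moves the gap run after the block at the cursor in front of it, so the row length (and thereby the alignment with the other rows) is preserved. The function names are hypothetical and the cursor is a 0-based index:

```python
GAP = "-"


def _run_end(row, start, gaps):
    """Index just past the run of gap (gaps=True) or non-gap characters at start."""
    i = start
    while i < len(row) and (row[i] == GAP) == gaps:
        i += 1
    return i


def delete_gaps(row, cursor):
    """SHIFT+DEL: pull the block after the gap run at the cursor to the left."""
    if cursor >= len(row) or row[cursor] != GAP:
        return row
    gap_end = _run_end(row, cursor, True)
    block_end = _run_end(row, gap_end, False)
    return row[:cursor] + row[gap_end:block_end] + row[cursor:gap_end] + row[block_end:]


def insert_gaps(row, cursor):
    """SHIFT+INSERT: push the block at the cursor right until it meets the next block."""
    if cursor >= len(row) or row[cursor] == GAP:
        return row
    block_end = _run_end(row, cursor, False)
    gap_end = _run_end(row, block_end, True)
    return row[:cursor] + row[block_end:gap_end] + row[cursor:block_end] + row[gap_end:]


print(delete_gaps("ACG--TTA-CC", 3))  # ACGTTA---CC
print(insert_gaps("ACGTTA---CC", 3))  # ACG---TTACC
```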
19.7
19.8
19.9
Finding a Subsequence
[Figure callouts: base numbering, comment editor, excluded regions (red), consensus sequence.]
Fig. 19-11. Comparison window with image of aligned sequences in consensus blocks view, detail.
19.13.3 Using Edit > Global alignment settings, the
settings for calculating a global alignment can be edited
(see 19.2).
19.13.4 With Edit > Global alignment comparison
settings, the settings for calculating cluster analysis from
a multiple alignment can be edited, as explained in 19.10.
19.13.5 The menu Edit > Character conversion settings
allows the parameters to be set for converting bases into
categorical characters (see 19.15).
19.13.6 Edit > Display settings allows the color and
viewing settings in the multiple alignment editor to be
specified.
19.13.7 The Sequence display settings window (Fig. 19-10)
provides two defaults for color settings: the White
default, which corresponds to the most widely used
colors for the bases on a white background, and the Black
default, which uses a black background in the multiple
alignment editor, using the base color scheme of earlier
versions of the software.
19.13.8 Apart from the two defaults, every item can be
assigned a specific color using the slide bars for the Red,
Green, and Blue components. A character can be chosen
to indicate gaps and consensus positions, respectively.
Fig. 19-13. Comparison window showing a composite data set generated from a sequence type (DemoBase).
Bases were converted into categorical characters and clustered in both directions.
19.16.6 To remove all excluded regions at a time, select
. Enter 1 as the
instead of gray
. The reference sequence is shown in
the Sequence type window (Fig. 19-14).
Fig. 19-14. Sequence type window with reference sequence defined, region excluded, and comments added (labeled: base numbering, reference sequence, comment line, excluded regions in red).
19.16.10 To see the base numbering, you may need to
drag the horizontal line that separates the header from
the image panels downward.
Fig. 20-4. Schematic representation of parameter-based cluster analysis of trend type data (panels: parameter values for Sample1 and Sample2, matrix of Smax, matrix of MAX, arithmetic averaging, averaged matrix, UPGMA, dendrogram). This example, where two parameters were defined, is a continuation of the processing scheme presented in Fig. 11-2.
The clustering methods are the same as for other
experiment types; see 16.2.
20.2.6 A Trend data window (see 11.4.2 and Fig. 11-9) can
be created from the entries contained in the comparison
with TrendData > Create trend data window.
20.2.7 For a selected parameter, the entries can be sorted
according to increasing value using TrendData > Sort
entries by character.
Note: The separator bar between the parameter
names and the values can be dragged down if the
names are not completely visible.
20.2.8 A tab-delimited text file of the entries and trend
data values contained in the current comparison can be
exported with TrendData > Export character table.
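The sorting and export steps can be pictured with a small sketch. The table layout below (a header row with Key followed by the parameter names, then one row per entry) is an assumption for illustration only; the exact column layout of the InfoQuestFP export is not described here, and the parameter names P1 and P2 and their values are hypothetical.

```python
import csv
import io


def sort_entries_by_character(entries, character):
    """Return entry keys ordered by increasing value of `character`.

    `entries` maps an entry key to a dict {parameter name: value}."""
    return sorted(entries, key=lambda key: entries[key][character])


def export_character_table(entries, characters, stream):
    """Write a tab-delimited table: a header row, then one row per entry."""
    writer = csv.writer(stream, delimiter="\t", lineterminator="\n")
    writer.writerow(["Key", *characters])
    for key, values in entries.items():
        writer.writerow([key, *(values[c] for c in characters)])


# Hypothetical trend data for two entries and two parameters.
entries = {"E1": {"P1": 0.5, "P2": 3}, "E2": {"P1": 0.2, "P2": 7}}
buffer = io.StringIO()
export_character_table(entries, ["P1", "P2"], buffer)
```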
Fig. 21-1. Scheme of possibilities in InfoQuestFP to obtain combined dendrograms from multiple experiments (panels: Experiment 1 and Experiment 2 with Matrix 1, Matrix 2, Dendrogram 1, and Dendrogram 2; a composite experiment; Combined matrix A with Combined dendrogram A; Combined matrix B with Combined dendrogram B).
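Fig. 21-1 does not state how a combined matrix is computed. One simple option, shown here as a sketch, is element-wise averaging of the matrices of the individual experiments. Applied to the pairwise similarities printed in the figure (Matrix 1: 75, 60, 85; Matrix 2: 85, 50, 70), it gives 80 and 55, as in the combined matrices, and 77.5 for the third pair, which the figure shows as 77.

```python
def average_matrices(*matrices):
    """Element-wise arithmetic mean of equally sized square similarity matrices."""
    size = len(matrices[0])
    return [[sum(m[i][j] for m in matrices) / len(matrices) for j in range(size)]
            for i in range(size)]


# Full symmetric versions of Matrix 1 and Matrix 2 from Fig. 21-1 (3 entries).
matrix_1 = [[100, 75, 60],
            [75, 100, 85],
            [60, 85, 100]]
matrix_2 = [[100, 85, 50],
            [85, 100, 70],
            [50, 70, 100]]
combined = average_matrices(matrix_1, matrix_2)
```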
correlation coefficient, characters with a higher range will
have more influence on the similarity and the
dendrogram. The feature Standardized characters
standardizes each character by subtracting its mean value
and dividing by its standard deviation. The result is that
all characters have equal influence on the similarity.
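The effect of Standardized characters can be sketched as follows: each character (column) is centered on its mean and divided by its standard deviation before entries (rows) are compared with the Pearson correlation. This is an illustration, not InfoQuestFP code. The use of the sample standard deviation (n - 1) is an assumption; because every character is rescaled by the same kind of quantity, choosing n instead would not change the resulting correlations between entries.

```python
import math


def standardize_characters(rows):
    """Return a copy of an entries x characters table in which every
    character (column) has mean 0 and standard deviation 1."""
    n = len(rows)
    out = [list(row) for row in rows]
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        mean = sum(column) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
        for i in range(n):
            # A constant character carries no information; map it to 0.
            out[i][j] = (rows[i][j] - mean) / sd if sd > 0 else 0.0
    return out


def pearson(x, y):
    """Pearson product-moment correlation between two equally long profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

After standardization, a character measured in hundreds and one measured in units contribute equally to the correlation between two entries.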
The feature Use square root is intended for character sets
that yield high similarities within groups. In such cases, it
may be useful to combine Use square root with Pearson
correlation and cosine coefficient (or Euclidean distance
for non-composite data sets).
The Rank correlation coefficient first transforms an array
of characters into an array of ranks according to the
magnitude of the character values. The rank arrays are
then compared using the Pearson product-moment
correlation coefficient. The rank correlation is a very
robust coefficient, but with low sensitivity.
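The rank correlation described above can be sketched in a few lines: both character arrays are replaced by their ranks, and the ranks are compared with the Pearson correlation. Giving tied values the average of their ranks is an assumption of this sketch; the guide does not say how InfoQuestFP handles ties.

```python
import math


def ranks(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            result[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return result


def pearson(x, y):
    """Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rank_correlation(x, y):
    """Rank correlation: the Pearson correlation of the two rank arrays."""
    return pearson(ranks(x), ranks(y))
```

Any monotonic relation gives a rank correlation of 1, whatever its shape, which is why the coefficient is robust but insensitive to the actual magnitudes.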
21.2.6 Select Pearson correlation with Standardized
characters and UPGMA as the clustering method.
21.2.7 Close the Composite data set window with File >
Exit. The new composite data set is shown in the
experiment types panel of the Main window.
[Figure: similarity matrices for entries A, B, and C (16S rDNA similarity and a second data set) combined into an averaged composite matrix.]
          Char 1   Char 2   ...   Char p
Entry 1   Val 11   Val 12   ...   Val 1p
Entry 2   Val 21   Val 22   ...   Val 2p
...
Entry n   Val n1   Val n2   ...   Val np
You can rotate and swap the branches manually if the tree
layout is not satisfactory.
22.2.14 Left-click near a node or a branch
tip.
22.2.15 While holding down the mouse button, rotate the
branch to the desired position.
Fig. 22-3. Unrooted maximum parsimony tree. The number of mutations is indicated on the branches (top), as are the
bootstrap values (bottom).
22.2.16 If you select entries in the parent Comparison
window or in the Main window, these entries are shown
within a square in the Unrooted dendrogram window.
22.2.17 You can also select entries directly in the Unrooted
dendrogram window, by holding the CTRL key while
clicking in the proximity of a node. All entries branching
off from this node will be selected.
22.2.18 Repeat this action to deselect entries.
22.2.19 To copy the unrooted tree to the clipboard, select
File > Copy image to clipboard or
22.2.20 The unrooted tree can be printed with File > Print
image or
[Figure: from data matrix to dendrogram. A data matrix of samples 1 to 3 and characters 1 to 4 (values x11 to x34) is converted with a distance/similarity coefficient into a similarity/distance matrix (s12, s13, s23), from which UPGMA clustering produces a dendrogram.]
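The last step of this scheme, UPGMA clustering of a similarity matrix, can be sketched as follows. This is a plain UPGMA on similarities (higher means more similar), not InfoQuestFP code. Ties are resolved by taking the first pair found, which is exactly the situation in which more than one equally valid tree exists. The example reuses the similarities of Matrix 1 in Fig. 21-1 (75, 60, 85) purely as input data.

```python
def upgma(labels, sim):
    """UPGMA on a symmetric similarity matrix.

    Returns the tree as nested tuples and a list of (subtree, level) merges."""
    clusters = {i: (labels[i], 1) for i in range(len(labels))}  # id -> (tree, size)
    pair_sim = {(i, j): sim[i][j]
                for i in range(len(labels)) for j in range(i + 1, len(labels))}
    merges = []
    next_id = len(labels)
    while len(clusters) > 1:
        # Join the most similar pair; on ties, max() keeps the first pair found.
        (a, b), level = max(pair_sim.items(), key=lambda item: item[1])
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        # Size-weighted average similarity of the new cluster to each other cluster.
        new_sims = {}
        for c in clusters:
            s_ac = pair_sim[(min(a, c), max(a, c))]
            s_bc = pair_sim[(min(b, c), max(b, c))]
            new_sims[c] = (size_a * s_ac + size_b * s_bc) / (size_a + size_b)
        pair_sim = {k: v for k, v in pair_sim.items() if a not in k and b not in k}
        for c, value in new_sims.items():
            pair_sim[(c, next_id)] = value  # c < next_id always holds
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        merges.append(((tree_a, tree_b), level))
        next_id += 1
    return next(iter(clusters.values()))[0], merges


tree, merges = upgma(["E1", "E2", "E3"],
                     [[100, 75, 60], [75, 100, 85], [60, 85, 100]])
```

Here E2 and E3 are joined first at 85, and E1 joins them at (75 + 60) / 2 = 67.5.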
[Figure: effect of position tolerance and coefficient on UPGMA clustering, yielding alternative solutions (solution 1: B & C joined first; solution 2: A & B joined first).]
clustering tools in the Comparison window. This
functionality is related to the ability to link more than
two entries or branches together, as shown in Fig. 23-4. This
makes it possible to display multiple solutions of a
cluster analysis in a consensus representation, and to
represent two trees from different data sets in one
consensus tree. In addition, each tree obtained with the
advanced clustering tools is automatically saved, which
makes it possible to have more than one stored tree per
experiment type. This feature is useful if you want to
compare trees generated with different similarity
coefficients or different parameters, such as
position tolerance for banding patterns.
[Figure: solutions 1 and 2 for entries A, B, and C, and the resulting consensus tree.]
Fig. 23-6. Advanced tree representation with a highlighted cluster, indication of the number of degenerate
entries relative to the cluster, and the degenerate entry selected.
23.6.1 As an example, we can calculate two dendrograms
in DemoBase: one from experiment Phenotest using the
Pearson correlation and the other from experiment 16S
rDNA. You can calculate the trees using the conventional
clustering tools or using the Advanced Clustering tools.
23.6.2 Select Clustering > Advanced trees > Create
consensus tree, which opens a dialog box listing the Stored
trees (Fig. 23-8).
The Minimum spanning tree dialog box appears as shown
in Fig. 24-1. This dialog box consists of four panels,
covering (1) the treatment of Hypothetical types, (2) the
Coefficient used to calculate the distance matrix, (3) the
Priority rules for linking types in the tree, and (4) the
settings for the Creation of complexes.
Hypothetical Types
With the checkbox Allow creation of hypothetical types
(missing links), you can allow the algorithm to introduce
hypothetical types as branches of the MST, as described
in 24.2. When enabled, the following criteria can be
specified:
Create only if total distance is decreased with at least
(default 1) changes: A hypothetical type is accepted only
if its introduction decreases the total distance spanned by
the tree by at least this number of changes (one change
with the default setting).
And if at least (default 3) neighbors have no more than
(default 1) changes: With the default settings, the
algorithm only accepts hypothetical types that have at
least 3 neighbors (closest related types) differing from
them by no more than 1 change (see also 24.2 for the
interpretation of this rule).
Coefficient
The choice is offered between Categorical, for categorical
data, Binary, for binary data, and Manhattan. In the latter
option, the sum of the absolute differences between the
values of any two corresponding states is calculated, and
the distances obtained are used to calculate the MST. This
option can be used to cluster non-binary, non-categorical
data with integer values. If non-integer (decimal) values
are used, the program will round them to the closest
integers.
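The Manhattan option can be sketched as below. The helper rounds halves away from zero; how InfoQuestFP treats values that lie exactly halfway between two integers is not stated, so this is an assumption (Python's built-in round() would round 2.5 to 2, which is why it is not used here).

```python
import math


def round_to_closest_integer(value):
    """Round to the closest integer; exact halves go away from zero (assumption)."""
    if value >= 0:
        return math.floor(value + 0.5)
    return -math.floor(-value + 0.5)


def manhattan_distance(type_a, type_b):
    """Sum of the absolute differences between corresponding character values,
    after rounding non-integer values to the closest integers."""
    return sum(abs(round_to_closest_integer(a) - round_to_closest_integer(b))
               for a, b in zip(type_a, type_b))
```

For example, manhattan_distance([1, 4, 2], [3, 4, 7]) is 2 + 0 + 5 = 7, and [1.4, 2.6] versus [1, 3] gives 0 because 1.4 and 2.6 round to 1 and 3.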
In the Manhattan option, an Offset and a Saturation
value can be specified. For each character compared
between two types, the offset value determines a fixed
distance that is added to the distance of these characters.
If the distance is zero, however, the transformed distance
remains zero. In addition, for each character compared
between two types, the saturation determines the
maximum value the distance can take. In other words,
above the saturation distance, different characters are all
seen to be equally different. The relation between offset,
saturation, and distance of characters is illustrated in Fig.
24-2. The offset and saturation can be used to tune the
Priority Rules
For equivalent solutions in terms of calculated distance,
the priority rules allow you to specify a priority based
upon criteria other than distance. One or more rules can
be added, with a maximum of 3. The order of appearance
of the rules determines their rank.
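A minimum spanning tree can be built with Prim's algorithm. The sketch below is an illustration, not the InfoQuestFP algorithm; it only shows where a priority rule comes in: among candidate links with the same distance, the link with the highest priority value wins. The categorical distance (number of differing characters) and the priority function (here a hypothetical number of entries per type) are example choices; hypothetical types and complexes are not modeled.

```python
def categorical_distance(a, b):
    """Number of characters (for example loci) at which two types differ."""
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(types, distance, priority):
    """Prim's algorithm over a list of distinct types.

    distance(a, b) -> number; priority(a, b) -> number, higher wins ties.
    Returns a list of (type already in tree, new type, distance) links."""
    tree = [types[0]]  # a list keeps the search order deterministic
    links = []
    while len(tree) < len(types):
        best = None
        for a in tree:
            for b in types:
                if b in tree:
                    continue
                # Smaller distance first; on equal distance, higher priority first.
                key = (distance(a, b), -priority(a, b))
                if best is None or key < best[0]:
                    best = (key, a, b)
        (dist, _), a, b = best
        tree.append(b)
        links.append((a, b, dist))
    return links


T1, T2, T3, T4 = (1, 1), (1, 2), (2, 1), (2, 2)
entry_counts = {T1: 5, T2: 1, T3: 3, T4: 1}  # hypothetical priority values
links = minimum_spanning_tree([T1, T2, T3, T4], categorical_distance,
                              lambda a, b: entry_counts[b])
```

T2 and T3 are both one change away from T1; the priority rule makes T3, the type with more entries, the first one linked.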
Fig. 24-2. Graphical representation of the meaning of offset and saturation values (transformed distance plotted against distance, with the offset and saturation levels marked).
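The offset and saturation described under Coefficient can be expressed per character as in the sketch below. The order of the two operations is an assumption: the offset is added first and the result is capped at the saturation value, while a zero distance stays zero. The offset of 1 and saturation of 8 in the example are chosen only for illustration.

```python
def transformed_distance(d, offset, saturation):
    """Per-character distance after offset and saturation (assumed order:
    add the offset, then cap at the saturation value). Zero stays zero."""
    if d == 0:
        return 0
    return min(d + offset, saturation)


def manhattan_offset_saturation(type_a, type_b, offset, saturation):
    """Sum of transformed per-character distances between two types."""
    return sum(transformed_distance(abs(x - y), offset, saturation)
               for x, y in zip(type_a, type_b))
```

With offset 1 and saturation 8, character distances of 0, 2, and 20 become 0, 3, and 8, so the total distance between [0, 5, 20] and [0, 3, 0] is 11.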
Creation of Complexes
In epidemiological population genetics based upon
MLST, a clonal complex can be defined as a single group of
isolates sharing identical alleles at all investigated loci,
plus single-locus variants (SLVs) that differ from this
group at only one locus.¹ In another, more relaxed
definition,²,³ a clonal complex includes all types that
differ at x or fewer loci from at least one other type of
the complex (x is usually taken as 1 or 2). Under this
definition, not all types of a complex are necessarily SLVs
or double-locus variants (DLVs) of one another. The latter
definition is used in InfoQuestFP.
1. Feil EJ et al., Estimating recombinational parameters in
Streptococcus pneumoniae from multilocus sequence typing
data, Genetics 154, 1439-1450 (2000)
2. Feil EJ et al., How clonal is Staphylococcus aureus? J Bacteriol
185, 3307-3316 (2003)
3. BURST (Based Upon Related Sequence Types) program
description, see the MLST web site http://www.mlst.net
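The relaxed definition of a clonal complex amounts to single linkage: types end up in the same complex when a chain of types connects them, each step differing at no more than x loci. The sketch below illustrates this with a union-find structure; it is not the InfoQuestFP implementation, and it returns unlinked types as groups of one.

```python
def clonal_complexes(profiles, max_loci):
    """Group allelic profiles (one tuple of allele numbers per type) into
    complexes. Two types share a complex when a chain of types connects
    them in which each step differs at no more than max_loci loci."""
    parent = list(range(len(profiles)))

    def find(i):
        # Follow parent links to the representative, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(profiles)):
        for j in range(i + 1, len(profiles)):
            differing = sum(a != b for a, b in zip(profiles[i], profiles[j]))
            if differing <= max_loci:
                parent[find(i)] = find(j)  # merge the two complexes

    complexes = {}
    for i in range(len(profiles)):
        complexes.setdefault(find(i), []).append(i)
    return list(complexes.values())
```

With max_loci = 1, the profiles (1,1,1,1), (1,1,1,2), and (1,1,2,2) form one complex even though the first and third differ at two loci, which illustrates why not all members are SLVs of one another.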
Display Options
In the Tree panel, each type is represented by one node or
branch tip, displayed as circles that are connected by
branches. In the default settings, but with Letter code
selected under Type labeling, the following information
can be derived from the tree view:
When sufficiently zoomed (using the zoom buttons
and
Edit Options
24.4.1 With Edit > Display settings or
, the display
selected entries.
24.4.17 With Edit > Select related nodes, you can
highlight all the types that have no more than a specified
number of changes from the highlighted type(s). When
button
the bottom of the cube. This may facilitate three-dimensional perception. Disable this option to view the
next features.
25.2.11 With Layout > Show rendered image or
25.2.15 The image can be printed with File > Print image
or
can toggle between the color representation and the non-color representation, in which the entry groups are
represented (and printed) as symbols instead of colored
dots.
On the screen, it is easier to evaluate the groups using
colors.
25.2.7 Select an entry in the coordinate system using
CTRL+click. Selected entries are enclosed in a blue cube.
25.2.8 To select several entries at a time, hold down the
SHIFT key while dragging the mouse in the coordinate
system. All entries included in the rectangle will be
selected.
Fig. 25-3. Character table (entries 1 to 3 as rows, characters 1 to 3 as columns, values VAL 11 to VAL 33) showing the meaning of Average and Variance correction at the Entries and Characters level.
The lower panel of the dialog box (Fig. 25-2) displays the
Component type. This can be Principal components,
Discriminants (without variance), or Discriminants
(with variance). The first option calculates a PCA,
whereas the Discriminants options perform discriminant
analysis. These options are described in section 25.6.
25.3.3 In the Entries and Characters panels, check
Subtract average under Characters, and leave the other
options unchecked.
25.3.4 In the Component type panel, select Principal
components, and press <OK>. Calculation of the PCA is
started.
The resulting window, the Principal components analysis
window, is shown in Fig. 25-4.
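What a principal components calculation does with the settings of 25.3.3 can be sketched in plain Python: the characters are centered (Subtract average under Characters), the covariance matrix is formed, and its dominant eigenvector, the first component, is found by power iteration. This is an illustration only; InfoQuestFP's numerical method is not described here, power iteration is a deliberately simple substitute, and the sign of a component is arbitrary.

```python
import math


def first_principal_component(rows, iterations=200):
    """First principal component of an entries x characters table.

    Returns (loadings, scores): one loading per character and one score
    (x coordinate in the entry plot) per entry."""
    n, p = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    # Non-symmetric start vector; plain power iteration fails if it happens to
    # be orthogonal to the dominant eigenvector.
    v = [1.0 + j for j in range(p)]
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("all characters are constant")
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(p)) for c in centered]
    return v, scores


loadings, scores = first_principal_component(
    [[2, 0], [0, 2], [-2, 0], [0, -2], [3, 3], [-3, -3]])
```

In this example the two characters vary together, so the first component points along the diagonal and both loadings are close to 0.7071.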
The window is divided into two panels: the left panel
shows the entries plotted in an x-y diagram corresponding
to the first two components. In the caption of the window,
Layout Tools
If you started the PCA from a composite data set, you can
order the characters according to the selected component
in the underlying Comparison window. This is an
interesting feature for locating characters that separate
groups you are interested in. The feature works as follows
(only for composite data sets).
Editing Tools
25.3.12 Entries can be selected in a PCA window by
holding the SHIFT key down and selecting the entries in a
rectangle using the left mouse button. Selected entries are
encircled in blue. You can also hold down the CTRL key
while clicking on an entry.
25.3.13 An even more flexible way of selecting entries is
using the lasso selection tool. To activate the lasso
selection tool, choose Layout > Lasso selection tool or
press the
button. With the lasso selection tool
enabled, selections of any shape can be drawn on the plot.
The lasso selection tool menu and button are flagged
25.3.19 The entry plot can be printed with File > Print
image (entries) or
[Figure: Group 1 and Group 2 plotted against Character A and Character B.]
25.5.5 You can (de)select entries on the SOM by left-clicking on an entry while pressing the CTRL key, or
groups of entries by left-clicking and moving the mouse
while pressing the SHIFT key.
Note: When a SOM is calculated on fingerprint type
data, the densitometric curves are used as character
data sets for training of the SOM.
A SOM is automatically saved along with its parent
Comparison window. It is possible to add entries to an
existing SOM or remove entries from it. Adding entries to
an existing SOM is an interesting alternative way of
identifying new entries: added entries are placed in a
frame of known database entries in the SOM, so
identification amounts to looking at the groups they
join.
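The identification idea can be sketched for an already trained SOM: a new entry is placed on the node whose weight vector is closest to it, and the known entries mapped to the same node indicate the group it joins. Placement by smallest Euclidean distance is the usual SOM convention and is assumed here; training is not shown, and the node weights and entries in the example are hypothetical.

```python
import math


def best_matching_unit(node_weights, vector):
    """Grid position of the SOM node whose weight vector is closest
    (Euclidean distance) to `vector`."""
    return min(node_weights, key=lambda pos: math.dist(node_weights[pos], vector))


def groups_joined(node_weights, known_entries, new_vector):
    """Labels of known entries mapped to the same node as the new entry.

    known_entries is a list of (label, vector) pairs."""
    node = best_matching_unit(node_weights, new_vector)
    return sorted({label for label, vector in known_entries
                   if best_matching_unit(node_weights, vector) == node})


# Hypothetical trained map with two nodes and three known entries.
nodes = {(0, 0): [0.0, 0.0], (0, 1): [10.0, 10.0]}
known = [("Ambiorix", [1.0, 1.0]), ("Perdrix", [9.0, 9.0]), ("Perdrix", [8.0, 10.0])]
```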
25.5.6 If you want to add entries to an existing SOM, you
can select new entries in the Main window and copy them
to the clipboard using Edit > Copy selection or
Since MANOVA and discriminant analysis work on user-delineated groups, the comparison should contain groups
(see 16.6).
Fig. 25-8. MANOVA & discriminant analysis window. The circles delineating groups A, B, and C are added to
this figure to illustrate the interpretation of discriminant analysis.
Introduction
26.2 Basic Terminology
26.2.1 Literature
This manual is not intended as an introduction to basic
statistics. For more detailed literature, refer to the
following handbooks:
Press W et al., Numerical recipes in C: the art of
scientific computing, 2nd ed., Cambridge University
Press, Cambridge (1992)
Sheskin DJ, Handbook of parametric and
nonparametric statistical procedures, 3rd ed., CRC Press,
Boca Raton (2004)
Zwillinger D and Kokoska S, Standard probability and
statistics tables and formulae, Chapman & Hall/CRC
Press, Boca Raton (2000)
26.2.2 Application of Statistical Tests
In general terms, the application of a statistical test can be
outlined as follows:
Make a proposition that will be referred to as the null
hypothesis. Statistical tests cannot prove that a
hypothesis is true; they can only show whether the data
provide enough evidence to reject it.
Table 26-1. Schematic representation of variable types and corresponding graphs and tests for one and two
variables.
(One or two variables, each categorical or quantitative, with parametric and non-parametric tests. Entries include: means, t test (26.3.5.1); correlations; 2 categories, t test (26.3.6.1); >2 categories, F test (26.3.6.3).)
26.3
$s = 100\,(1 - p)$.
the p-value,
$\chi^2 = \sum_{i=1}^{s} (N_{oi} - N_e)^2 / N_e$
of categories.
If the null hypothesis is true and under certain conditions
(see the note below), this statistic approximately follows a
chi square distribution with s-1 degrees of freedom. The
p-value that is returned gives the probability that the
statistic is at least as high as the observed one. If the
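$\sum_{j} n_j (n_j - 1)$
category is given by $P = \sum_{i} P_i$
$H = -\sum_{i=1}^{s} \frac{n_i}{N} \ln \frac{n_i}{N}$
$n = n_i n_j$, with $n_i$ the
$\chi^2 = \sum_{i=1,\,j=1}^{n_i,\,n_j} (N_{oij} - n_{ij})^2 / n_{ij}$
$n - n_i - n_j + 1$ degrees of
$s = 100\,(1 - p)$.
$V = \sqrt{\chi^2 / \left(N \min(n_i - 1,\, n_j - 1)\right)}$
$[N_{oij} - n_{ij}]$
and is calculated as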
$n_{ij}$
p-value,
$s = 100\,(1 - p)$.
$\sum_{i=1}^{n} (x_i - \bar{x})^2$
and n the sample size) are calculated from the sample and
are used to determine a normal distribution that can be
used as a model (further referred to as model normal
distribution) for the underlying distribution of the sample
if the null hypothesis holds.
$\mathrm{Cov}(x, y) = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) / (n - 1)$, is
$s_d = \sqrt{\left(s_x^2 + s_y^2 - 2\,\mathrm{Cov}(x, y)\right) / n}$.
A statistic is defined as
$T = (\bar{x} - \bar{y}) / s_d$. If the null
$s = 100\,(1 - p)$.
Mean values:
  c10   1.3396
  c14   1.8512
Corrected variances:
  c10   0.2870
  c14   0.4246
Corrected covariance = -0.1476
Pooled corrected standard deviation = 0.1464
$s_x = \sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 / (n - 1)}$ and
$s_y = \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2 / (n - 1)}$
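The paired t statistic defined above can be computed directly from two paired samples, as in this sketch. It follows the formulas of this section (corrected variances and covariance with n - 1, then $s_d$ and $T$); the n - 1 degrees of freedom returned with it are the usual choice for a paired t test and are an assumption here, since the guide's formula for them is not shown. The p-value, which requires the t distribution, is left out.

```python
import math


def paired_t(x, y):
    """Paired t statistic T = (mean(x) - mean(y)) / s_d and its degrees of freedom."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    var_x = sum((a - mx) ** 2 for a in x) / (n - 1)  # s_x squared
    var_y = sum((b - my) ** 2 for b in y) / (n - 1)  # s_y squared
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    s_d = math.sqrt((var_x + var_y - 2 * cov) / n)
    return (mx - my) / s_d, n - 1
```

For x = [1, 2, 3, 4] and y = [2, 2, 4, 5], the differences have standard deviation 0.5, so $s_d = 0.25$ and $T = -0.75 / 0.25 = -3$.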
$d_i = x_i - y_i$
are
$n(n + 1)/4$
$Z = \left(T - n(n + 1)/4\right) / \sqrt{n(n + 1)(2n + 1)/24}$
$s = 100\,(1 - p)$.
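$s_x = \sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 / n}$ and
$s_y = \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2 / n}$
$\mathrm{Cov}(x, y) = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) / n$
the
$r = \mathrm{Cov}(x, y) / (s_x s_y)$
$r \sqrt{(n - 2) / (1 - r^2)}$ approximately follows a t
is
$s = 100\,(1 - p)$.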
Mean values:
  c10   1.3396
  c14   1.8512
Variances:
  c10   0.2809
  c14   0.4156
Covariance = -0.1445
Pearson correlation = -42.288%
P value (single tail) = 0.001531 (T test approximation)
Significance = 99.8469%
$s = 100\,(1 - p)$.
coefficient is defined as $r_s = \mathrm{Cov}(R, S) / (s_R s_S)$, with
$s_R = \sqrt{\sum_{i=1}^{n} (R_i - \bar{R})^2 / n}$ and
$s_S = \sqrt{\sum_{i=1}^{n} (S_i - \bar{S})^2 / n}$
the rank
$r_s \sqrt{(n - 2) / (1 - r_s^2)}$
$s = 100\,(1 - p)$.
Significance = 99.8325%
A statistic is defined as
$T = (\bar{x} - \bar{y}) / s_d$
$(n + m - 2)$
$s = 100\,(1 - p)$.
Mean values:
  Ambiorix   2.552
  Perdrix    2.391
Pooled corrected standard deviation = 0.08509
T = 1.891 (37 degrees of freedom)
P value = 0.066419
Significance = 93.3581%
$nm/2$
and variance
$s = 100\,(1 - p)$.
Sum of ranks:
  Ambiorix   509.5
  Perdrix    270.5
P value = 0.157560 (normal approximation)
Significance = 84.2440%
Fig. 26-14. Example of a test report for a Mann-Whitney test applied on an ANOVA plot with two
categorical variables such as the one shown in Fig.
26-12.
Note: This test should not be used if one of the
groups contains fewer than 8 members.
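$s = 100\,(1 - p)$. The
sum of squares
gives
the p-value,
$s = 100\,(1 - p)$.
means are $\bar{x}_{\mathrm{group}\,i} = \sum_{j=1}^{n_i} x_{ij} / n_i$
is $\bar{x} = \sum_{i=1}^{g} \bar{x}_{\mathrm{group}\,i} / g$,
$\mathrm{SST} = \sum_{i=1}^{g} \sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2$, is
$\mathrm{SSA} = \sum_{i=1}^{g} n_i (\bar{x}_{\mathrm{group}\,i} - \bar{x})^2$ measures the
SST = 3.347
SSA = 0.255
SSW = 3.092
$s = 100\,(1 - p)$. The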
If the sample contains fewer than 30 observations, an
alternative way of testing the null hypothesis is offered
by Monte-Carlo simulations. To do this, 10,000 samples
with g groups and n1, n2, ..., ng randomly distributed
observations in the groups are created. For each of these
samples, a value for the H statistic is obtained and
compared to the observed value. The p-value from the
simulations is the proportion of simulations that give a
larger value for the H statistic than the one observed in
the real sample. Here too, the significance is calculated as
$s = 100\,(1 - p)$. The
A statistic is defined as
$H = \frac{12}{n(n + 1)} \sum_{i=1}^{g} \frac{R_i^2}{n_i} - 3(n + 1)$.
$s = 100\,(1 - p)$.
Fig. 26-17. Example of a test report for the Kruskal-Wallis test applied to an ANOVA plot with more
than two categorical variables such as the one shown
in Fig. 26-15.
Note: (1) If there are only 3 groups, this test should
not be used if one of the groups contains fewer than
6 observations.
Note: (2) If there are more than 3 groups, this test
should not be used if one of the groups contains
fewer than 5 observations.
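The H statistic and the Monte-Carlo procedure can be sketched as below. Two points are assumptions: "randomly distributed observations" is implemented as a random reshuffling of the observed values over groups of the original sizes, and tied values receive the average of their ranks without a tie correction of H. The seed only makes the sketch reproducible.

```python
import random


def midranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def kruskal_wallis_h(groups):
    """H = 12 / (n (n + 1)) * sum(R_i^2 / n_i) - 3 (n + 1)."""
    pooled = [v for group in groups for v in group]
    n = len(pooled)
    ranks = midranks(pooled)
    total, offset = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[offset:offset + len(group)])  # R_i
        total += rank_sum ** 2 / len(group)
        offset += len(group)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


def monte_carlo_p_value(groups, simulations=10_000, seed=0):
    """Observed H and the fraction of reshuffled samples with a larger H."""
    rng = random.Random(seed)
    observed = kruskal_wallis_h(groups)
    pooled = [v for group in groups for v in group]
    sizes = [len(group) for group in groups]
    larger = 0
    for _ in range(simulations):
        rng.shuffle(pooled)
        simulated, offset = [], 0
        for size in sizes:
            simulated.append(pooled[offset:offset + size])
            offset += size
        if kruskal_wallis_h(simulated) > observed:
            larger += 1
    return observed, larger / simulations
```

For three perfectly separated groups [1, 2, 3], [4, 5, 6], and [7, 8, 9], H = 7.2, and no reshuffled sample gives a larger value, so the simulated p-value is 0.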
26.4
The plot and statistics tools are available directly from the
Main window or from the Comparison window. In the
Main window, it can be started using Comparison >
Chart / Statistics. When launched from the Main
window, it works on the current selection made in the
database. If launched in the Comparison window, it works
on all entries contained in the comparison.
26.4.1 In the Comparison window, click the
button or select File > Chart / Statistics. This opens a
dialog box (see Fig. 26-18) that is used to select the plot
components. All components that can be included in a
chart are listed on the left.
26.4.2 To add a component to the chart, select a
component from this list by clicking on it and add it to the
list of Used components (displayed at the right) with the
button <Add>. Also in this list, components can be
clicked to select them. The selected component can be
removed from the Used components list with the button
<Delete>. For the selected component, the panel beneath
the Used components list displays its data type.
26.4.3 Within this Select plot components dialog box you
can convert a quantitative variable into an interval
variable by checking the Convert to interval data checkbox.
The interval size must be specified. See lower right part of
the panel displayed in Fig. 26-18. The same procedure
must be followed if a data variable has to be converted to
an interval variable.
26.4.4 For this example, select one numerical variable.
After clicking the <OK> button, the chart appears (Fig.
26-19). In this section the general features and
appearances of the Chart and Statistics window are
discussed. The content of the plot will be discussed in
sections 26.5-26.11.
Fig. 26-18. The Select plot components dialog box that appears when the chart tool is started; it is used to select
the plot components for the chart.
26.4.5 To copy the plot of this window select either File >
Copy to clipboard (metafile) or File > Copy to clipboard
(bitmap). A paper copy can be obtained by selecting File >
Print.
26.4.6 For some types of charts, you can export the data
by selecting File > export data (formatted) or File > export
data (tab delimited). These menu items appear in gray
instead of black if they cannot be applied for the current
type of chart.
26.4.7 To change the content of the chart you can use the
Plot menu item. Selecting Plot > Edit components opens
the Select plot components window (see Fig. 26-18). This
can be used to change the Used components. If the list of
Used components is modified, it is possible that the plot
changes into another type of chart because the chart
functionality selects the optimal representation for a
given set of variables. Of course it is possible to select
another type of chart (see 26.4.8).
26.5
26.4.8 In the Plot menu item you can also select another
type of chart. If the chart type chosen is not compatible
with the data type, the message Invalid type of source
data appears.
in the
Comparison window.
26.5.2 Select a categorical variable, such as an
information field, and add it to the list of Used components,
then press <OK>.
26.6
Edit plot components
Bar Graph
Contingency Table
or within the
Fig. 26-20. A bar graph for one categorical variable in the Chart and Statistics window.
Fig. 26-21. Statistics report for chi square test for equal category sizes.
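The chi square test for equal category sizes (26.3) can be sketched in plain Python: the expected count per category is the total divided by the number of categories s, and the p-value comes from a chi square distribution with s - 1 degrees of freedom. The upper tail is computed here from the series of the regularized incomplete gamma function, a substitute for a statistics library; the series is adequate for moderate values of the statistic but is not a production-grade implementation.

```python
import math


def chi_square_sf(x, df):
    """P(X >= x) for a chi square variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Series for the regularized lower incomplete gamma function P(a, z).
    term = 1.0 / a
    total = term
    k = 0
    while term > total * 1e-16 and k < 100_000:
        k += 1
        term *= z / (a + k)
        total += term
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


def chi_square_equal_sizes(counts):
    """Chi square statistic, p-value, and significance s = 100 (1 - p)
    for the null hypothesis that all categories are equally large."""
    s = len(counts)
    expected = sum(counts) / s
    chi2 = sum((observed - expected) ** 2 / expected for observed in counts)
    p = chi_square_sf(chi2, s - 1)
    return chi2, p, 100 * (1 - p)


chi2, p, significance = chi_square_equal_sizes([10, 20, 30])
```

For counts 10, 20, and 30 the expected count is 20, so the statistic is 10 with 2 degrees of freedom, and the p-value equals exp(-5), about 0.0067.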
Fig. 26-22. A contingency table for two categorical variables in the Chart and Statistics window.
26.6.5 The contingency table can be displayed in the
Chart and Statistics window showing residuals in the cells,
with View > Display residuals. The residual for a cell is a
measure for the deviation from the expected number of
counts in that cell and is calculated as
$(N_{oij} - n_{ij}) / \sqrt{n_{ij}}$.
26.7
Fig. 26-23. Statistics report for the chi square test for contingency tables.
Fig. 26-24. 2-D scatter plot for two quantitative variables in the Chart and Statistics window.
Comparison window by clicking
or within the
function, or by clicking the
button. For
26.9
ANOVA Plot
or
26.8
or within the
button. Select
Fig. 26-27. A 1-D numerical distribution function for a single quantitative variable.
27.2.2 Select all Ambiorix sp. entries (in the Entry search
dialog box, disable Search in list and Negative search and
enter Ambiorix in the Genus field and sp. in the species
field).
For large databases of fingerprint patterns, the most time-consuming part of a quick database screening of new or
unknown patterns is reading or downloading all the
fingerprint information. InfoQuestFP offers a tool that
overcomes this bottleneck by generating a cache
[Table: Vercingetorix aquaticus, V. nemorosum, V. palustris, V. maritimus, and V. viridis scored for Motility at RT (1), Glycerol PWS (2), Inositol PWS (3), and Oxidase (4).]
Table 2. Example input data for probabilistic identification matrix Prob_Id.xls (see text). Columns: Strain, Glycerol PWS, Inositol PWS, Mannitol PWS, Sorbitol PWS, Oxydase; strains 42815, 42816, and 42853.
- BLOSUM62_20: 100/10
- BLOSUM80: 10/1
- BLOSUM90: 10/1
- PAM30: 9/1
- PAM70: 10/1
- PAM250: 14/2
Under Sequence type you can select the Sequence type
to use for the BLAST search, when more than one
sequence type is available.
The BLAST database can be chosen with the <Browse>
button. Because a BLAST database consists of multiple
files (.nhr, .nin, .nsq), only one of these files should be
selected.
This view also shows a small plot for each HSP between
the query and hit sequence in the upper right corner.
With View > Fixed scale disabled, the two sequences are
not necessarily drawn proportionally.
27.7.14 The entire report can be exported as an XML file
or printed with File > Export or File > Print, respectively.
28.1.9 Paste the entries in the library unit with Edit >
Paste selection.
28.1.10 Save the library unit with File > Save.
Application
A neural network can be applied to many problems, such
as control theory, character recognition, statistical
analysis, and distinguishing patterns. In practice, a neural
network is very useful to set up an identification or
recognition system based upon complex data sets in
which it is not easy or is impossible to identify
discriminatory keys based upon conventional methods
The Create new bundle dialog box (Fig. 29-1) lists the
available database information fields in the left panel and
all available experiment types in the right panel.
You can check each of the database information fields and
experiment types to be incorporated in the bundle. For
fingerprint types, the fingerprint images, band
information, and densitometric curves can be
incorporated separately.
29.2.7 You can select all the entries from an opened
bundle by pressing the <Select entries> button in the
Open/close bundle dialog box.
29.2.8 To close a loaded bundle, select it in the list and
press the <Close> button.
29.2.9 Press the <Exit> button to close the Open/close
bundle dialog box.
Note: If you want a bundle to always be opened
with the database when InfoQuestFP is started up,
rename it so that its name starts with the prefix
@_.
to create a new
31.2.4 By pressing the <Refresh> button, the connection
between InfoQuestFP and the connected database is
refreshed. A tree-like table structure view of the database
is displayed in the upper right panel.
31.2.5 The database type can be selected under Database
(Access, SQL Server, and Oracle). This information is
written under [DATABASETYPE] in the connection
description file.
The second panel in the Connected database configuration
dialog box concerns the tables of the connected database.
InfoQuestFP assumes a certain table structure to be able
to store its different kinds of information. This table
structure is described in 33.1. The default table names are:
ATTACHMENTS for attachments (see 6.5)
ENTRYTABLE for the entries
EXPERIMENTS for the experiments
FPRINTFILES for the fingerprint files
FPRINT for the fingerprint lanes
SEQUENCES for the sequences
directory can be modified as described in 31.2. The
path can be a network path, for example on a server
computer.
Log files are stored in a different way in a connected
database. The log events are stored in a database table
called EVENTLOG. Different events are stored under
different categories: the category Database concerns all
actions affecting the database (adding, changing or
removing information fields, adding experiment types,
adding entries, changing entry information, etc.).
Furthermore, there is a category
EXPER_<ExperimentName> (<ExperimentName>
being the name of the experiment), relating to changes
made to the experiment type (e.g., normalization
settings for a fingerprint type, or adding, removing, or
renaming characters in a character type). A third
category reports on changes made to the data in a
certain experiment type. In this category, components
have the name of the experiment type.
The Event log window (Fig. 5-3) called from the main
program offers the ability to view the log file for a
connected database or the local database under
Database. Under Component, you can choose to view
a specific component, such as a database, an
experiment type, or data belonging to an experiment
type. With All, you can view all components together,
listed chronologically. The components can only be
selected when a connected database is viewed.
31.5.16 Finally, when all links to existing database tables/
views are made correctly, you can allow InfoQuestFP to
create additional tables for which there are no fields
available in the external database, by pressing <Auto
construct tables>. InfoQuestFP will now only construct
tables that are not yet linked, and fields that are not yet
present in the connected tables.
Note: It is not possible for InfoQuestFP to create
new fields within a view/query. In that case, you
will have to create the field in Oracle, SQL Server, or
Access, add it to the view, and reload the
InfoQuestFP database.
In addition, when new entries are added to the database,
they will automatically have their field FieldName filled
with String.
As an example, suppose that the database DemoBase has
been converted to a connected database; you can enter a
restricting query to visualize only Ambiorix as follows.
31.8.1 In the Connected database configuration dialog box
under Restricting query, type:
GENUS=Ambiorix
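The effect of such a restricting query can be pictured with SQLite standing in for Access, SQL Server, or Oracle. This is an illustration only: the SQL that InfoQuestFP actually generates is shown in its own query window (Fig. 31-8) and may differ, and the miniature ENTRYTABLE with a GENUS column, the keys, and the function name are hypothetical.

```python
import sqlite3


def restricted_keys(conn, field, value):
    """Keys of entries whose `field` equals `value`, a rough analogue of a
    restricting query such as GENUS=Ambiorix. `field` is inserted as a quoted
    identifier, so it must come from a trusted list of field names."""
    sql = f'SELECT "KEY" FROM ENTRYTABLE WHERE "{field}" = ? ORDER BY "KEY"'
    return [row[0] for row in conn.execute(sql, (value,))]


# Hypothetical miniature ENTRYTABLE with one information field, GENUS.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE ENTRYTABLE ("KEY" VARCHAR(80), "GENUS" VARCHAR(80))')
conn.executemany("INSERT INTO ENTRYTABLE VALUES (?, ?)",
                 [("E001", "Ambiorix"), ("E002", "Perdrix"), ("E003", "Ambiorix")])
```

Only the entries returned by the query, here E001 and E003, would be visible in the restricted view.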
Database Field
Using this component button, you can enter a (sub)string
to be found in any specific field that exists in the database
(Fig. 31-5). Note that wildcards are not used in this query
tool and that the string entered must match the field
contents exactly. The queries are not case sensitive.
Subset Membership
With this search component, you can specify that only
entries belonging to a certain subset should be loaded
(Fig. 31-7). This option offers additional flexibility, as
subsets can be composed of any selection of database
entries and are not necessarily bound to global query
statements.
Note: The buttons for the logical operators contain a
helpful Venn diagram icon that illustrates the
function of the operator.
Note: An example on the use of the logical
operators is given in 14.4 for the graphical query
builder.
Note that:
Individual components can be re-edited at any time by
double-clicking on the component or by selecting them
and pressing <Edit>.
Logical operators
NOT operates on one component. When a
component is combined with NOT, the condition of the
component will be inverted.
Fig. 31-8. SQL query statements translated from a visual query build.
statements translated from the active query (Fig. 31-8).
These SQL statements are passed on to the database to
obtain the restricted view.
In principle, you can compose queries or make changes
directly in these fields. This is, however, not
recommended unless you are very familiar with both the
SQL language and the InfoQuestFP database table
structure. Incorrect SQL input can cause information to
be downloaded incompletely from the database, and can
corrupt the database if you then attempt to save changes.
When a database is opened with a restricting query, you
may perform an analysis that contains entries not loaded
in the current view. This can happen with gel files,
comparisons, subsets, or library units. In that case, the
program first displays an error message stating that one
or more keys are not present or not loaded in the
database. Next, the program offers to fetch the missing
entries from the database. If you answer <Yes>, the
entries are loaded dynamically from the connected
database. For a gel file or a subset, this is technically very
fast. For a comparison, however, an SQL command must be
launched for each additional entry to download.
Depending on the number of additional entries, the size
of the database, and other factors, this may take a long
time.
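The cost described for comparisons comes from issuing one query per additional entry. The Python sketch below mimics that behavior with sqlite3. It first lists the keys that an analysis needs but that are missing from the restricted view. It then fetches each of them with its own SELECT and counts the commands. The function names and sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE ENTRYTABLE ("KEY" VARCHAR(80), GENUS VARCHAR(80))')
conn.executemany(
    "INSERT INTO ENTRYTABLE VALUES (?, ?)",
    [(f"E{i}", "Ambiorix" if i % 2 else "OtherGenus") for i in range(1, 7)],
)

# Current view, opened with the restricting query GENUS=Ambiorix.
loaded = dict(conn.execute('SELECT "KEY", GENUS FROM ENTRYTABLE WHERE GENUS = ?', ("Ambiorix",)))


def missing_keys(loaded: dict, wanted: list) -> list:
    """Keys referenced by the analysis but absent from the current view."""
    return [key for key in wanted if key not in loaded]


def fetch_one_by_one(conn: sqlite3.Connection, loaded: dict, keys: list) -> int:
    """Fetch each missing entry with its own SELECT; return the number of commands."""
    commands = 0
    for key in keys:
        row = conn.execute('SELECT "KEY", GENUS FROM ENTRYTABLE WHERE "KEY" = ?', (key,)).fetchone()
        commands += 1
        if row is not None:  # a key may also be absent from the database itself
            loaded[row[0]] = row[1]
    return commands


comparison = ["E1", "E2", "E3", "E4", "E9"]
todo = missing_keys(loaded, comparison)
print(todo)                                  # ['E2', 'E4', 'E9']
print(fetch_one_by_one(conn, loaded, todo))  # 3
print(sorted(loaded))                        # ['E1', 'E2', 'E3', 'E4', 'E5']
```

With n missing entries, n round trips to the database are needed. This is why large comparisons on a big connected database can be slow.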
Automatically rename the duplicate keys to unique
strings: this creates a new conflict, because the renamed
experiments have keys that do not correspond to any
database entry. This can be solved, however, by
automatically creating new database entries for these
experiments.
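The renaming option and its follow-up fix can be sketched in Python. Two choices here are assumptions: the suffix scheme (_1, _2, ...) and keeping the first occurrence under its original key. The guide does not state which unique strings InfoQuestFP generates.

```python
def rename_duplicates(keys: list[str]) -> list[str]:
    """Make keys unique: the first occurrence keeps its key, later ones get _1, _2, ..."""
    taken = set(keys)  # never generate a key that already exists
    seen = set()
    result = []
    for key in keys:
        if key not in seen:
            seen.add(key)
            result.append(key)
            continue
        n = 1
        while f"{key}_{n}" in taken:
            n += 1
        new_key = f"{key}_{n}"
        taken.add(new_key)
        result.append(new_key)
    return result


experiment_keys = ["E1", "E2", "E2", "E3", "E3", "E3"]
database_keys = {"E1", "E2", "E3"}

renamed = rename_duplicates(experiment_keys)
orphans = [key for key in renamed if key not in database_keys]
print(renamed)  # ['E1', 'E2', 'E2_1', 'E3', 'E3_1', 'E3_2']
print(orphans)  # ['E2_1', 'E3_1', 'E3_2']

# Resolve the new conflict: create a database entry for each orphan key.
database_keys.update(orphans)
```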
33. Appendix
33.1 Connected Database Table Structure
33.1.1 Introduction
The following sections describe the structure of the
tables that InfoQuestFP requires in a connected database
(see chapter 31). The tables are listed under their default
names. As pointed out in 31.5, an actual database can use
different names for these tables or views; the names used
are recorded in the connected database configuration file
(.xdb). The column names within the tables, however, are
fixed.
The data type CLOB (character large object) denotes a
large text field. It may be named differently depending on
the database used (e.g., the Access equivalent is memo).
NULL values should be allowed for all fields.
33.1.2 Table ENTRYTABLE
This table contains a record for every entry in the
database.
KEY (VARCHAR(80))
The unique identifier for every entry in the database
(e.g., isolate number).
Other fields: additional database information fields.
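The conventions in 33.1.1 can be combined in a small Python helper that builds CREATE TABLE statements:
- table names may differ from the defaults and are recorded in the .xdb file;
- column names are fixed;
- CLOB maps to a database-specific type;
- NULL is allowed everywhere.

Some names are hypothetical: the function create_table_sql, the custom table name MY_ENTRIES, and the GENUS and NOTES information fields. Only the Access memo mapping comes from this guide. Check the Oracle and SQLite entries against your database documentation.

```python
import sqlite3

# Large-text type used for CLOB columns, per backend. "MEMO" for Access is
# from the guide; the other entries are assumptions to verify.
CLOB_TYPES = {"access": "MEMO", "oracle": "CLOB", "sqlite": "TEXT"}


def create_table_sql(table_name: str, columns: list[tuple[str, str]], backend: str = "sqlite") -> str:
    """Build a CREATE TABLE statement for a connected-database table.

    table_name may differ from the default name (the actual name is what
    the .xdb configuration records); column names are kept as documented.
    No NOT NULL constraints are emitted, because NULL should be allowed
    for every field.
    """
    parts = []
    for name, col_type in columns:
        if col_type == "CLOB":
            col_type = CLOB_TYPES[backend]
        parts.append(f'"{name}" {col_type}')
    return f'CREATE TABLE "{table_name}" ({", ".join(parts)})'


# Hypothetical ENTRYTABLE stored under a custom name, with two information fields.
entry_columns = [("KEY", "VARCHAR(80)"), ("GENUS", "VARCHAR(80)"), ("NOTES", "CLOB")]
ddl = create_table_sql("MY_ENTRIES", entry_columns)
print(ddl)

conn = sqlite3.connect(":memory:")
conn.execute(ddl)
conn.execute('INSERT INTO "MY_ENTRIES" ("KEY") VALUES (?)', ("E1",))  # other fields stay NULL
rows = conn.execute('SELECT * FROM "MY_ENTRIES"').fetchall()
print(rows)
```

The double-quoted identifiers follow standard SQL and SQLite. Some backends, such as Access, use square brackets instead.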
33.1.3 Table EXPERIMENTS
This table contains a record for every experiment type in
the database.
EXPERIMENT (VARCHAR(80))
Holds the name of the experiment type (should be unique
throughout the database).
TYPE (VARCHAR(80))
Can be Fingerprint, Character, or Sequence.
SETTINGS (CLOB)
XML string that holds the processing, visualization,
and analysis settings of the experiment type.
TABLES (VARCHAR(160))
Used for character experiments only: holds the names of
the tables that hold the character values and the
additional character fields (separated by a comma).
33.1.4 Table FPRINTFILES
This table contains a record for every batch of fingerprints
entered in the database. A batch may correspond to
fingerprints that should be normalized together, e.g.,
because they were run on the same electrophoresis gel or
in the same batch on a sequencer.
FILENAME (VARCHAR(80))
The name of the batch (should be unique for every batch).
For scanned electrophoresis gels, this corresponds to the
name of the TIFF image file.
EXPERIMENT (VARCHAR(80))
Name of the experiment type to which this fingerprint
batch belongs.
LOCKED (VARCHAR(10))
Whether or not this batch is locked (Yes or No).
INLINELINK (VARCHAR(80))
If this batch is linked to another batch (for
normalization purposes), the name of the batch that
contains the normalization information.
SETTINGS (VARCHAR(250))
Data processing settings.
TONECURVE (VARCHAR(200))
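Two of the FPRINTFILES columns carry logic. INLINELINK points to the batch that holds the normalization information, and LOCKED flags each batch as Yes or No. The Python sketch below queries both with sqlite3. The following are assumptions: the experiment names, the sample rows, the helper names, treating an empty INLINELINK as "normalized on its own", and treating an empty LOCKED as unlocked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE FPRINTFILES (
        FILENAME VARCHAR(80), EXPERIMENT VARCHAR(80), LOCKED VARCHAR(10),
        INLINELINK VARCHAR(80), SETTINGS VARCHAR(250), TONECURVE VARCHAR(200));
    INSERT INTO FPRINTFILES (FILENAME, EXPERIMENT, LOCKED, INLINELINK) VALUES
        ('Gel01', 'RFLP', 'Yes', NULL),
        ('Gel02', 'RFLP', 'No', 'Gel01'),
        ('Run07', 'AFLP', 'No', NULL);
""")


def normalization_batch(conn: sqlite3.Connection, filename: str) -> str:
    """Return the batch that holds the normalization info for `filename`."""
    row = conn.execute("SELECT INLINELINK FROM FPRINTFILES WHERE FILENAME = ?", (filename,)).fetchone()
    if row is None:
        raise KeyError(filename)
    return row[0] or filename  # no link: the batch carries its own normalization


def unlocked_batches(conn: sqlite3.Connection, experiment: str) -> list[str]:
    """Batches of an experiment type whose LOCKED flag is not 'Yes'."""
    sql = ("SELECT FILENAME FROM FPRINTFILES "
           "WHERE EXPERIMENT = ? AND COALESCE(LOCKED, 'No') <> 'Yes' ORDER BY FILENAME")
    return [r[0] for r in conn.execute(sql, (experiment,))]


print(normalization_batch(conn, "Gel02"))  # Gel01
print(unlocked_batches(conn, "RFLP"))      # ['Gel02']
```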
BANDS (CLOB)
Holds information about the bands assigned on the
fingerprint.
REFPOS (VARCHAR(250))
Contains the reference positions assigned to this
fingerprint.
MAPFORWARD (CLOB)
Contains the forward normalization vector.
MAPBACK (CLOB)
Contains the reverse normalization vector.
REFSYSTEM (CLOB)
Holds the reference system of the fingerprint.
TONECURVE (VARCHAR(250))
Contains the tone curve.
CHPTRN (VARCHAR(250)) (only with Fast band
matching enabled)
Contains cached pattern information on the band
positions for a fingerprint type with Fast band
matching enabled.
33.1.6 Character Values Table
Each character type has its own table holding character
value information for the database entries. The default
name of this table is the name of the character type,
although it is possible to specify any table name (the exact
name is contained in the TABLES column of the
EXPERIMENTS table). Each record in the table
corresponds to a single character value belonging to a
single entry in the database.
KEY (VARCHAR(80))
Key of the entry to which this character value belongs.
CHARACTER (VARCHAR(80))
Key of the character.
VALUE (FLOAT)
Numerical value.
CHARACTER (VARCHAR(80))
Name of the character to which this information field
belongs.
FIELD (VARCHAR(80))
Name of the field.
CONTENT (VARCHAR(150))
Content of the field.
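Each character type stores its values in its own table. A lookup therefore first reads the table names from the TABLES column of EXPERIMENTS and then queries the values table. The sketch below does this with sqlite3. It assumes the values table is listed first and the fields table second. The character type API, its table names and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EXPERIMENTS (EXPERIMENT VARCHAR(80), TABLES VARCHAR(160));
    INSERT INTO EXPERIMENTS VALUES ('API', 'API,API_FIELDS');
    CREATE TABLE API ("KEY" VARCHAR(80), "CHARACTER" VARCHAR(80), "VALUE" FLOAT);
    INSERT INTO API VALUES ('E1', 'GLU', 1.0), ('E1', 'LAC', 0.0), ('E2', 'GLU', 0.5);
    CREATE TABLE API_FIELDS ("CHARACTER" VARCHAR(80), FIELD VARCHAR(80), CONTENT VARCHAR(150));
    INSERT INTO API_FIELDS VALUES ('GLU', 'Substrate', 'Glucose');
""")


def character_tables(conn: sqlite3.Connection, experiment: str) -> list[str]:
    """Split the comma-separated TABLES column of a character type."""
    row = conn.execute("SELECT TABLES FROM EXPERIMENTS WHERE EXPERIMENT = ?", (experiment,)).fetchone()
    if row is None or not row[0]:
        raise KeyError(f"no character tables for {experiment!r}")
    names = [name.strip() for name in row[0].split(",")]
    if not all(name.isidentifier() for name in names):
        raise ValueError(f"unexpected table name in {row[0]!r}")
    return names


def character_values(conn: sqlite3.Connection, experiment: str, key: str) -> dict[str, float]:
    """Map character -> value for one entry (values table assumed listed first)."""
    values_table = character_tables(conn, experiment)[0]
    sql = f'SELECT "CHARACTER", "VALUE" FROM "{values_table}" WHERE "KEY" = ? ORDER BY "CHARACTER"'
    return dict(conn.execute(sql, (key,)).fetchall())


print(character_tables(conn, "API"))        # ['API', 'API_FIELDS']
print(character_values(conn, "API", "E1"))  # {'GLU': 1.0, 'LAC': 0.0}
```

Identifiers cannot be passed as bound parameters. For that reason, table names read from the database are checked with isidentifier() before they are placed in the SQL text.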
33.1.8 Table SEQUENCES
This table holds the sequence information stored in the
database. Note that the columns designed for contig files
have changed with respect to earlier versions of the
software.
KEY (VARCHAR(80))
Key of the database entry to which this sequence
belongs.
EXPERIMENT (VARCHAR(80))
Experiment type of the sequence.
SEQUENCE (CLOB)
Sequence data.
CONTIGFILE (VARCHAR(80))
Unique ID of the contig file that is associated with this
sequence (if any).
CONTIGFILE (VARCHAR(80))
Unique ID of the contig that is associated with this
sequence trace file.
TRACEID (VARCHAR(80))
Unique ID of the trace file.
DATA (CLOB)
Holds the full trace information, including the sequence
and the chromatogram files, if the trace files are stored
in the database (see 31.2). Otherwise, it stores a link to
the path of the trace file.
INFO (CLOB)
Contains the full editing information of the sequence
trace file.
33.1.10 Table MATRIXVALS
Holds pairwise similarity values. Each record in this table
represents a single similarity value between two database
entries.
EXPERIMENT (VARCHAR(80))
Name of the experiment type this similarity value
belongs to.
KEY1 (VARCHAR(80))
Key of the first database entry.
KEY2 (VARCHAR(80))
Key of the second database entry.
VALUE (FLOAT)
Similarity value.
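Each MATRIXVALS record holds one value for a pair of entries, so a lookup has to decide which key goes in KEY1 and which in KEY2. The sketch below assumes that each pair is stored only once, in either order, and it searches both orders. The function name and sample data are hypothetical.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MATRIXVALS (EXPERIMENT VARCHAR(80), KEY1 VARCHAR(80), KEY2 VARCHAR(80), "VALUE" FLOAT);
    INSERT INTO MATRIXVALS VALUES ('RFLP', 'E1', 'E2', 87.5), ('RFLP', 'E1', 'E3', 42.0);
""")


def similarity(conn: sqlite3.Connection, experiment: str, a: str, b: str) -> Optional[float]:
    """Look up the similarity between entries a and b, in either key order."""
    row = conn.execute(
        'SELECT "VALUE" FROM MATRIXVALS WHERE EXPERIMENT = ? AND '
        "((KEY1 = ? AND KEY2 = ?) OR (KEY1 = ? AND KEY2 = ?))",
        (experiment, a, b, b, a),
    ).fetchone()
    return None if row is None else row[0]


print(similarity(conn, "RFLP", "E2", "E1"))  # 87.5
print(similarity(conn, "RFLP", "E2", "E3"))  # None (pair not stored)
```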
CONTIG (CLOB)
Holds the contig sequence and its full editing history.
CONTIGSTATUS (VARCHAR(10))
Contains the status of the contig file, i.e., confirmed or
not.
SUBJECT (VARCHAR(50))
Database component for which this event was
generated.
DESCRIPTION (VARCHAR(500))
KEY
MATRIXVALS: EXPERIMENT, KEY1, KEY2
SUBSETMEMBERS: SUBSET
`[:lower:]'
Any one of `a b c d e f g h i j k l m n o p q r s t u v w x y
z'.
`[:print:]'
Any printable character: any character in the `[:graph:]'
class, plus the space character.
`[:punct:]'
Any one of `! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^
_ ` { | } ~'.
`[:space:]'
Any one of `CR FF HT NL VT SPACE'.
`[:upper:]'
Any one of `A B C D E F G H I J K L M N O P Q R S T U
V W X Y Z'.
`[:xdigit:]'
Any one of `a b c d e f A B C D E F 0 1 2 3 4 5 6 7 8 9'.
For example, `[[:alnum:]]' means `[0-9A-Za-z]', except that
the latter form depends on the ASCII character encoding,
whereas the former is portable. (Note that the brackets in
these class names are part of the symbolic names and
must be included in addition to the brackets delimiting
the bracket list.) Most metacharacters lose their special
meaning inside lists. To include a literal `]', place it first
in the list. Similarly, to include a literal `^', place it
anywhere but first. Finally, to include a literal `-', place it
last.
The period `.' matches any single character. The symbol
`\w' is a synonym for `[[:alnum:]]' and `\W' is a synonym
for `[^[:alnum:]]'.
The caret `^' and the dollar sign `$' are metacharacters
that respectively match the empty string at the beginning
and end of a line. The symbols `\<' and `\>' respectively
match the empty string at the beginning and end of a
word. The symbol `\b' matches the empty string at the
edge of a word, and `\B' matches the empty string
provided it is not at the edge of a word.
A regular expression may be followed by one of several
repetition operators:
`?'
The preceding item is optional and is matched at most
once.
`*'
The preceding item will be matched zero or more
times.
`+'
The preceding item will be matched one or more times.
`{N}'
The preceding item is matched exactly N times.
`{N,}'
The preceding item is matched N or more times.
`{N,M}'
The preceding item is matched at least N times, but not
more than M times.
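Python's re module does not understand POSIX class names such as [:alnum:]. The repetition operators and the \b and \B anchors described above, however, behave the same way in Python. The sketch below rewrites a few bracket classes into the ASCII ranges given in their definitions and then exercises each repetition operator. The helper posix_to_python is hypothetical and covers only the classes listed in POSIX_CLASSES.

```python
import re

# ASCII equivalents of some POSIX classes, following the definitions above.
POSIX_CLASSES = {
    "[:alnum:]": "0-9A-Za-z",
    "[:lower:]": "a-z",
    "[:upper:]": "A-Z",
    "[:xdigit:]": "0-9A-Fa-f",
    "[:space:]": r" \t\n\r\f\v",  # SPACE HT NL CR FF VT
}


def posix_to_python(pattern: str) -> str:
    """Replace POSIX class names inside bracket lists with explicit ranges."""
    for name, ranges in POSIX_CLASSES.items():
        pattern = pattern.replace(name, ranges)
    return pattern


demos = [
    (posix_to_python("[[:upper:]][[:lower:]]+"), "Ambiorix"),  # class plus `+'
    (posix_to_python("[[:xdigit:]]{2,4}"), "3Fa"),             # `{N,M}'
    ("colou?r", "color"),                                      # `?': optional item
    ("ab*c", "ac"),                                            # `*': zero times is fine
    ("ab+c", "ac"),                                            # `+': needs at least one
    ("a{3}", "aaa"),                                           # `{N}'
    ("a{2,}", "aaaaa"),                                        # `{N,}'
    ("a{1,2}", "aaa"),                                         # `{N,M}': too many
]
for pattern, text in demos:
    print(f"{pattern!r:28} {text!r:12} {bool(re.fullmatch(pattern, text))}")

# \b matches at a word edge, \B only away from one.
print(bool(re.search(r"\bAmb", "genus Ambiorix")), bool(re.search(r"\BAmb", "genus Ambiorix")))
```

Keep two differences in mind. First, Python's \w also matches the underscore and, for str patterns, non-ASCII letters, so it is wider than the [[:alnum:]] synonym described above. Second, Python has no \< or \> anchors, so \b covers both word edges.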
Index
A
Add (Netkey) 19
Add array of characters 66
Add color 66
Add new entries 28
Add new experiment file 67, 80
Advanced query tool 110
Align external branch 161
Align internal branch 161
Amino acid sequences 79
Analysis 21
Analyze 7, 23, 27, 28, 109
Area sensitive (coefficient) 132
Arithmetic average 44
Arrange by similarity 225
Assembler 81
Attachment 110
Auto construct tables 252
Auto create (.mdb) 250
Average (K-means) 139
Average similarities (jackknife) 140
Average thickness 59
Averaging thickness (curves) 43
B
Background subtraction 35, 44
Background subtraction (2-D image) 41
Background subtraction (BNIMA) 70
Ball size 70
Band class filters 125
Band classes > Add new band class 124
Band classes > Assign band to class 124
Band classes > Auto assign bands to class 124, 125
Band classes > Center class position 125
Band classes > Remove band class 124
Band classes > Remove band from class 125
Band finding (settings) 48
Band height 127
Band matching 121
Band search filters 48
Band search, shoulder sensitivity 49
Band surface 127
Bandmatching 118
Bandmatching > Auto assign all bands to all classes 128
Bandmatching > Band class filter 126
Bandmatching > Comparative Quantification settings 127
Bandmatching > Export bandmatching 125
Bandmatching > Perform band matching 122, 128, 129
Bandmatching > Polymorphic bands only (for selection list) 129
Bandmatching > Polymorphic classes only (for selection list) 125
Bandmatching > Search band classes 125
Bands 38, 50, 147
Bands (assigning) 48
Bands > Add new band 49
Bands > Auto search bands 49
Bands > Delete selected band(s) 50
Bands > Mark band(s) as certain 50
Bands > Mark bands as uncertain 50
Binary coefficient 171
Binary coefficients 151
Bitmap export 143
BLAST 232, 233
BNIMA 70
Bootstrap analysis 137
Build (connected databases) 251
Bundles 243
Bypass normalization 54
C
Calculate > Experiment correlations 145
Calculate > Similarity plot 146
Calculate quality quotients 238
Calibration curve 62
Canberra metric 151
Case sensitivity 110
Categorical coefficient 151, 172
Cells > Add disk to mask 72
Cells > Add pixels to mask 72
Cells > Add selected 71
Cells > Edit color scale 72
Cells > Remove pixels from mask 72
Change access (Netkey) 19
Change entry key 28
Change towards end of fingerprint 122, 132
Changing fingerprint type 57
Changing sequences in a multiple alignment 159
Character > Change character range value 65
Character file, new 67
Character types 7, 65, 106
Character value (query) 110
Characters > Add new character 65, 67
Characters > Order characters by component 197, 199
Characters > Use character for comparisons 66
Check table structure 252
Cluster analysis 169
Cluster analysis (similarity matrix) 131, 135, 138, 148, 151, 154, 171, 193, 200
Cluster cutoff method 138
Curves 38, 60
Curves > Spectral analysis 44
D
Database > Add all lanes to database 57
Database > Add lane to database 57
Database > Add new entries 27, 54, 56, 253
Database > Add new information field 28, 29
Database > Change entry key 27
Database > Change fingerprint type of lane 57
Database > Connected databases 250
Database > Link lane 56
Database > ODBC link > Configure external database link 247
Database > ODBC link > Copy from external database 247
Database > ODBC link > Download field from external database 248
Database > ODBC link > Select list from external database 248
Database > Remove all links 57
Database > Remove entry 28
Database > Remove information field 29
Database > Remove link 57
Database > Remove unlinked entries 28
Database > Rename information field 29
Database directory 22
Database field (query) 110, 258
Database field range (query) 110, 258
Database fields 80
Database settings 22
Database sharing tools 243
Databases 9
Degree (congruence of techniques) 145
Delete (Netkey) 19
Delete database 22
DemoBase 15, 23, 27
Dendrogram 116, 119, 131, 132, 134, 153
Densitometric curves 43, 59
Densitometric values (BNIMA) 70
Details (bundle) 244
Dice 132, 148, 151
Different bands (coefficient) 132
Dimensioning > Multi-dimensional scaling 193
Dimensioning > Principal Components Analysis 194, 198
Dimensioning > Self organizing map 199
Discard unknown bases 155, 160
Disconnect (Netkey) 19
Discriminants (with variance) 198
Discriminants (without variance) 198
Divide by variance 198
Divide by variance (PCA) 195
DNS Configuration 18
DNS host name 18
Do not create keys 80
Drag-and-drop sequence alignment 157
Duplicate keys 56
Dynamical preview 38, 59
E
Edit > Arrange entries by database field 118
Edit > Arrange entries by field 31
Edit > Arrange entries by field (numerical) 31
Edit > Arrange entries by similarity 225
Edit > Bring selected entries to top 109, 129
Edit > Change brightness & contrast 38, 42, 59
Edit > Change key 261
Edit > Clear selection list 109, 110, 115
Edit > Copy selection 115, 119, 134, 237
Edit > Cut selection 115, 118, 119, 125, 134, 161, 176
Edit > Delete current (subset) 116
Edit > Delete selection 116
Edit > Edit tone curve 42
Edit > Freeze left pane 31, 119
Edit > Load default settings 53
Edit > Paste selection 115, 118, 119, 125, 134, 161, 225, 237
Edit > Previous page 141
Edit > Redo 38
Edit > Rename current (subset) 116
Edit > Rescale curves 45, 52
Edit > Save as default settings 53
Edit > Search entries 51, 109
Edit > Set database field length 31
Edit > Settings 40, 43, 44, 48
Edit > Settings (BNIMA) 70, 72, 73, 74
Edit > Settings (fingerprints) 45
Edit > Show value scale (BNIMA) 70
Edit > Undo 38
Edit > Zoom in 38, 141
Edit > Zoom out 38, 141
Edit database fields 199
Edit image (BNIMA) 70
EMBL format 80
Enable log files 23
Enable the use of log files 22
Enhanced metafile export 142
Entries > Add new entries 67, 80
Error flags 137
Estimate errors 178
Estimate relative character importance 201
Euclidean distance 151
Experiment 27, 54, 55, 122
Experiment > Comparison settings 171
Experiment > Correct for internal weights 170, 172
Experiment > Train neural network 241
Experiment > Use for identification 237
Experiment > Use in composite data set 122, 170
Experiment card 66, 105, 106
Experiment presence (query) 110
Experiments > Create new character type 65
Experiments > Create new composite data set 121, 170
Experiments > Create new fingerprint type 35
Experiments > Create new matrix type 103
Experiments > Create new sequence type 79, 96
Experiments > Edit experiment type 53, 65
Export band metrics 105
Export normalized band positions 105
Export normalized curve 105
F
Fields > Add new field 68
Fields > Remove field 68
Fields > Rename field 68
Fields > Set field content 68
Fields > Use as default field 68
File > Add experiment file 253
File > Add image to database 36
File > Add new experiment file 35
File > Add new library unit 237
File > Approved 90
File > Calculation priority settings 133
File > Clear log file 23
File > Convert complexes to groups 191
File > Copy correspondence plot to clipboard 203
File > Copy discriminants to clipboard 203
File > Copy image to clipboard 177, 194
File > Copy image to clipboard (characters) 197
File > Copy image to clipboard (entries) 197
File > Copy page to clipboard 142
File > Create new bundle 243
File > Delete experiment file 37, 68, 81
File > Edit character data 107
File > Edit fingerprint data 107
File > Edit library unit 237
File > Edit sequence data 107
File > Exit 27
File > Export 227, 228
File > Export bands (comparison) 118
File > Export character coordinates 198
File > Export database fields 137, 138, 176, 194, 197, 225
File > Export densitometric curves (comparison) 118
File > Export entry coordinates 198
File > Export report to file 239
File > Export sequences 162
File > Export similarity matrix 138
File > Import experiment data 79
File > Import experiment file 103
File > Import from external database 248
File > Link to reference gel 59
File > Load configuration 74
File > Load image (BNIMA) 70
File > Lock 22
File > Open additional database 247
File > Open bundle 243
File > Open experiment file (data) 37, 59, 67, 80
File > Open experiment file (entries) 54, 56, 68, 80
File > Open reference gel 59, 60
File > Print all pages 142
File > Print correspondence plot 203
File > Print database fields 225
G
Gap penalty 155, 160
Gelstrip thickness 59
GeneScan tables, importing 60
Genus 136, 225
Global alignment 154
Gower 151
Gray zone (bands) 49
Grid > Add new 71
Grid > Delete 71
Grid > Delete selected 71
Grid definition 70
Group > Create from database field 136
Group > Partitioning of groups 139
Group separation statistics 140
Group violations 140, 141
Groups 136
Groups > Assign selected to 135
Groups > Assign selected to > None 139
Groups > Create from database field 148
Groups > Group separations 140
Groups > Multivariate Analysis of Variance 201
Groups > Partitioning of groups 140
H
Hidden nodes 241
Hierarchical tree 131
Home directory 9, 21
Homedir 18
Hue only (BNIMA) 70
I
ID code 22, 23
Identification 225, 237
Identification > Create new library 237
Identification > Fast band matching 226
Identification > Identify selected entries 238
Identification against database entries 225
Idle time background 133
Image > Convert to gray scale > Averaged 37
Image > Convert to gray scale > Blue channel 37
Image > Convert to gray scale > Green channel 37
Image > Convert to gray scale > Red channel 37
Image > Invert 36
Image > Load from original 37
Image > Mirror > Horizontal 37
Image > Mirror > Vertical 37
Image > Rotate > 180 37
Image > Rotate > 90 left 37
Image > Rotate > 90 right 37
Image type (BNIMA) 70
Import using ODBC 247
Info 18
Inserting and deleting gaps in multiple alignment 158
Inspect 21
Install InfoQuestFP 15
Install Netkey server program 17
Internal reference markers 60
IP address 18, 19
J
Jaccard 131, 151
Jackknife 140
Jeffreys X 132
Jukes and Cantor 155, 160
K
Kendall's tau 145
Kimura 2 parameter 160
K-means partitioning 139
Kohonen map 199
L
Lane > Move down 59
Lane > Move up 59
Lane > Remove 59
Lanes > Add marker point 60
Lanes > Add new lane 42
Lanes > Auto search lanes 40, 59
Lanes > Copy geometry 60
Lanes > Delete selected lane 42
Lanes > Paste geometry 60
Layout 193
Layout > Compress (X dir) 123
Layout > Create rooted tree 177
Layout > Display experiments 117
Layout > Enlarge image size 141
Layout > Optimize branch spread 176
Layout > Preserve aspect ratio 197
Layout > Reduce image size 141
Layout > Rescale curves 118
Layout > Show 3D plot 196
Layout > Show bands 118
Layout > Show branch lengths 176, 178
Layout > Show construction lines 194
Layout > Show curves as images 118
Layout > Show dendrogram 154, 194
Layout > Show densitometric curves 118
Layout > Show distances 135
Layout > Show group colors 176, 194, 197
Layout > Show image 122, 154
Layout > Show keys 193, 197
Layout > Show keys or group numbers 176
Layout > Show matrix 138, 154
Layout > Show matrix rulers 138
Layout > Show metric scale 118, 123
Layout > Show rendered image 194
Layout > Show similarity matrix 141
Layout > Show similarity values 138
Layout > Show space between gelstrips 117, 143
Layout > Similarity shades 138
Layout > Stretch (X dir) 123
Layout > Use colors 142
Layout > Use component as X axis 197
Layout > Use component as Y axis 197
Layout > Use component as Z axis 197
Layout > Use group numbers as key 197
Layout > Use group numbers as keys 136, 176, 193
Layout > Zoom in 117, 123
Layout > Zoom out 117, 123
Least square filtering 44
Library 237
Local database 22, 261
Local database, converting to connected database 255
Log files 23
Logarithmic dependence 55
M
MANOVA 200
Match against selection only (Jackknife) 140
Matrix types 7, 33, 103
Maximal similarities (jackknife) 140
Maximum difference 226
Maximum likelihood 175
Maximum number of gaps 155, 156
Maximum parsimony 175
Maximum similarity used 145
Maximum value 59
Maximum value (grayscales) 39
MDS 193
Median filter 44
Metric > Assign unit 55
Metrics > Add marker 55
Metrics > Copy markers from reference system 55, 62
Metrics > Cubic spline fit 55
Metrics range of fingerprint 62
Microplate (BNIMA) 70
Minimal area 49
Minimal profiling 48, 49
Minimum consensus percentage 156
Minimum match sequence 155, 156
Minimum similarity used 145
Minimum value (grayscales) 38
Mode filter 44
Molecular sizes (defining) 55
Monotonous fit 145
Multi-dimensional scaling 193
Multiple alignment 153, 154
Multi-state coefficient 171
Multivariate analysis of variance 200
Mutation rate 178
N
Navigator 38
Nearest neighbor 139
Nearest neighbor (K-means) 139
Negative search 110, 225
Neighbor joining 132, 135, 153, 155, 160
Neighbor match 156
Netkey 17
Network 18
Network settings 18, 20
Neural network 239
New character type 65
New database (creating) 21
New fingerprint type 35
New matrix type 103
New ODBC 256
New sequence type 79, 96
Non-parametric 205
Normal priority background 133
Normalization 38, 54, 59
Normalization > Auto assign (bands) 47, 59
Normalization > Delete all assignments 47
Normalization > Show distortion bars 48
Normalization > Show normalized view 45, 46, 47, 48
Normalization > Update normalization 48
Normalized view 59
Nucleic acid sequences 79
Number of bootstrap simulations 176
Number of columns (character type) 65
Number of groups 139
Number of nodes 59
Number of rows (character type) 65
O
Ochiai 132
ODBC connection string 251
ODBC, import 67
One dimension 127
Open entry 29
Open gap penalty 154, 155
Optimization 122, 132, 148
Optimization, find best 148
Optimize positions (MDS) 193
Optimize topology 176
Original 36
P
Pairwise alignment 154
Parametric 205
Parsimony 175
Paste data from clipboard 74
PCA 118, 193, 194
Pearson correlation 131, 151, 171
Pheno 65
Plate (characters) 67
Plot > Use discriminant as X axis 202
Plot > Use discriminant as Y axis 202
Polymorphism analysis 121
Polynomial degree (BNIMA) 74
Port number 18
Position tolerance 122, 132, 148
Position tolerance, find best 148
Preview (band search) 49
Principal components analysis 194
Processed 36
Properties 18
Q
Quality quotient 238
Quantification > Add cells to character set 73
Quantification > Assign value 53
Quantification > Band quantification 52
Quantification > Calculate concentrations 53
Quantification > Define calibration point 73
Quantification > Export to clipboard (BNIMA) 74
Quantification > Search all surfaces 53
Quantification > Search surface of band 53
Quantification units 48
Quantification, comparative 121
Queries 110
Query sequences 234
R
Rainbow palette 39
Rank correlation 151
Raw data 41
Reference > Use as reference lane 45
Reference lane 59
Reference system 45
References > Add external reference position 45
References > Add internal reference position 48
References > Copy normalization 60
References > Paste normalization 60
References > Use all lanes as reference lanes 59
Refresh (connected databases) 251
Registry 9
Regression (congruence of techniques) 145
Regression curve 55, 61
Relative band surface 127
Relative to max. value (bands) 49
Relative usage (Netkey) 20
Relative volume 127
Removing common gaps in a multiple alignment 159
Rename (bundles) 244
Represent as List 66
Represent as Plate 66
Resolution of normalized tracks 54
Restricting queries 257
Restricting query 252
Result set 226
S
Scripts > Browse Internet 62, 67, 79, 147
Search in list 110, 225
Security driver 17
Security key 17
Select branch into list 134
Self organizing map 199
Send message (Netkey) 20
Send message to all users (Netkey) 20
Sequence > Align external branch 161
Sequence > Align internal branch 161
Sequence > Calculate global cluster analysis 160, 161
Sequence > Change saved sequence 159
Sequence > Consensus blocks 156
Sequence > Consensus difference 157
Sequence > Create consensus of branch 156, 165
Sequence > Create locked group 159
Sequence > Edit 80
Sequence > Find sequence pattern 160
Sequence > Lock / unlock dendrogram branch 159
Sequence > Multiple alignment 155
Sequence > Neighbor blocks 156
Sequence > Paste from clipboard 80
Sequence > Reload sequence from database 159
Sequence > Show global cluster analysis 161
Sequence > Unlock group 159
Sequence types 7, 79, 107
Server computer name 18
Settings 21, 22, 23
Settings (Netkey) 19, 20
Settings > Binary conversion settings 67
Settings > Brightness & contrast 54
Settings > Comparative quantification 54
Settings > Edit reference system 55, 62
Settings > Enable fast band matching 226
Settings > Exclude active region 165
Settings > General settings 53, 66, 74
Settings > Include active region 165
Settings > New reference system (curve) 62
Settings > New reference system (positions) 61
Settings > Set as active reference system 61, 62
Settings > Statistics 141
Settings, database 22
Shoulder sensitivity 49
Shoulder sensitivity, band search 49
Show > Detailed report 239
Show > Identification comparison 239
Show bands 118, 127
Show dendrogram 135, 136, 137, 144, 161
Show matrix 144, 193
Show quantification (colors) 171
Similarity 171
Similarity calculation 154, 155
Simple Matching 151
Single linkage 132
Solving database problems 261
SOM 199
Source file location 252, 253, 256
Spot removal (2-D image) 41
SQL query 226
Standard deviation 137
Standardized characters 171
Start service (Netkey) 18
Startup program 21
Statistics (Netkey) 20
Status (Netkey) 20
Stop Service 19
Stored trees dialog box 184
Strips 38, 60
Strips > Increase number of nodes 42
Strips > Make larger 42
Strips > Make smaller 42
Subsequence (query) 110
Subsequence search 159
Subsets 115
Subtract average (PCA) 195
T
Take from experiments 171, 172
TCP/IP 17, 18
Thickness (image strips) 41
Tie handling 140
Tolerance 226
Tolerance & optimization statistics 148
Tone curve 42
Two dimensions (quantification) 127
U
Uncertain bands 49, 123, 132
Unit gap penalty 154, 155
UPGMA 132, 135, 140, 155, 171
Use active zones only 160
Use as default database 252
Use conversion cost 155
Use fast algorithm 155, 156
Use quantitative values (PCA) 195
Use square root 151, 171
Use the local database 250
Used range 226
V
Validation samples 241
View calibration curve (BNIMA) 74
Volume 127
W
Ward 132, 155