Bioinformatics Quiz: Test Your Knowledge of Bioinformatics

November 20, 2014
[BIOINFORMATICS QUIZ: TEST YOUR BIOINFORMATICS ]
Which of the following statements is false when describing SWISS-PROT?
a) It is a curated protein sequence database
b) Its data is redundant
c) It provides a high level of annotation
d) It is maintained by the Swiss Institute of Bioinformatics and EBI
Arrange the following in hierarchical top-to-bottom order, as is done in SCOP:
a) Classes, domains, superfamilies, folds, families
b) Domains, superfamilies, folds, families, classes
c) Superfamilies, folds, families, domains, classes
d) Classes, folds, superfamilies, families, domains
Which of the following gap penalty settings is commonly used in sequence alignment?
a) gap opening penalty = -2, gap extension penalty = -0.5
b) gap opening penalty = -0.5, gap extension penalty = -2.0
c) gap opening penalty = -100, gap extension penalty = 0
d) gap opening penalty = -100, gap extension penalty = -100
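The reason a large opening penalty is paired with a small extension penalty is the affine gap model. The sketch below assumes one common convention, in which a gap of length k costs the opening penalty plus (k - 1) extension penalties; some programs instead charge the extension penalty for every gap position, so treat the exact formula as an assumption. The function name `affine_gap_score` is hypothetical.

```python
def affine_gap_score(length, open_penalty=-2.0, extend_penalty=-0.5):
    """Score of a single gap of `length` residues under an affine model.

    Assumed convention: the first gap position costs `open_penalty` and each
    additional position costs `extend_penalty`.
    """
    if length <= 0:
        return 0.0
    return open_penalty + (length - 1) * extend_penalty


# One 4-residue gap versus four separate 1-residue gaps.
one_long_gap = affine_gap_score(4)          # -2 + 3 * -0.5 = -3.5
four_short_gaps = 4 * affine_gap_score(1)   # 4 * -2 = -8.0
print(one_long_gap, four_short_gaps)
```

Because one long gap scores better than several scattered ones, this kind of setting favors a few contiguous insertions or deletions over many isolated ones.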
For searching a query sequence against a database, which of the following statements is correct?
a) Nucleotide query against a nucleotide sequence database is done by blastp
b) Protein query against a translated nucleotide sequence database is done by blastp
c) Translated nucleotide query against a protein database is done by blastx
d) Protein query against a protein database is done by tblastn
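The five standard BLAST programs differ only in what kind of query is searched against what kind of database. A small lookup table makes the pairings explicit; the helper name `choose_blast_program` is hypothetical, while the program names are the standard NCBI ones.

```python
# Query/database pairings handled by the standard NCBI BLAST programs.
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("translated nucleotide", "protein"): "blastx",
    ("protein", "translated nucleotide"): "tblastn",
    ("translated nucleotide", "translated nucleotide"): "tblastx",
}


def choose_blast_program(query_type, database_type):
    """Return the BLAST program that searches query_type against database_type."""
    try:
        return BLAST_PROGRAMS[(query_type, database_type)]
    except KeyError:
        raise ValueError(
            f"no BLAST program for {query_type!r} vs {database_type!r}"
        ) from None


print(choose_blast_program("translated nucleotide", "protein"))  # blastx
print(choose_blast_program("protein", "translated nucleotide"))  # tblastn
```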
Which is the default scoring matrix used in protein BLAST (blastp)?
a) PAM62
b) BLOSUM 62
c) BLOSUM 60
d) BLOSUM 80
PAM matrices are derived by noting evolutionary changes in protein sequences that are more than:
a) 80% similar
b) 60% similar
c) 40% similar
d) 25% similar
Which alignment is used to predict whether two sequences are homologous or not?
a) Local
b) Global
c) Pair-wise
d) Multiple
BY NaVeeNBioinFoRmaTiCs - anything about bioinformatics
In Molecular Dynamics simulation, the dependence is on:
a) only position
b) only momentum
c) both position and momentum
d) either position or momentum
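Molecular dynamics advances both positions and momenta (velocities) through time by integrating Newton's equations of motion. A minimal sketch, assuming a single particle on a one-dimensional harmonic spring and the velocity Verlet integrator (one common MD integrator, not the only one); the function name, parameters and units are hypothetical.

```python
import math


def velocity_verlet(x, v, force, mass, dt, steps):
    """Integrate one particle in 1D; returns the final (position, velocity)."""
    a = force(x) / mass
    for _ in range(steps):
        # Position update uses the current velocity and acceleration.
        x = x + v * dt + 0.5 * a * dt * dt
        # Velocity update averages the old and new accelerations.
        a_new = force(x) / mass
        v = v + 0.5 * (a + a_new) * dt
        a = a_new
    return x, v


k, m = 1.0, 1.0  # spring constant and mass in arbitrary units
x, v = velocity_verlet(1.0, 0.0, lambda pos: -k * pos, m, dt=0.01, steps=1000)
energy = 0.5 * m * v * v + 0.5 * k * x * x
print(x, math.cos(10.0), energy)  # x tracks cos(t); total energy stays near 0.5
```

Neither position nor velocity alone is enough to take the next step: each update in the loop uses both.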
In phylogenetic analysis, the maximum likelihood method is chosen when the sequences have:
a) strong similarity
b) local similarity
c) medium level similarity
d) no clear identifiable similarity
The method of maximum parsimony is also known as:
a) maximum evolution method
b) minimum evolution method
c) zero evolution method
d) moderate evolution method
In the Needleman-Wunsch algorithm for pairwise alignment of sequences with lengths n and m, the computational time is proportional to:
a) n × m
b) (n+1) × (m+1)
c) n + m
d) n × (m+1)
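The dynamic-programming table has one row per residue of one sequence and one column per residue of the other (plus a border row and column), so the work grows with the product of the lengths. The sketch below fills that table to get a global alignment score, assuming a simple linear gap penalty and match/mismatch scores of +1/-1 chosen only for illustration; the function name is hypothetical.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of strings a and b (linear gap penalty)."""
    n, m = len(a), len(b)
    # (n+1) x (m+1) table; row 0 and column 0 align a prefix against gaps only.
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    # Every one of the n * m inner cells is computed once: O(nm) time.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = F[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            F[i][j] = max(diag, F[i - 1][j] + gap, F[i][j - 1] + gap)
    return F[n][m]


print(needleman_wunsch_score("ACGT", "ACGT"))  # 4
print(needleman_wunsch_score("AC", "A"))       # 0: one match, one gap
```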
In a PHYLIP output file, the first line contains two numbers. What do they indicate?
a) Number of sequences, length of alignment
b) Length of alignment, number of sequences
c) Number of gaps, number of sequences
d) Number of sequences, number of gaps
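A PHYLIP-format alignment starts with a header giving the number of sequences followed by the number of alignment columns. A minimal reader, assuming the relaxed sequential layout (name, whitespace, whole sequence on one line) rather than PHYLIP's strict 10-character names or interleaved layout; the parser name and the sample data are hypothetical.

```python
def read_relaxed_phylip(text):
    """Parse a relaxed, sequential PHYLIP alignment.

    Returns (number_of_sequences, alignment_length, {name: sequence}).
    """
    lines = [line for line in text.strip().splitlines() if line.strip()]
    # Header line: number of sequences first, then alignment length.
    n_seqs, aln_len = (int(field) for field in lines[0].split()[:2])
    records = {}
    for line in lines[1:n_seqs + 1]:
        name, seq = line.split(None, 1)
        records[name] = seq.replace(" ", "")
    if len(records) != n_seqs or any(len(s) != aln_len for s in records.values()):
        raise ValueError("header does not match the alignment body")
    return n_seqs, aln_len, records


EXAMPLE = """3 8
human   ACGTACGT
mouse   ACGTACGA
chicken ACGAACGT
"""
n, length, seqs = read_relaxed_phylip(EXAMPLE)
print(n, length)  # 3 8
```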
BLAT is used to find:
a) regions of higher identity within genomic assemblies
b) regions of higher differences within genomic assemblies
c) folds in an RNA sequence
d) secondary structures in a given protein
Homology modeling may be distinguished from ab initio prediction because:

a) Homology modeling requires a model to be built
b) Homology modeling requires alignment of a target to a template
c) Homology modeling is usefully applied to any protein sequence

d) The accuracy of homology modeling is independent of the percent identity between the
target and the template
Molecular Dynamics simulation is carried out for:
a) Obtaining an ensemble of structures at physiological conditions
b) Obtaining the structure at the global energy minimum
c) Fitting prospective drug candidate molecules to a receptor
d) Modeling a protein structure from sequence alone
Threading approaches can be used to:
a) Predict secondary structures of proteins
b) Build phylogenetic trees
c) Identify distantly related structural homologs of proteins
d) Check the fitness of a modeled protein structure
What is PROSITE?
a) a database of protein structures
b) a database of interacting proteins
c) a database of protein motifs
d) a search tool
Which is the best annotated database?
a) GenBank
b) PDB
c) ProDom
d) Swiss-Prot
If you want literature information, which is the best website to visit?
a) OMIM
b) Entrez
c) PubMed
d) PROSITE
To assess the structural similarity between two proteins, the server to use is
a) ProDom
b) PROSITE
c) TrEMBL
d) DALI
Which of the following databases is derived from mRNA information?

a) dbEST
b) PDB
c) OMIM
d) HTGS
Which of the following amino acids is the least mutable according to the PAM scoring matrix?
a) Alanine
b) Glutamine
c) Methionine
d) Cysteine
You have two distantly related proteins. Which of the following sets is the best for comparing them?
a) BLOSUM45 or PAM250
b) BLOSUM45 or PAM1
c) BLOSUM80 or PAM250
d) BLOSUM80 or PAM1
In a sequence database of a given size, which of the following expressions is likely to retrieve the most matches (X means any amino acid; any of the residues in square brackets can occupy that position)?
a) D-A-V-I-D
b) [DE]-A-V-I-[DE]
c) [DE]-[AVILM]-X-E
d) D-A-V-E
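One way to reason about this question is to estimate how often each pattern would match at a random position: every element restricts the allowed residues, while X allows all 20. The sketch below assumes equal frequencies for the 20 amino acids and handles only single residues, bracketed sets and X (not the full PROSITE syntax, such as {} exclusions or repeat counts); the function names are hypothetical.

```python
import re


def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern (e.g. '[DE]-A-X-E') into a regex."""
    return "".join("." if element == "X" else element for element in pattern.split("-"))


def match_probability(pattern, alphabet_size=20):
    """Chance that a random position starts a match, assuming uniform residues."""
    p = 1.0
    for element in pattern.split("-"):
        if element == "X":
            allowed = alphabet_size
        elif element.startswith("["):
            allowed = len(element) - 2  # letters inside the brackets
        else:
            allowed = 1
        p *= allowed / alphabet_size
    return p


for pat in ["D-A-V-I-D", "[DE]-A-V-I-[DE]", "[DE]-[AVILM]-X-E", "D-A-V-E"]:
    print(f"{pat:18s} {prosite_to_regex(pat):14s} {match_probability(pat):.2e}")

print(re.findall(prosite_to_regex("[DE]-A-V-I-[DE]"), "MDAVIDKEAVIEQ"))  # ['DAVID', 'EAVIE']
```

Multiplying each probability by the number of positions in the database gives a rough expected number of chance matches, so the least restrictive pattern is expected to retrieve the most hits.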
Which alignment is used to predict whether two sequences are homologous or not?
a) Local
b) Global
c) Pair-wise
d) Multiple
In sequence analysis, the Twilight zone refers to
a) a zone of domain in a protein sequence
b) a zone of sequence similarity (0-20% identity) but statistically not significant
c) substitutions in sequence
d) a zone of sequence similarity that is statistically significant
BLOCKS refers to
a) gapped, aligned motif in a multiple sequence alignment
b) ungapped, aligned motif in a multiple sequence alignment
c) coding sequences
d) non-coding sequences
CpG islands and codon bias are tools used in eukaryotic genomics to
a) identify open reading frames
b) differentiate between eukaryotic and prokaryotic DNA sequences
c) look for DNA-binding domains
d) determine STS
The type of algorithm that the GENSCAN tool employs is
a) Neural network
b) Rule-based system
c) Hidden Markov model
d) Statistics-based
BLASTx is used to
a) search a nucleotide database using a nucleotide query
b) search a protein database using a protein query
c) search a protein database using a translated nucleotide query
d) search a translated nucleotide database using a protein query
Which of the following is a retrieval system?
a) Entrez
b) BioEdit
c) VecScreen
d) RasMol
The Smith-Waterman algorithm was developed for
a) Local pairwise sequence alignment
b) Global pairwise sequence alignment
c) Multiple sequence alignment
d) Structural alignment
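Smith-Waterman fills a dynamic-programming table like the Needleman-Wunsch algorithm, but floors every cell at zero and reports the best cell anywhere in the table; that is what makes the alignment local. A sketch with illustrative +1/-1 scores and a linear gap penalty; the function name is hypothetical.

```python
def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best local alignment score of a and b (linear gap penalty)."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]  # first row and column stay 0
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Flooring at 0 lets a new local alignment start at any cell.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best


print(smith_waterman_score("TTACGTT", "GGACGTGG"))  # 4: the shared ACGT core
print(smith_waterman_score("AAA", "TTT"))           # 0: no local similarity
```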
In Molecular Dynamics simulation, the dependence is on
a) position only
b) momentum only
c) both position and momentum
d) either position or momentum
Homology modeling involves

a) alignment of the target sequence to the sequence of a template structure
b) alignment of the target sequence with multiple sequences with no structural information
c) ab initio structure prediction
d) no input of sequence information

Which of the following cases are commonly used in sequence alignment?
a) gap opening penalty = -2, gap extension penalty = -0.5
b) gap opening penalty = -0.5, gap extension penalty = -2
c) gap opening penalty = -100, gap extension penalty = 0
d) gap opening penalty = -100, gap extension penalty = -100
The CATH database classifies protein domains. CATH stands for
a) Classified, Advanced, Technology and Homology
b) Automatic Classification of Turns and Helices
c) Class, Architecture, Topology and Homologous superfamily
d) Classification of Alpha Trans-membrane Helices
Ab initio approaches for prediction of protein structure utilize
a) sequence similarity
b) structural similarity
c) both sequence and structural similarity
d) basic physicochemical principles
To assess the structural similarity between two proteins, the server to use is
a) ProDom
b) PROSITE
c) TrEMBL
d) DALI
Quantitative Structure Activity Relationship (QSAR) is used for
a) Drug design
b) Protein modeling
c) Aligning two sequences
d) Molecular Dynamics simulation
In protein modeling, a molecular mechanics force field is used because
a) it takes less time than other methods
b) it is more accurate
c) it guarantees the global minimum
d) it explicitly represents the electrons in a calculation
A BLAST hit against the STS division of GenBank helps you to understand
a) only the location of the sequence in the genome
b) only the expression of the sequence
c) both the location and the expression of the sequence
d) first-pass survey sequences
SUMOplot is a software used to predict
a) succinyl modification site
b) serine modification site
c) ubiquitin attachment site
d) hydrophobicity graph
Which of the following plants contains the largest genome?
a) Arabidopsis thaliana
b) Fritillaria assyriaca
c) Zea mays
d) Triticum dicoccum
C in CATH database stands for
a) Conformation
b) Configuration
c) Classification
d) Conservation
The program used to convert raw sequence output to an ordered list of bases is called
a) Base calling
b) Neural network
c) Local area network
d) Artificial network
Which of the following algorithms implements a "once a gap, always a gap" policy?
a) ClustalW
b) Needleman & Wunsch
c) Chou & Fasman
d) FASTA
The sequence alignment tool for immunoglobulins, T-cell receptors, and HLA molecules available at the ImMunoGeneTics information system (IMGT) is
a) IMGT/Collier-de-perles
b) IMGT/V-Quest
c) IMGT/Allele-align
d) IMGT/Junction Analysis
Which of the following scoring matrices of proteins is a distance matrix?
a) MDM series of matrices
b) BLOSUM series of matrices
c) Conformational Similarity Weight matrix
d) Genetic Code Matrix
One PAM means one accepted point mutation per
a) 10² residues
b) 10 residues
c) 10³ residues
d) 10⁴ residues
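Since one PAM unit corresponds to one accepted point mutation per 100 residues, higher-numbered PAM matrices are extrapolated by multiplying the PAM1 mutation probability matrix by itself (PAM250 is PAM1 raised to the 250th power). The sketch below uses a hypothetical two-letter alphabet in which each residue has a 1% chance of changing per PAM unit, purely to show the extrapolation; real PAM matrices are 20 × 20 and are converted to log-odds scores afterwards. The function names are hypothetical.

```python
def mat_mult(A, B):
    """Multiply two square matrices given as lists of lists."""
    size = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def pam_power(pam1, n):
    """Return pam1 ** n, i.e. the mutation probabilities after n PAM units."""
    size = len(pam1)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mult(result, pam1)
    return result


# Toy PAM1: rows = original residue, columns = residue after one PAM unit.
# Each residue stays unchanged 99% of the time: 1 change per 100 residues.
toy_pam1 = [[0.99, 0.01],
            [0.01, 0.99]]
toy_pam250 = pam_power(toy_pam1, 250)
print(toy_pam250[0][0])  # probability a residue is unchanged after 250 PAM units
```

After 250 PAM units the toy matrix is close to uniform, which is why high-numbered PAM matrices suit distantly related sequences.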
Which of the following scoring matrices is one of the best to score an alignment of highly conserved protein sequences?
a) BLOSUM 80 or PAM 120
b)
c)
d)
Which one of the following programs is used primarily for submission of complete genomes and batch submission of sequences to GenBank?
a) BankIt
b) Sequin
c) tbl2asn
d) WEBIN
In reconstruction of phylogenetic trees from molecular sequence data, a singleton site in an MSA is considered to be
a) an invariant site
b) an informative variable site
c) an uninformative variable site
d) a conserved site
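In parsimony analysis a column is informative only if at least two different residues each occur in at least two sequences. A variable column that fails this test, such as a singleton where one sequence differs from all the others, cannot favor one tree topology over another. A minimal classifier that ignores gaps and ambiguity codes; the function name and the toy alignment are hypothetical.

```python
from collections import Counter


def classify_site(column):
    """Classify one alignment column (one residue per sequence)."""
    counts = Counter(column)
    if len(counts) == 1:
        return "invariant"
    # Parsimony-informative: at least two states, each present at least twice.
    if sum(1 for c in counts.values() if c >= 2) >= 2:
        return "informative"
    return "uninformative"  # variable but not informative, e.g. a singleton


alignment = ["ACGTA",
             "ACGTC",
             "ATGAA",
             "ATGAG"]
columns = ["".join(seq[i] for seq in alignment) for i in range(len(alignment[0]))]
print([classify_site(col) for col in columns])
# ['invariant', 'informative', 'invariant', 'informative', 'uninformative']
```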
Which of the following identifiers in GenBank changes with sequence revisions/updates?
a) Accession
b) GI
c) Date
d) Both a & b
The EST division of the EMBL database archives data in
a) only the 5' to 3' direction
b) only the 3' to 5' direction
c) both 5' to 3' and 3' to 5', to represent clones from two ends
d) either 5' to 3' or 3' to 5'
Which of the following methods is used to predict the 3D structure of a protein when it has less than 20% sequence similarity with the available templates?
a) Homology modelling
b) Dynamic programming
c) Fold recognition
d) Progressive protein programming
Which of the following techniques is implemented to locate MUMs in the MUMmer algorithm?
a) Suffix tree generation
b) Hash lookup table
c) K-tuple
d) Exact word match
Which one of the following techniques is used for the evaluation of phylogenetic trees?
a) Null hypothesis
b) Bootstrapping
c) Chi-square
d) Probability
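Bootstrapping evaluates a tree by building many pseudo-alignments whose columns are drawn at random, with replacement, from the original alignment, rebuilding a tree from each and counting how often each clade reappears. The sketch below shows only the resampling step, with a hypothetical function name, a toy alignment and a fixed seed; the tree-building step would come from a phylogenetics package.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Return one pseudo-alignment built by resampling columns with replacement."""
    n_cols = len(alignment[0])
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    # Every sequence uses the same sampled column indices, so columns stay intact.
    return ["".join(seq[i] for i in picks) for seq in alignment]


alignment = ["ACGTAC",
             "ACGTTC",
             "ATGAAC"]
rng = random.Random(42)
for _ in range(3):
    print(bootstrap_replicate(alignment, rng))
```

The percentage of replicates that recover a given grouping is the support value usually printed at the corresponding internal node.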
NiceProt is
a) Protein sequence database
b) Derived Protein database
c) Protein sequence view
d) Nucleotide sequence view
Higher-numbered BLOSUM matrices can be used to detect
a) Closely related sequences
b) Distantly related sequences
c) Unrelated sequences
d) Partially related sequences
TBLASTX matches a DNA query sequence, translated into all six reading frames, against a DNA database with
a) No gaps allowed
b) Gaps allowed
c) Gaps depending on the input sequence
d) Gaps depending on the database
Changing which of the following BLAST parameters would tend to yield fewer search results?
a) Turning off the low complexity filter
b) Changing the expected value from 1 to 10
c) Raising the threshold value
d) Changing the scoring matrix from PAM30 to PAM70
The Ramachandran map of a protein structure allows you to identify
a) The most stable structure
b) The tertiary allowed structure
c) The sterically disallowed conformations
d) The secondary structure elements
Which of the following provides the most information for structure-based drug design?
a) 3D structure of a set of active compounds
b) 3D structure of the target
c) Crystal structure of the target-ligand complex
d) Primary structure of the target
To display a ligand molecule, one cannot use the rendering style of
a) Stick
b) Ball and stick
c) Ribbon
d) CPK/space filling
What is the difference between RefSeq and GenBank?
a) RefSeq includes publicly available DNA sequences
b) GenBank includes nonredundant curated data
c) GenBank sequences are derived from RefSeq
d) RefSeq sequences are derived from GenBank
Hemoglobin, myoglobin and globin v protein sequences will be stored in PIR-PSD database as a
a) Sub-family
b) Superfamily
c) Group
d) GenPept
The method of maximum parsimony is also known as
a) Maximum evolution method
b) Minimum evolution method
c) Zero evolution method
d) Moderate evolution method
The biggest problem with algorithms that predict protein-coding genes from genome sequences is that
a) The software is difficult to use
b) The false-negative rate is high; many exons are missed
c) The false-positive rate is high; many exons are falsely assigned
d) The false-positive rate is low; many exons have unknown function
Artificial intelligence techniques are used to predict the secondary structure of globular proteins. Which of the following methods uses such a technique to predict the secondary structures of globular proteins?
a) Chou and Fasman
b) GOR
c) PHD
d) Ab-initio
WebIn is a sequence submission tool provided by
a) NCBI
b) EMBL
c) EBI
d) RCSB
National Center for Biotechnology Information (NCBI) was established on November 4, 1988 as a division of the
a) National Library of Medicine (NLM)
b) National Institutes of Health (NIH)
c) European Bioinformatics Institute
d) ExPASy
FASTA was the first database search program that
a) is much faster than Smith-Waterman
b) is much slower than Smith-Waterman
c) sensitivity and speed of the database search with FASTA are directly related
d) calculates similarity index
The Needleman-Wunsch algorithm is an example of dynamic programming, which does not involve
a) scoring a matrix
b) setting up a matrix
c) local alignment
d) identifying the optimal alignment
RCSB is
a) An Information Portal to Protein database
b) An Information Portal to DNA database
c) An Information Portal to Biological Macromolecular Structures
d) An Information Portal to microarray
To identify the presence of repeats in a protein, the simplest and fastest way is to perform a
a) self dot-plot
b) dot-plot with another protein with same repeats
c) dot-plot with another protein with any repeat
d) BLAST search
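A self dot-plot compares a sequence with itself: the main diagonal is always filled, and an internal repeat appears as extra diagonals parallel to it. A minimal word-match version, assuming exact matches of length k (real dot-plot tools usually use a sliding window and a score threshold); the function name and the example sequence are hypothetical.

```python
def self_dot_plot(seq, k=3):
    """Return (i, j) positions where the length-k words starting at i and j match."""
    words = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    return {(i, j)
            for i, wi in enumerate(words)
            for j, wj in enumerate(words)
            if wi == wj}


seq = "MKLVAMKLVG"  # contains the word MKLV twice
hits = self_dot_plot(seq, k=4)
off_diagonal = sorted((i, j) for i, j in hits if i != j)
print(off_diagonal)  # [(0, 5), (5, 0)]: the repeat, mirrored across the diagonal
```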
Which one of the following best represents the central dogma of Bioinformatics?
a) Sequence-Structure-Function
b) DNA-RNA-Proteins
c) Motifs-domains-Superfamilies
d) Data-Databanks-Data mining tools
Multiple sequence alignments are NOT used to derive
a) Motifs
b) Primers
c) PSSMs
d) HMMs
Which one of the following matrices can be used to identify distantly related homologs?
a) BLOSUM90
b) BLOSUM62
c) BLOSUM45
d) BLOSUM80
The LIS technique is used in the MUMmer algorithm for
a) Identification of MUMs
b) Sorting of MUMs
c) Alignment of MUMs
d) Tabulating MUMs
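Once MUMs are found, the aim is the largest set of them that occurs in the same order in both genomes. After sorting the MUMs by position in the first genome, this becomes a longest increasing subsequence (LIS) problem on their positions in the second genome. A simple O(n²) sketch, assuming each MUM is given as a (position in genome A, position in genome B) pair with lengths and overlaps ignored; a faster O(n log n) formulation also exists, and the function name is hypothetical.

```python
def chain_mums(mums):
    """Longest chain of MUMs whose positions increase in both genomes."""
    mums = sorted(mums)  # order by position in genome A
    n = len(mums)
    if n == 0:
        return []
    best = [1] * n    # length of the longest chain ending at MUM i
    prev = [-1] * n   # predecessor of MUM i in that chain
    for i in range(n):
        for j in range(i):
            if mums[j][1] < mums[i][1] and best[j] + 1 > best[i]:
                best[i] = best[j] + 1
                prev[i] = j
    # Walk back from the end of the longest chain.
    k = max(range(n), key=best.__getitem__)
    chain = []
    while k != -1:
        chain.append(mums[k])
        k = prev[k]
    return chain[::-1]


mums = [(1, 10), (5, 2), (10, 20), (20, 30), (25, 25)]
print(chain_mums(mums))  # [(1, 10), (10, 20), (20, 30)]
```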
The numbers at the internal nodes of a phylogenetic tree indicate
a) Number of times the OTUs were clustered together
b) Number of parsimony sites shared by OTUs
c) Number of mismatches shared by OTUs
d) Similarity score of OTUs that cluster together
Which one of the following statements is FALSE?
a) The Needleman & Wunsch algorithm is used for global alignment of a pair of sequences.
b) There could be several possible local alignments as part of a global alignment.
c) In the Needleman & Wunsch algorithm, sequences are randomised while keeping length and composition the same.
d) The terms identity, similarity and homology are expressed as %.
Maximum parsimony analysis in the context of molecular phylogeny implies that
a) Complex hypotheses are preferred over simpler hypotheses
b) Complex and simple hypotheses need not be considered
c) Simpler hypotheses are preferred over complex hypotheses
d) Both complex and simple hypotheses are considered, and the one that better fits the observations is applied
Molecular dynamics differs from molecular mechanics by taking account of the
a) velocities of the constituent particles
b) effect of the solvent medium
c) non-bonded interactions
d) periodic boundary condition
The double-helical structure of DNA was first obtained using
a) Fiber diffraction only
b) Fiber diffraction and molecular modeling
c) X-ray diffraction from single crystals
d) Diffraction from single crystals and molecular modeling
In protein sequence analysis, the Twilight zone refers to the evolutionary distance corresponding to about
a) 60% identity between two proteins
b)
c)
d)

In a pairwise alignment, an optimal alignment is the one that
a) either minimizes the implied number of evolutionary changes or minimizes a particular scoring function.
b) either maximizes the implied number of evolutionary changes or minimizes a particular scoring function.
c) either minimizes the implied number of evolutionary changes or maximizes a particular scoring function.
d) either maximizes the implied number of evolutionary changes or maximizes a particular scoring function.
Which one of the following proteins can be used as a template for structure prediction by homology modelling?
a) pdb|1TLH|B: Identities = 39/66 (59%), Positives = 51/66 (77%), Expect = 3e-16
b) pdb|1DQL|H: Identities = 9/15 (60%), Positives = 12/15 (80%), Expect = 9.9
c) pdb|1L9U|H: Identities = 173/333 (51%), Positives = 233/333 (69%), Expect = 2e-89
d) pdb|1RP3|A: Identities = 56/206 (27%), Positives = 98/206 (47%), Expect = 2e-05
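The four hits can be compared directly from their statistics. Percent identity is identities divided by alignment length; the percentages shown appear to be truncated rather than rounded (233/333 = 69.97% is shown as 69%). The sketch below keeps hits at or below an E-value cutoff and ranks them by E-value; the 1e-10 cutoff, the ranking rule and the function names are assumptions for illustration only.

```python
def percent_identity(identities, aligned_length):
    """Integer percent identity, truncated the way the hit lines above appear to be."""
    return 100 * identities // aligned_length


# (PDB chain, identities, aligned length, E-value), copied from the hit lines above.
HITS = [
    ("1TLH_B", 39, 66, 3e-16),
    ("1DQL_H", 9, 15, 9.9),
    ("1L9U_H", 173, 333, 2e-89),
    ("1RP3_A", 56, 206, 2e-05),
]


def rank_templates(candidate_hits, max_evalue=1e-10):
    """Keep hits at or below a (hypothetical) E-value cutoff, most significant first."""
    significant = [hit for hit in candidate_hits if hit[3] <= max_evalue]
    return sorted(significant, key=lambda hit: hit[3])


for name, identities, length, evalue in rank_templates(HITS):
    pct = percent_identity(identities, length)
    print(f"{name}: {pct}% identity over {length} residues, E = {evalue}")
```

Ranking by E-value alone is a simplification; in practice the coverage of the target sequence and the quality of the template structure are weighed as well.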
How many edges meet at every branch node in a phylogenetic tree?
a) 1
b) 2
c) 3
d) 4
Which of the following descriptors would be a suitable set for QSAR analysis?
a) logP, molecular volume, Hammett σ and π constants, molar refractivity, polar surface area
b) logP, number of synthetic steps, polar surface area, molar refractivity
c) logP, number of nitrogen atoms, Hammett σ and π constants, molar refractivity, polar surface area
d) molecular weight, molecular volume, molecular surface area
PAM120, PAM80 and PAM60 scoring matrices are most suitable for aligning sequences with
a) 40%, 50% and 60% similarity respectively
b) The usefulness of PAM matrices has no relationship with the similarity of the sequences to be aligned
c)
d)
A protein has three domains P, Q and R, whereas another protein has three domains R, S and Q, in that order. The preferred alignment algorithm for these two proteins will be
a) Local alignment
b) Global alignment
c) Both algorithms will give the same results
d) None of the methods is suitable in this case
When p and q are lengths of sequences, the computational complexity of the Needleman and Wunsch algorithm is
a) O(pq)
b) O(p+q)
c) O(q log p)
d) O(pq)
You are interested in a particular enzyme that is expressed in various human tissues. You have isolated the protein from the brain, liver and kidneys. After a lot of experimentation you determine that the liver protein has three domains, A, B and C, occurring in sequential order. Domain B is the catalytic domain and the other two have regulatory functions. The kidney protein has only domains A and B, in that order, and the brain protein has domains B and C. You then determine the primary structure of the proteins using chemical methods and find that the amino acid sequences of the three domains are completely identical regardless of the source from which they were isolated. You then ask whether the three different proteins all originated from the same gene by alternative splicing, or whether they could be products of different genes. Given the experimentally determined protein sequences and the sequence of the human genome, which one of the following bioinformatics methods would you use to answer this question?
a) TBLASTN using the protein sequence as query and the human genome sequence as database.
b) TBLASTX using the protein sequence as query and the human genome sequence as database.
c) BLASTN using the protein sequence as query and the human genome sequence as reference.
d) BLASTP using the protein sequence as query and the human genome sequence as reference.
Which of the following terms have to be taken into consideration when developing a potential function for docking simulation?
a) Hydrogen bonding, van der Waals and electrostatic interaction terms
b) Bond, angle and dihedral terms
c) Dihedral and hydrogen bonding terms
d) Bond, angle and hydrogen bonding terms
References
DBT-JRF Question papers and Answer Keys