
CH/BI 421/621/527 F16

Bioinformatics Worksheet for LDH

Bioinformatics Exercises:
Bovine Lactate Dehydrogenase (LDH)
BACKGROUND:
Primary structure (amino acid sequence) is often the first piece of experimental information a
biochemist wants about a protein of interest, because it can be used to make several
predictions about the properties and possible behavior of the protein, such as:
• Protein molecular weight, by adding up the masses of the individual amino acid residues (plus one
water for the free termini).
• Isoelectric point. The isoelectric point is the pH at which the protein has no net charge. Because of
ionizable functional groups on amino acids, protein charge changes as a function of pH depending on
whether or not these groups are protonated. By knowing the sequence, we know how many of each ionizable
group our protein contains. If we know the pH range where these groups become protonated or
deprotonated, we can estimate the charge of the whole protein as a function of pH. This will be
discussed in more detail below.
• Molar extinction coefficient. Tryptophan, tyrosine and cystine (disulfide-bonded cysteine) residues
absorb ultraviolet light at 280 nm. By knowing how many of these amino acids are found in our protein's
sequence, we can calculate how much we expect a solution of our protein to absorb 280 nm light as a
function of its concentration. I say "expect" instead of "determine" because the amount of light
absorbed by these amino acids is dependent on their local environment within the protein, especially on
whether they are on the surface and exposed to the solution or buried inside the protein.
• Sequence similarity to other proteins, which suggests homology to proteins of known function and/or
structure.
• Other structural predictions based on sequence:
o Disulfide bonds. If your protein is from the cytoplasm, it will likely not have disulfide bonds in its
native conformation because the intracellular environment is reducing. However, if it is an extracellular
protein, disulfide bonds may play a critical role in its stability.
o Secondary structure. Based largely on databases of experimentally determined three-dimensional
protein structures, some sequences and particular amino acid residues are more or less likely to
form particular types of secondary structure (hydrogen-bonding networks such as alpha helices and
beta strands).
o Stability with respect to proteolytic digestion. All cells contain proteases. When cells or tissues are
disrupted to isolate proteins, these previously compartmentalized protein-cutting enzymes are now in
the solution with your protein target. Proteases have different specificities in terms of protein sequence
and some sequences are particularly yummy (likely to be cleaved by these enzymes).
o Hydrophobicity. Once the sequence is known, you can look for the location of all of the hydrophobic
amino acid side chains. If you find a long (~23 residues) linear stretch of sequence containing only
hydrophobic amino acids, this may suggest a region of the protein that spans a lipid membrane.
o Potential post-translational modification sites for glycosylation, biotinylation, binding metal
cofactors, etc.
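The isoelectric point item above describes estimating protein charge as a function of pH from the number of ionizable groups. A minimal Python sketch of that idea follows, assuming each group titrates independently according to the Henderson-Hasselbalch equation. The pKa values are rough textbook-style approximations chosen for illustration (they are not the values used by ProtParam or any other specific tool), and the function names are hypothetical.

```python
# Approximate pKa values (illustrative assumptions; real values shift with
# each residue's local environment in the folded protein).
POSITIVE_PKA = {"K": 10.5, "R": 12.5, "H": 6.0}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}
N_TERM_PKA = 9.0
C_TERM_PKA = 2.0


def net_charge(sequence, ph):
    """Estimate the net charge of a sequence at a given pH."""
    # Fraction of the N-terminal amine that is protonated (+1).
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))
    # Fraction of the C-terminal carboxyl that is deprotonated (-1).
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))
    for residue in sequence.upper():
        if residue in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[residue]))
        elif residue in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[residue] - ph))
    return charge


def estimate_pi(sequence, low=0.0, high=14.0, tolerance=0.001):
    """Bisect for the pH at which the estimated net charge is zero.

    Net charge only decreases as pH rises, so bisection is sufficient.
    """
    while high - low > tolerance:
        mid = (low + high) / 2
        if net_charge(sequence, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2


# Toy peptide (not a real protein sequence).
print(round(estimate_pi("ACDKR"), 2))
```

Because the pKa set is an assumption, the estimate will generally differ somewhat from the value a web tool reports for the same sequence.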
All of this is very useful information, but much of it is a prediction and may not be true of the
biologically-relevant folded protein (the native structure). Protein sequence cannot yet predict tertiary
structure or association with other subunits (quaternary structure). Even the secondary structure prediction
tools are often inaccurate. Sequence does not tell you about the overall shape of the protein or the
characteristics of its surface such as its charge distribution or whether or not it has hydrophobic patches.
These surface characteristics are important both for the biological function of the protein and for determining
how other molecules may interact with it, and they currently still need to be determined experimentally.

In the past four weeks you have purified LDH from bovine heart or skeletal muscle tissue, and soon you will
be conducting biochemical and biophysical experiments to characterize it both structurally and functionally.
This bioinformatics worksheet, coupled with the molecular visualization worksheet you will complete during
week 7 in the lab, will help you take advantage of what can be learned about proteins using
bioinformatics tools and structural information from proteins whose structures have been experimentally
determined and deposited in the Protein Data Bank (PDB).
The ability to store and interconnect all available information on proteins is crucial to modern
biological research. The Universal Protein Resource (UniProt) provides a stable, comprehensive, freely
accessible central resource on protein sequences and functional annotation. UniProt is produced by the
UniProt Consortium, formed in 2002 by the European Bioinformatics Institute (EBI), the Protein Information
Resource (PIR) and the Swiss Institute of Bioinformatics (SIB).
The following worksheet is designed to familiarize you with some of the basic types of bioinformatics
questions you can answer using the UniProt website (http://www.uniprot.org) and other web tools linked to
this site.
Exercise 1:
In this exercise:
a. The amino acid sequences of different LDH polypeptides from bovine (corresponding to muscle
and heart) will be retrieved.
b. These retrieved bovine sequences will be used to retrieve the corresponding sequences from
humans.
c. The four retrieved sequences will be aligned using tools from UniProt.
Sequence alignments can provide valuable insights into the evolution of a particular protein. Protein
sequences are typically aligned by comparing amino acid identities, amino acid types, amino acid
similarities, and protein structural motifs or domains. A set of symbols is typically used to identify identical
and similar amino acids in aligned sequences; the UniProt tools use the *, :, and . symbols.
The Basic Local Alignment Search Tool (BLAST) can search across a very large database for proteins
containing identical or similar sequences. In this way, an unknown sequence can be attributed to a known
protein, or sequence similarities between known proteins can be assessed. To help judge the
significance of these matches, BLAST reports an Identity value and an E-value for each hit. The Identity
value is the percentage of aligned positions at which the two sequences have the same amino acid (100%
Identity indicates that 100% of the aligned amino acids are identical). The E-value is the number of
matches (or hits) with an equal or better score that would be expected by chance when searching a
database of that size; an E-value of 1 indicates that one such match is expected by chance alone.
Consequently, sequence matches that are significant (and are not due just to chance) typically have
E-values that approach zero.
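To make the two BLAST values concrete, here is a small Python sketch. It computes percent identity for a pair of already-aligned sequences (dividing identical columns by the alignment length, which is one common convention; other tools use other denominators) and converts an E-value into the probability of seeing at least one chance hit, using the standard approximation that chance hits are Poisson-distributed with mean E. The function names and the toy sequences are made up for illustration.

```python
import math


def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns in which both sequences have the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        return 0.0
    identical = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * identical / len(aligned_a)


def chance_hit_probability(e_value):
    """Probability of at least one hit arising by chance for a given E-value."""
    return 1.0 - math.exp(-e_value)


# Toy aligned pair ("-" marks a gap); not real LDH sequences.
print(percent_identity("AC-EFG", "ACDEFG"))
print(chance_hit_probability(1.0))    # an E-value of 1 is far from significant
print(chance_hit_probability(1e-50))  # effectively zero
```

For very small E-values the probability and the E-value are nearly equal, which is why hits with E-values approaching zero are treated as significant.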
Finding the amino acid sequence of LDH from Organism: bovine (Bos taurus)
i. Go to http://www.uniprot.org.
ii. In the search field, enter bovine LDH and click on the search button. The following page lists all the UniProt
entries that relate to bovine LDH.
iii. Choose the entry with Protein Name: L-lactate dehydrogenase A chain.
iv. Record the Protein Accession (AC) # listed under the first column, Entry.
v. Click/check the box next to this entry, then scroll to the top of the page and click on Add to basket.
vi. Scroll down and choose the entry with Protein Name: L-lactate dehydrogenase B chain.
vii. Record the Protein Accession (AC) # listed under the first column, Entry.
viii. Click/check the box next to this entry, then scroll to the top of the page and click on Add to basket.
ix. Now go up to the top right corner of the page and click on Basket.
x. This opens a small window that lists all the entries in the Basket.
xi. Select/check the first entry, P19858.
xii. Click the box that says BLAST and use the default settings to run BLAST.

xiii. Once complete, the results page lists the UniProt entries that contain similar sequences; these
results are listed under the headings Overview and Alignments.
xiv. Scroll down the Overview listings until you see L-lactate dehydrogenase A chain (Homo
sapiens). Note down its accession number: P00338.
xv. Find the same sequence under the Alignments listing and select/click this entry. Then add it
to the basket.
xvi. Go back to your basket, select bovine L-lactate dehydrogenase B and repeat xii-xv.
xvii. The accession number for L-lactate dehydrogenase B chain (Homo sapiens) is P07195.
xviii. Go back to the basket, check all four entries and click on the Align button at the
bottom left of the window.
xix. After a few moments, the alignment procedure is complete and a page displaying the
arrangement of sequences from all of the entries appears.
xx. To ensure that all the necessary information is displayed, click/check on the Tree and Result
Info options on the top left side of the page, under the heading Display.
xxi. As appropriate, use the Annotation tools on the left side of the page (under the Highlight
heading) to selectively highlight amino acids with specific properties (e.g., metal binding,
aromatic, etc.).
xxii. Use the combined information on this page to answer the questions below.

1. On the protein alignment, what do the asterisk (*) symbols represent?
A. The general type of amino acid is conserved across the compared proteins.
B. The same amino acid is conserved across the compared proteins.
C. The general property of the amino acid is conserved across the compared proteins.
D. There are no conserved amino acid properties among the compared proteins.
2. On the protein alignment, what do the colon (:) symbols represent?
A. The general type of amino acid is conserved across the compared proteins.
B. The same amino acid is conserved across the compared proteins.
C. The general property of the amino acid is conserved across the compared proteins.
D. There are no conserved amino acid properties among the compared proteins.
3. On the protein alignment, what do the period (.) symbols represent?
A. The general type of amino acid is conserved across the compared proteins.
B. The same amino acid is conserved across the compared proteins.
C. The general property of the amino acid is conserved across the compared proteins.
D. There are no conserved amino acid properties among the compared proteins.
4. Using the Highlight tool, study the similarities and differences between the muscle and heart sequences
as well as between the bovine and human sequences. (Note that this comparison will be very useful in
interpreting CH5 experimental results.)
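If you save the aligned sequences as text, a comparison like the one in question 4 can also be scripted. The Python sketch below lists every alignment column at which two aligned sequences differ. The function name is hypothetical, the column numbers refer to positions in the alignment (not to residue numbers in either protein), and the example strings are toy data rather than real LDH sequences.

```python
def differing_columns(aligned_a, aligned_b):
    """Return (column, residue_a, residue_b) for each column that differs.

    Columns are numbered from 1 along the alignment; "-" marks a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    return [
        (column, a, b)
        for column, (a, b) in enumerate(zip(aligned_a, aligned_b), start=1)
        if a != b
    ]


# Toy aligned pair (made up for illustration).
for column, a, b in differing_columns("ACDEFG", "ACNEF-"):
    print(column, a, b)
```

The Highlight tool remains the quicker way to see which differences change amino acid properties; a list like this is mainly useful for keeping a record of positions to revisit.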
Exercise 2:
In this exercise:
You will look into some of the physicochemical properties of bovine LDHA using the protein parameters
(ProtParam) tool.
i. Go back to the basket and click on the accession number P19858 to access information about this protein. Take
some time to familiarize yourself with the kinds of information this file contains.
ii. Scroll down to the Sequence section of the file and click the blue button labeled FASTA to download the protein
sequence in the FASTA format. In this format, the first line starts with the character > followed by some
descriptive text; programs that read the file treat this line as a label rather than as sequence data.
This line is followed by the single-letter amino acid sequence of the protein.
iii. Copy/paste the sequence here:
iv. To the right of the Sequence box, there is a drop-down menu (showing BLAST as default).
v. Click on the arrow to activate the menu, select ProtParam and click GO.
vi. Click submit at the bottom of the page that opens.
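The FASTA layout described in step ii (a header line starting with >, then sequence lines) is simple enough to read with a few lines of code. The following Python sketch shows one way to do it; the function name is hypothetical and the example record is made up, not the actual UniProt entry.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence may be wrapped over several lines; join them.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Made-up record for illustration only.
example = """>toy|X00000|TOY_PROTEIN made-up example record
MKTAYIAK
QRQISFVK
"""
for header, sequence in parse_fasta(example):
    print(header, len(sequence))
```

The sequence returned this way has the line breaks removed, which is the form most web tools expect when you paste it into a text box.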


The ProtParam program uses the primary structure of your protein to calculate:
a. Molecular weight
b. Isoelectric point
c. Amino acid composition including sums of residues that will always be negatively charged
in a physiologically-relevant pH range (Aspartate and Glutamate) and those that will always
be positively charged (Arginine and Lysine).
d. Atomic composition
e. Molar extinction coefficient (M⁻¹ cm⁻¹) at 280 nm
f. Half life and instability index (predicted susceptibility to proteolysis)
g. Aliphatic index and hydropathicity (based on relative amounts of polar and nonpolar
residues)
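Several of the quantities in this list follow directly from counting residues. The Python sketch below shows the arithmetic for molecular weight, the charged-residue sums, and the 280 nm extinction coefficient. It assumes standard average residue masses and the per-residue coefficients documented for ProtParam (5500 for tryptophan, 1490 for tyrosine and 125 for each cystine, in M⁻¹ cm⁻¹); the function names are hypothetical, and small differences from ProtParam's own output are possible.

```python
from collections import Counter

# Average residue masses in g/mol (amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.0153


def molecular_weight(sequence):
    """Sum residue masses and add one water for the free N- and C-termini.

    Raises KeyError for letters outside the 20 standard amino acids.
    """
    return sum(AVERAGE_RESIDUE_MASS[r] for r in sequence.upper()) + WATER_MASS


def charged_residue_counts(sequence):
    """Return (Asp + Glu, Arg + Lys) counts."""
    counts = Counter(sequence.upper())
    return counts["D"] + counts["E"], counts["R"] + counts["K"]


def extinction_coefficient_280(sequence, cystines_formed=True):
    """Estimate the molar extinction coefficient at 280 nm (M^-1 cm^-1).

    With cystines_formed=True, every pair of cysteines is assumed to form a
    disulfide (the "oxidized" value); with False, cysteines contribute nothing.
    """
    counts = Counter(sequence.upper())
    cystines = counts["C"] // 2 if cystines_formed else 0
    return 5500 * counts["W"] + 1490 * counts["Y"] + 125 * cystines


# Toy peptide (not a real protein sequence).
toy = "WYCCDEKR"
print(round(molecular_weight(toy), 1))
print(charged_residue_counts(toy))
print(extinction_coefficient_280(toy), extinction_coefficient_280(toy, False))
```

Note that the estimate is a prediction: as described in the Background, the actual absorbance depends on the local environment of each residue in the folded protein.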
ii. Using the results from ProtParam, fill out the following information. For the extinction
coefficient, use the oxidized form of the protein (disulfide-bonded cystines).
iii.
| Protein ID and Accession # | # of amino acids | MW (g/mol) | pI | # neg (asp and glu) | # pos (arg and lys) | Aromatic amino acids: tyr | phe | trp | Extinction Coefficient (M⁻¹ cm⁻¹) |

Exercise 3:
In this exercise:

You will use a program called Jpred to predict the secondary structures in bovine LDHA. This exercise will
help you see how the primary structure of a protein can be used to PREDICT secondary structures within
a protein and will allow you to compare the predictions to experimentally determined structures.
i. Go to http://www.compbio.dundee.ac.uk/jpred/
ii. Paste the LDH sequence (Exercise 2, part iii) into the appropriate field and click Make a Prediction.
iii. It will let you know if there's an experimental crystal structure for your protein, which gives far more
accurate structural information than the prediction tool. For demonstration purposes, go ahead and
click on continue to generate the predicted secondary structures based on primary structure. It will
take a few minutes for the computer to do the computation.
iv. Using the displayed results (use the 4th line, jnetpred), fill out the table below, which lists each stretch
of beta strand (E, green arrow) or alpha helix (H, red cylinder) for the first 52 residues of bovine
LDHA.
| Residue Range | Sequence | Secondary Structure |
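If you copy the jnetpred line as text, the table rows can be generated rather than counted by hand. The Python sketch below assumes the prediction is a string with one character per residue, using H for helix, E for strand and - for everything else, and that it has the same length as the sequence. The function name is hypothetical and the example strings are toy data, not the actual LDHA prediction.

```python
from itertools import groupby


def secondary_structure_segments(sequence, prediction):
    """Return (start, end, subsequence, state) for each run of H or E.

    Residues are numbered from 1; runs of any other character are skipped.
    """
    if len(sequence) != len(prediction):
        raise ValueError("sequence and prediction must have equal length")
    segments = []
    position = 1
    for state, run in groupby(prediction):
        length = len(list(run))
        start, end = position, position + length - 1
        position = end + 1
        if state in ("H", "E"):
            segments.append((start, end, sequence[start - 1:end], state))
    return segments


# Toy data for illustration only.
toy_sequence = "ACDEFGHIKL"
toy_prediction = "--HHHH-EE-"
for start, end, residues, state in secondary_structure_segments(
    toy_sequence, toy_prediction
):
    print(f"{start}-{end}", residues, state)
```

To restrict the table to the first 52 residues, pass `sequence[:52]` and `prediction[:52]`.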