
Bioinformatics

Project: Exploring PKU (Bio 2960 Spring 2011)

Pre-assignment: Decide if you want to work individually or in a group. Groups can be up to four students. If you want to work in a group, one member of the group needs to email the names of group members to PKU@biology2.wustl.edu by Wednesday, February 2, 2011, 11:59 PM. If you think it's possible a name is not unique in the class, include a student ID number. To avoid confusion, please use names as they appear on the course roster. If I don't have your name on a group list, I will assume you want to work individually. Note this IS NOT Dr. Hafer's regular email address! Please email bioinformatics assignments only to PKU@biology2.wustl.edu.

General Note: You will need to save some files from one assignment to the next. If you are working in a group, you should designate a group member to be responsible for the files. As with everything, once you generate a file you need, be sure to back it up so you don't end up repeating work unnecessarily.

Summary of assignment due dates:
Assignment 1: February 6, 2011
Assignment 2: February 20, 2011
Assignment 3: March 6, 2011
Assignment 4: March 27, 2011

Assignment 1: The goal of assignment 1 is to familiarize you with the disease Phenylketonuria (PKU) and the mutations that cause it. Read about the human genetic disease Phenylketonuria (PKU) on Wikipedia, and answer the assignment 1 questions.

Assignment 1 questions are due Sunday, February 6, 2011 by 11:59 PM. Email your answers to PKU@biology2.wustl.edu. In the subject line, type the last name of each person in your group. At the top of the body of your email, include the full name of each group member. You can type the answers directly into the email message, or type answers into a document and attach it to your email. No need to retype the questions. NOTE: The due date is Super Bowl Sunday! Plan accordingly.

Assignment 2: Compare the sequence of a normal PAH gene to the sequence of a mutant PAH gene (that leads to PKU).
The goals of assignment 2 are 1) to familiarize you with the National Center for Biotechnology Information (NCBI) web site and the information available there, and 2) to show you how to easily compare multiple sequences using the program ClustalW.

General Note: Using the ClustalW program is not difficult, but formatting is a bit fussy. See the troubleshooting guide posted on telesis if you are having trouble getting ClustalW to run, or having trouble understanding the alignment file.

Use a Google search to find the NCBI homepage. NCBI is a site maintained by the National Institutes of Health, an agency of the US government. At the NCBI site, do a search for PAH. From the database list (top right) choose "gene" and in the search box type "PAH human". Click on the first match (geneID 5053). This takes you to the Entrez Gene entry for the human PAH gene. Scroll down the page to see how easy it is to find lots of information from this page, including gene map, chromosome map, links to published research papers, etc.

Back at the top of the page, click on "reference sequences" in the Table of Contents on the right (note this just takes you to a specific part of the Entrez Gene page you have just explored). Reference sequences are sequences that have been determined to be the most reliable version of some gene, mRNA, or protein that has been sequenced (often multiple times by different scientists). There are three reference sequences available here: genomic (=gene=DNA), mRNA, and protein.

In the Genomic category, click on "GenBank". This takes you to the GenBank entry for human PAH. GenBank is a giant database of known sequences. When researchers determine the sequence of something, they can submit that sequence to GenBank. Through GenBank, the sequence is freely available to everybody else in the world! At the GenBank entry, note the length (in base pairs [bp] or kilobase pairs [kb]) of the entry (gene). There is a lot of information here that can be hard to understand, but scroll down a bit and you will come to the actual gene sequence. Lots of bases...

Hit the back button to go back to the list of reference sequences. Under mRNA and Proteins, click on the link to the mRNA sequence (NM_000277.1). This takes you to a GenBank entry for the messenger RNA. Note the length of the mRNA. To do our next step, we need the mRNA sequence in a particular format called the FASTA format.
Back at the top of the GenBank entry, click on the word FASTA (just below the tabs on the left). This gives you the same sequence in a different format. Copy the sequence (starting with and including the >gi...) and paste it in a Word document. IMPORTANT NOTE: once you have copied the sequence, you need to introduce a return (enter) after the line that starts >gi and before the first base of sequence (after mRNA before the C). In Word you can press the show paragraph button (¶) to verify that you have a ¶ in the right place.

Go back to the list of reference sequences, and click on the link to the protein sequence (NP_000268.1). Note the number of amino acids (aa). Click on FASTA, copy the sequence, and paste it into a new Word document. You will not use this document again until assignment 3.
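The FASTA layout described above (a header line starting with ">", a return, then the sequence) can be checked with a few lines of Python. This is only a minimal sketch for illustration: the function name parse_fasta and the two short records are made up, and the sequences are toy data, not the PAH sequences.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs."""
    records = []
    header = None
    chunks = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # blank lines between records are ignored
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data found before any '>' header line")
            chunks.append(line.replace(" ", ""))
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Toy input (hypothetical records, NOT the real PAH sequences).
example = """>example_normal hypothetical record
ATGTCC
ACTGCG

>example_mutant hypothetical record
ATGTCCACTGTG
"""

for name, seq in parse_fasta(example):
    print(name, len(seq), seq)
```

If the return after the header is missing, everything on that line is read as part of the header and the record comes back with an empty sequence, which is one likely reason an alignment program would reject the input.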

Find the document called mutant PAH mRNA sequence on telesis. Copy that sequence (already in FASTA format) and paste it into the Word document that already has the PAH reference sequence mRNA that you copied (there should be at least one blank line in between the two sequences). This sequence should include the needed return, but you can double check. Copy both sequences together (including the >gi... part of each).

Google "ClustalW" to find the ClustalW website. Once there, paste the two copied mRNA sequences into the query box. You don't need to change or enter anything else. Click run and wait for your results. Once the results appear, scroll down to the alignment section and click on "view alignment file". Note that the two sequences are aligned along their length. A "*" under the sequences indicates identical bases. Find the one base that is not identical and note the change (from what base to what base) and the position number.

Save your ClustalW alignment to turn in as an email attachment. You can copy and paste, or save the alignment file from the link at the top of the ClustalW output page. Don't worry about the formatting, which can look pretty ugly if you cut and paste.

Assignment 2 questions and your alignment file are due Feb 20, 2011 (by 11:59 PM). Email to PKU@biology2.wustl.edu. Questions can be typed in the email or sent as an attachment; the alignment file should be attached. Type last names in the subject line, and full names at the top of the body of the email.

Assignment 3: Translate the mutant and normal mRNA sequences into protein sequences, and do an alignment to determine the mutation site.

Web tools have been developed to translate mRNA sequences into protein sequences. In a cell, there are signals for the translation machinery to tell it where to start translating. Using computer translation tools, those signals may be missing. Open your Word file that contains the mRNA sequence from the normal and mutant PAH genes.
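A translation tool that cannot see the cell's start signals can simply translate every possible reading frame and leave the choice to the reader. The following is a minimal Python sketch of that idea, not the code behind any particular web tool. It assumes the standard genetic code; the function names, frame labels, and the short toy sequence are all invented for illustration.

```python
from itertools import product

# Standard genetic code, with codons ordered by base T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement of a DNA string (A<->T, C<->G)."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate codon by codon from the first base; '*' marks a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))  # 'X' = unrecognized codon
    return "".join(protein)


def six_frames(seq):
    """Translate three frames on the given strand and three on its reverse complement."""
    dna = seq.upper().replace("U", "T")  # accept RNA or DNA letters
    frames = {}
    for label, strand in (("forward", dna), ("reverse", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{label} {offset + 1}"] = translate(strand[offset:])
    return frames


def longest_orf(protein):
    """Longest stretch starting at M and running to the next stop (or the end)."""
    best = ""
    for segment in protein.split("*"):
        start = segment.find("M")
        if start != -1 and len(segment) - start > len(best):
            best = segment[start:]
    return best


# Toy sequence (hypothetical, NOT the PAH mRNA).
toy_mrna = "AAAUGGCCUUUUAAGG"
for name, protein in six_frames(toy_mrna).items():
    print(name, protein, "| longest M-to-stop stretch:", longest_orf(protein))
```

Comparing the longest M-to-stop stretch in each frame is one simple way to decide which frame is most likely the real one.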
Go to the ExPASy translate web site (do a Google search). Copy one of the PAH mRNA sequences and paste it in the query box. Click on "translate sequence". Note that you get back six different results, from six different frames. Think about why there are six different results. Choose the frame that you think is correct. Remember that an open reading frame goes from a start to a stop, with a reasonably large number of amino acids in between. Translation, especially in eukaryotes, almost always starts with the amino acid methionine (Met, M), but methionine can also occur in other places in a protein.

For the frame you think is correct, click on the link at the top of the translation. This opens a new window with the translated sequence. Click on the actual amino acid that you think would be the first amino acid in the protein. This opens a new window showing the translated product. Copy the sequence and paste it in the Word document with the reference sequence protein you copied for assignment 2. Repeat the translation with the other mRNA sequence.

At the end your Word document should contain three protein sequences: 1) the reference protein sequence for PAH from GenBank, 2) your translated sequence from the reference sequence mRNA (normal) and 3) your translated sequence from the mutant PAH mRNA. Make sure all of these are in the correct format for ClustalW, with a >XXXX in the first line, followed by a return before the first amino acid.

Copy your three protein sequences. Open the ClustalW website, paste the three sequences in the query box, and click run. View the alignment file, and save some version of the file to turn in (as an email attachment). Which sequences are the same, and which is different? Note what aa is changed to what other aa, and at which position (number). Write this in two formats, one using the single letter aa code, and the other using the three letter aa code.

Assignment 3 questions and your alignment file are due March 6, 2011 (11:59 PM). Email (using the same rules explained previously) to PKU@biology2.wustl.edu.

Assignment 4: Exploring the mechanism of our PAH mutation.

Go to the NCBI homepage, and under "popular resources" (top right), choose PubMed. This is a database of publications relating to biology and medicine. At the PubMed page, type into the search box: erlandsen pediatrics 2003

This should retrieve a paper from the journal Pediatrics, published in 2003 by author Heidi Erlandsen, et al. Obviously without these specific search parameters you would have to work a little harder to find this specific paper, but you could do it. At the top right, click to get the free full version.
Answer the assignment 4 questions while concentrating on these parts of the paper:
Abstract
Figure 1
Figure 2
Figure 4
Under "STRUCTURAL BASIS FOR HPA AND PKU", read PAH Active Site Mutations

If reading primary scientific literature is new to you, this may be difficult. It's okay if you don't understand everything in this paper. Try to get the main ideas, and the information you need to answer the questions.

Assignment 4 is due March 27, 2011 (by 11:59 PM). Submit, as previously discussed, to PKU@biology2.wustl.edu.
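As a closing aid for the comparisons in Assignments 2 and 3: both come down to finding the position where two aligned sequences differ and writing the change down. The Python sketch below shows one way to double check a result by hand. It assumes the two sequences are already the same length with no gaps; the function names and the toy peptides are invented for illustration and say nothing about the actual PAH mutation.

```python
# Single-letter to three-letter codes for the 20 standard amino acids.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def find_differences(reference, variant):
    """Return (position, ref_char, var_char) for each mismatch; positions start at 1."""
    if len(reference) != len(variant):
        raise ValueError("sequences must be the same length (aligned, no gaps)")
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(reference, variant), start=1)
        if a != b
    ]


def describe_substitution(position, ref_aa, var_aa):
    """Write an amino acid change in single-letter and three-letter form."""
    one_letter = f"{ref_aa}{position}{var_aa}"
    three_letter = f"{THREE_LETTER[ref_aa]}{position}{THREE_LETTER[var_aa]}"
    return one_letter, three_letter


# Toy peptides (hypothetical, NOT the PAH protein).
normal = "MKTAY"
mutant = "MKSAY"
for position, ref_aa, var_aa in find_differences(normal, mutant):
    print(describe_substitution(position, ref_aa, var_aa))
```

The same find_differences function works on two mRNA sequences, since it only compares characters position by position.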
