Using the RAST prokaryotic genome annotation server

RAST is designed to rapidly call and annotate the genes of a complete or essentially complete prokaryotic genome.
RAST, Rapid Annotations based on Subsystem Technology, uses a "Highest Confidence First" assignment
propagation strategy based on manually curated subsystems and subsystem-based protein families that
automatically guarantees a high degree of assignment consistency. RAST returns an analysis of the genes and
subsystems in your genome, as supported by comparative and other forms of evidence.
Because NMPDR and the SEED provide access to all essentially complete, public genomes without a user account,
the use of RAST without an account makes no sense; you must have a free account in order for access to your data
to be kept under your control. The tools available in RAST for comparing your new private data to public genomes
are mostly the same as those available for analyzing public genomes at NMPDR (www.nmpdr.org).
The tour of the site will follow the workflow listed below. For short answers to specific questions, see the RAST
FAQ.

Upload and manage your job
Sequence format and upload steps
Log in and select "Upload New Job" from the "Your Jobs" menu.

• Step 1: Browse for the sequence file, which must be a plain text file in either FASTA or GenBank format.
o Multiple contigs or replicons of the same genome should all be together in one file.
o Files encoded as html, pdf, rtf, doc, docx, embl, gff3, or gtf will be rejected. Sequences in the
correct FASTA or GenBank format must NOT be in a Microsoft Word document. Save the file as plain
text (*.txt) with the default Windows text encoding; do NOT insert line breaks or allow character
substitution.
o Click the button to "Upload and go to step 2."
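Before uploading, it can save a rejected job to check locally that the file really is plain-text FASTA with all contigs in one file. The sketch below is only an illustration of the format rules in Step 1; it is not RAST's own validation, and the function names and the choice to accept IUPAC nucleotide codes are assumptions made for this example.

```python
from typing import List, Tuple

# Assumption: contigs are nucleotide sequences written with IUPAC codes.
VALID_BASES = set("ACGTUNRYKMSWBDHV")


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Return (header, sequence) pairs from plain-text FASTA content."""
    records: List[Tuple[str, str]] = []
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_fasta(text: str) -> List[str]:
    """Return a list of problems; an empty list means the text looks like FASTA."""
    try:
        records = parse_fasta(text)
    except ValueError as exc:
        return [str(exc)]
    problems: List[str] = []
    if not records:
        problems.append("no FASTA records found")
    for header, seq in records:
        if not header:
            problems.append("record with an empty header")
        if not seq:
            problems.append(f"{header}: empty sequence")
        bad = set(seq) - VALID_BASES
        if bad:
            problems.append(f"{header}: unexpected characters {sorted(bad)}")
    return problems
```

A file saved from a word processor as rtf or doc typically fails the first test, because its content does not begin with a ">" header line.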

• Step 2: Provide the name of the organism and choose a translation table.
o Look at the information in the "Upload summary" tab to confirm that the system detected the
sequence data you intended to upload.
o If you know or find the taxonomy ID from NCBI, paste it into the text box. Then, when you
select either Bacteria or Archaea with the radio button, the corresponding genus, species, and
strain will autofill accurately. If you do not know or cannot find an ID in the NCBI Taxonomy
database, then fill in the genus (one word), species (one word), and strain (any number of
words). RAST will provide a dummy ID number corresponding to nothing in the taxonomy database.
o Most bacteria use version 11 of the genetic code, but mycoplasmas and spiroplasmas use version 4.
• Step 3: Provide information about the sequence data and select settings.
o If you uploaded a GenBank file, you may elect to preserve gene calls. Since there are no gene
calls in a FASTA file, this choice is unavailable for FASTA uploads.
o Select whether the translated proteins should reflect corrected frame shifts if you have
low-quality sequence data.
o Click the button to "Finish Upload."

Manage your job
From the "Your Jobs" menu, select "Jobs Overview." If you have logged out, you will be directed to
your jobs overview upon logging back in.
• Track progress: All of your jobs are displayed in a table with active headers.
o An overview of the progress is shown in the table as a series of colored boxes.
o Select the link to view details of one job. Your annotation job could be complete in as few as 8 hours.
• Access completed job: From the details page, you may view or download the annotated genome.
o How to navigate the genome viewer will be discussed below.

o Download formats include GenBank, GFF3, and EMBL, with and without EC numbers.
• Share your annotated genome with one or more other users.
o You can share this job with others by clicking the link and adding the email addresses (one at a time) of registered users to whom you would like to grant access to your otherwise private data.
o If you would like to share with many people, e.g., a class, request a new group by emailing rast@mcs.anl.gov. Group memberships may be viewed from the account management page, which is accessed by clicking on the pair of people at the far right of the green menu bar. This is also where you can change your password, if needed.
• Delete job: First you must click on the "view details" link in the jobs table. Then, the green menu bar in the header of the page will provide an option labeled with your job number. The only action to choose from this menu is "Delete this job." An intermediate screen will appear to confirm whether you are sure you want to delete. Click the button to do so.

View your annotated genome results
Organism Overview page
• From the jobs table, click on "view details" and then "Browse annotated genome in SEED Viewer."
• The overview page opens with a table that lists how many contigs, how many genes, the number of genes that are assigned to complete subsystems, and the subsystem categories represented in your genome.

• The green "Features in Subsystems" tab displays all genes and other features that are automatically included in subsystems because one similar sequence was found for all roles in a functional variant of the subsystem. The table is resortable and downloadable.
• In the menu bar, under Organism, the feature table will display all annotated features in your genome, both those in and not in complete subsystems. Click on any feature ID to open the Annotation Overview page for a selected feature.

Annotation Overview pages for individual features
• Compare Regions displays the new genome at the top in comparison with 4 other closely related genomes.
• Sets of homologous proteins located in the genomic region are presented in the same color with a numerical label.
• To expand the graphic comparison to more genomes, click the advanced button, type in a larger number of genomes (20), select the option to collapse close genomes, select PCH pin for clustered genes, or leave at similarity for isolated genes, then click the button to redraw the graphic.
• All information in the graphic is presented in a table in the next tab.

• Clicking a feature arrow in the graphic will center the graphic on that protein and color the focus protein red.

Walk the chromosome or contigs of your new genome
• Browse genome: Access to the genome browser is available from the menu bar, under Organism -> Genome Browser, and in the hint box at the top of the Organism Overview page.
• The genome browser provides a visual tour of the annotated features. You may choose a larger window, and you may color the features by subsystem or clustering. Click navigation arrows to move forward or back along the genome. Mouse over any feature to pop up its identity. Click on the details button in the focus tab to open the Annotation Overview page for the protein of focus.
• The table beneath the graphic allows you to scan or search for a feature of interest. Click the "Show" button in the Region column to focus the browser graphic on the selected feature.
• Click on any feature ID in the table to open its Annotation Overview page.

Metabolic reconstruction of your new genome: automated subsystem assignments
• The genome overview page displays a pie chart of complete subsystems identified in the genome, along with the number of proteins assigned to each.
• Expand the categories to see subcategories and subsystem names.
• The table (click the green "Features in Subsystems" tab) provides similar access; you can select the "Carbohydrates" category from the column header.
• From the Carbohydrates category either in the chart or table, click to open the Glycolysis and Gluconeogenesis subsystem.

• In the subsystem spreadsheet, the new genome is highlighted and displayed in the context of closely related public genomes. The spreadsheet is arranged with functional roles in columns, genomes in rows, and genes annotated to those roles in the respective cells. Within one row, genes that are clustered on the genome are shown in the same color.
• Click on any of the genes in the newly annotated genome to open its Annotation Overview page, which will open in a new window or tab.
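The rule behind the "Features in Subsystems" tab, that features are included only when a match was found for all roles in one functional variant of a subsystem, can be sketched in a few lines. This is a toy model, not the SEED implementation: the variant names, the shortened role lists, the feature IDs, and the assumption that roles are matched by exact name are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional, Set

# Hypothetical, heavily simplified subsystem: variant name -> roles it requires.
TOY_VARIANTS: Dict[str, Set[str]] = {
    "variant_1": {"Hexokinase", "Phosphofructokinase", "Pyruvate kinase"},
    "variant_2": {"Glucokinase", "Phosphofructokinase", "Pyruvate kinase"},
}


def matching_variant(annotations: Dict[str, str],
                     variants: Dict[str, Set[str]]) -> Optional[str]:
    """Return the first variant whose required roles are all annotated, else None.

    `annotations` maps a feature ID to its assigned function; a role counts as
    present only if some feature carries exactly that name.
    """
    annotated_roles = set(annotations.values())
    for name, required in variants.items():
        if required <= annotated_roles:  # every required role is present
            return name
    return None


def features_in_subsystem(annotations: Dict[str, str],
                          variants: Dict[str, Set[str]]) -> List[str]:
    """List the feature IDs that fill roles of the matched variant."""
    variant = matching_variant(annotations, variants)
    if variant is None:
        return []
    required = variants[variant]
    return sorted(fid for fid, role in annotations.items() if role in required)
```

In this model a single annotation that differs slightly from the role name, such as "Pyruvate kinase (putative)", leaves the variant incomplete, so none of the genome's features are reported for that subsystem.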

How does your genome sequence compare with others?
Run the sequence-based comparison tool
From the page showing the subsystem, go back on your browser to return to the overview, then select "Sequence-based comparison" from the Comparative Tools menu.
• Step 1: Select a reference genome. If your private sequence is in many contigs, the best selection may be a known, high-quality genome.
• Step 2: Select up to three comparison genomes.
• Step 3: Click the button to compute.

Result
• This tool allows you to select from all your private genomes as well as all public genomes. Protein sequence similarities are computed on demand, which allows you to select from all your annotation jobs.
• The example illustrated below uses two private jobs, the GenBank and FASTA versions of the same Streptococcus equi subsp. zooepidemicus MGCS10565 genome, in order to compare the preserved, published gene calls to the RAST-computed gene calls. The second two genomes are nephritogenic strains of Streptococcus pyogenes from the list of public genomes.
• Results are computed on demand in real time using BLASTP to compare every protein in the reference genome to every protein in the comparison genomes.
• Results are presented in a color-coded table and in a circular map, in order of the contigs/genes in the reference genome. If your private genomes are in multiple contigs, and you use a closed, public genome as the reference, these results will help order your contigs.
• Comparison proteins are listed with their contig number, gene number, and length. The gene numbers are linked to pop-up boxes that list the annotation (name) as well as the proportion of identical amino acids.
• The amino acid identity of the comparison genomes relative to the reference is color-coded on a scale that is not linear, but logarithmic, and that follows the order of the visible spectrum. Notice that the most similar sequences are ribosomal proteins.
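To make the two quantities in the result table concrete, the sketch below computes the proportion of identical amino acids for one aligned protein pair and places it on a logarithmic scale. The tutorial does not give the tool's actual bin boundaries or colors, so the halving scheme here (each bin covers half the remaining divergence of the previous one) and both function names are assumptions meant only to show why a logarithmic scale resolves highly similar proteins more finely than a linear one.

```python
import math


def fraction_identical(aligned_a: str, aligned_b: str) -> float:
    """Proportion of alignment columns holding the same residue.

    Both strings are rows of one pairwise alignment ('-' marks a gap);
    gap columns count as mismatches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return same / len(aligned_a)


def log_bin(identity: float, n_bins: int = 8) -> int:
    """Map identity in [0, 1] to a bin index on an assumed logarithmic scale.

    Bin k holds pairs whose divergence (1 - identity) lies in (2**-(k+1), 2**-k];
    the last bin collects everything more similar than that.
    """
    if not 0.0 <= identity <= 1.0:
        raise ValueError("identity must be between 0 and 1")
    divergence = 1.0 - identity
    if divergence == 0.0:
        return n_bins - 1
    return min(n_bins - 1, math.floor(-math.log2(divergence)))
```

With eight bins under this scheme, identities of 0.5, 0.75, and 0.9 fall in bins 1, 2, and 3, and the remaining bins are all spent on pairs above roughly 94% identity.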

How does your genome content compare with others in the database?
Run the function-based comparison tool
From the Comparative Tools menu of the organism summary page, select "Function-based comparison."
• Step 1: Your newly annotated genome is already input as the reference, A.
• Step 2: Highlight one comparison genome in the list.
• Step 3: Click the "Select" button.
Result
• The table opens with all features in your genome (A) or the comparison genome (B) that are associated with a complete subsystem.
• The table is sortable, searchable by subsystem category or name, and downloadable.
• The first column may be reset to show features associated with subsystems in genome A but not B, in both A and B, or in B but not A. When genome B is very closely related to the new genome, this comparison of which functions are annotated in B but not automatically associated with subsystems in A indicates a place to begin looking at the annotations to evaluate accuracy or find missing functions.

It is important to keep in mind that automated subsystem assignments are made only if all roles required for one functional variant of a subsystem have been correctly annotated. Annotations are assigned based on sequence similarity, while inclusion in subsystems is based on the annotation matching that of a functional role of a subsystem. Therefore, if the best sequence match to your new protein has a functional annotation that does not match exactly to the name of a role in a subsystem, the protein in your new genome will not be included in the subsystem. Such an event may occur when the protein with the best sequence match in the database has not yet been reviewed and annotated by the curator of the subsystem in question.

How to include two or more jobs (private organisms) in comparative analyses
Set private organism preferences
For each individual job, RAST calculates similarities between all features in the input genome (private genome) and all features in the SEED database (public genomes). In order to maintain privacy of the data in each job, the similarities between two or more private genomes are calculated only upon request. In order for two or more of your jobs to be displayed in compare regions or other comparative tools, the similarities of features in the private genomes to each other must be calculated.
• Select "Private Organism Preferences" from the Your Jobs menu.
• Shift the genomes you want to compare with each other into the peers box at right by selecting them (one at a time) and clicking the right-pointing arrow.
• Click the button to check requirements for computation.
• Select those comparisons that require calculation, then click the button to request computation.
• You will receive an email when the computation is complete.
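Because similarities between private genomes are computed per pair, the number of comparisons to request grows quickly with the number of peers. The sketch below lists the pairs that would still need computation for a chosen set of peers. It only illustrates the bookkeeping; the job labels and the function name are hypothetical, and it says nothing about how RAST tracks this internally.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set, Tuple


def pairs_needing_computation(peers: Iterable[str],
                              computed: Set[FrozenSet[str]]) -> List[Tuple[str, str]]:
    """Return every unordered pair of peers whose similarities are not yet computed.

    `computed` holds frozensets of two job labels, so the order within a pair
    does not matter.
    """
    unique = sorted(set(peers))
    return [
        (a, b)
        for a, b in combinations(unique, 2)
        if frozenset((a, b)) not in computed
    ]
```

For n peers there are n(n-1)/2 possible pairs, so three jobs need at most three computations and ten jobs need at most forty-five.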