
James Madison University

From the SelectedWorks of Ray Enke Ph.D.

June, 2016

Creating Custom RNA-Seq Data Tracks in the UCSC Genome Browser (computational)
Raymond A Enke

This work is licensed under a Creative Commons CC BY-SA International License.

Available at: https://works.bepress.com/raymond_enke/72/


Creating Custom TopHat Alignment Data Tracks in the UCSC Genome Browser
Dr. Ray Enke, Bio 480 Advanced Molecular Bio Lab
James Madison University

How to cite this work:


This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 United States License. Recommended citation: Enke, R.
(2016) Creating Custom TopHat Alignment Data Tracks in the UCSC Genome Browser. CSHL DNALC RNA-Seq for the Next Generation
Working Group. http://www.rnaseqforthenextgeneration.org/profiles/raymond-enke.html#teaching

Objectives:

Create your own custom data tracks in the UCSC Genome Browser
Visualize RNA-Seq TopHat alignment data as custom tracks in the UCSC Genome Browser
Integrate RNA-Seq alignment data with other genome-wide data sets

I. Creating custom data tracks in the UCSC Genome Browser

Last week you used the DNA Subway Green Line to view and collect some statistics on how many
reads were sequenced, mapped and paired for each sample and replicate in the chicken E8 retina,
E18 retina and E18 cornea RNA-Seq experiment after the TopHat software package was run. You saw
that ~30-60 million individual 300 nt paired-end sequencing reads were aligned to the reference chicken
genome per sample. This week you will create custom tracks on the chicken genome assembly in the
UCSC Genome Browser to visualize these large sequencing data sets.

First, you will complete a brief exercise that goes over the basic steps to create, label and name your
own custom annotation data tracks in the UCSC Genome Browser using the human genome assembly.

Navigate to the Human 2009 hg19 genome assembly


Navigate to the RHO gene > hide all tracks > add back UCSC Genes in pack view
You should see the RHO gene with its 5 exons & 4 introns encoded on the top (plus) strand of the genome
We will add a custom data track to label new features on RHO as an example

The Custom Tracks feature in the browser allows you to display your own or previously published data
as 1 or more annotation tracks on top of a specific genome assembly.

Hover over the My Data option in the browser toolbar and select Custom Tracks
This will take you to an Add Custom Tracks page

Cold Spring Harbor Laboratory, DNA Learning Center, 1 Bungtown Road, Cold Spring Harbor, NY 11724

This page allows you to input custom track data in 3 different ways:
1. Copy/paste properly formatted tab-separated data
2. Browse & upload a tab-separated value (.tsv) file containing formatted data
3. Paste a URL pointing to a host containing formatted data

We will start with option #1, simply copy/pasting data into the window to create a custom data track.
First, let's define:

1. Tab-separated value data (.tsv file): a text file that can be created/viewed by most spreadsheet
programs and text editors. Each entry takes up a single line, with the first line serving as the header
line that labels each field. TSV files can be used for any type of data (there are no required fields). As an
example, here are some stats for several of the 2015 Baltimore Orioles in .tsv format:

Pos Name G PA AB R H HR
C Caleb_Joseph 100 355 320 38 75 11
1B *Chris_Davis 160 670 573 100 150 47
2B Jonathan_Schoop 86 321 305 34 85 15
SS J.J._Hardy 114 437 411 45 90 8
3B Manny_Machado 162 713 633 102 181 35
LF Steve_Pearce 92 325 294 42 64 15
CF Adam_Jones 137 581 546 74 147 27

*led team in HRs
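To see why a consistent one-entry-per-line layout with a header line is useful, here is a minimal Python sketch that reads the Orioles table above with the standard `csv` module and finds the home-run leader. It assumes the columns are separated by real tab characters (as in a .tsv file) and treats the asterisk marking the team leader as part of the name text; the helper names are made up for this illustration.

```python
import csv
import io

# The Orioles example as tab-separated text (columns separated by "\t").
ORIOLES_TSV = "\n".join([
    "Pos\tName\tG\tPA\tAB\tR\tH\tHR",
    "C\tCaleb_Joseph\t100\t355\t320\t38\t75\t11",
    "1B\t*Chris_Davis\t160\t670\t573\t100\t150\t47",
    "2B\tJonathan_Schoop\t86\t321\t305\t34\t85\t15",
    "SS\tJ.J._Hardy\t114\t437\t411\t45\t90\t8",
    "3B\tManny_Machado\t162\t713\t633\t102\t181\t35",
    "LF\tSteve_Pearce\t92\t325\t294\t42\t64\t15",
    "CF\tAdam_Jones\t137\t581\t546\t74\t147\t27",
])


def read_tsv(text):
    """Parse TSV text; the first line is the header that names each field."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def home_run_leader(rows):
    """Return (name, HR) for the player with the most home runs."""
    best = max(rows, key=lambda row: int(row["HR"]))
    # Drop the "*" footnote marker from the name.
    return best["Name"].lstrip("*"), int(best["HR"])


rows = read_tsv(ORIOLES_TSV)
print(home_run_leader(rows))  # ('Chris_Davis', 47)
```

Because every field is looked up by its header name, the same code would keep working if the columns were reordered.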

2. Browser Extensible Data (BED) format: a derivative of TSV data with a specific format for
genome browser data. Like .tsv data, each entry takes up a single line. BED lines have 3 required
fields: 1) chromosome name (chr), 2) starting genomic coordinate (start), and 3) ending genomic
coordinate (stop). An optional 4th field can be used as a label for each line entry, using any text without
special characters or spaces. As an example, here are the chromosomal coordinates for each of
the 5 RHO exons in the human 2009 hg19 genome assembly:

chr start stop exon


chr3 129247482 129247937 exon1
chr3 129249719 129249887 exon2
chr3 129251094 129251259 exon3
chr3 129251376 129251615 exon4
chr3 129252451 129254187 exon5
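The same five exon records can be produced programmatically. The Python sketch below is illustrative only (the helper `to_bed_line` is hypothetical): it joins each record's fields with tabs, leaves out the column-header line, rejects labels containing spaces, and computes each exon's length. One detail worth knowing: BED coordinates are 0-based and half-open, so the length of a feature is simply stop minus start.

```python
# RHO exon coordinates (hg19) from the table above: (chrom, start, stop, name).
RHO_EXONS = [
    ("chr3", 129247482, 129247937, "exon1"),
    ("chr3", 129249719, 129249887, "exon2"),
    ("chr3", 129251094, 129251259, "exon3"),
    ("chr3", 129251376, 129251615, "exon4"),
    ("chr3", 129252451, 129254187, "exon5"),
]


def to_bed_line(chrom, start, stop, name=None):
    """Format one BED record: 3 required fields plus an optional 4th label."""
    if start >= stop:
        raise ValueError(f"start ({start}) must be less than stop ({stop})")
    if name is not None and any(ch.isspace() for ch in name):
        raise ValueError(f"label {name!r} must not contain spaces")
    fields = [chrom, str(start), str(stop)]
    if name is not None:
        fields.append(name)
    return "\t".join(fields)


# Data lines only: the "chr start stop exon" header is not included.
bed_text = "\n".join(to_bed_line(*exon) for exon in RHO_EXONS)
print(bed_text)

# BED intervals are 0-based and half-open, so length = stop - start.
for chrom, start, stop, name in RHO_EXONS:
    print(name, stop - start)  # exon1 455, exon2 168, ... exon5 1736
```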


Copy/paste these coordinates without the header line into the Add Custom Tracks data window
and hit submit. If you get an error, you probably included the header line.
This takes you to a Manage Custom Tracks page giving you some info about your data track

Select the Pos link to view the genomic position listed in the 1st line of your custom track
Zoom out to view the entire RHO gene and set UCSC Genes to dense view
You should also see your new custom track, called User Track; set it to pack view
Your viewer window should look like this:

Hopefully, you've created a custom track individually labeling each of the RHO exons. Next we will edit
some of your custom track parameters, such as the title of the track, the color of the track and the
optional data label field (column #4).

Go back to the Manage Custom Tracks page (My Data > Custom Tracks)
Edit your custom track by selecting User Track

This window will allow you to edit your existing custom track or replace it with new data. Keep the same
coordinates but change the data label field by copy/pasting the BED data below into the replacement window
(do not include the header line) and hit submit:

chr start stop name or label


chr3 129247482 129247937 BIO480
chr3 129249719 129249887 with
chr3 129251094 129251259 Dr_Enke
chr3 129251376 129251615 is_the
chr3 129252451 129254187 bomb
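Because only the 4th column changes, the replacement track can be derived from the original coordinates. This hypothetical helper (`relabel_bed` is not a browser feature) keeps chrom/start/stop from each existing BED line, swaps in a new label, and checks that there is exactly one label per line.

```python
def relabel_bed(bed_text, new_labels):
    """Replace (or add) the 4th BED field on each line, keeping coordinates."""
    lines = [line for line in bed_text.splitlines() if line.strip()]
    if len(lines) != len(new_labels):
        raise ValueError("need exactly one label per BED line")
    out = []
    for line, label in zip(lines, new_labels):
        chrom, start, stop = line.split("\t")[:3]  # keep the 3 required fields
        out.append("\t".join([chrom, start, stop, label]))
    return "\n".join(out)


original = "\n".join([
    "chr3\t129247482\t129247937\texon1",
    "chr3\t129249719\t129249887\texon2",
    "chr3\t129251094\t129251259\texon3",
    "chr3\t129251376\t129251615\texon4",
    "chr3\t129252451\t129254187\texon5",
])
labels = ["BIO480", "with", "Dr_Enke", "is_the", "bomb"]
print(relabel_bed(original, labels))
```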


View your custom track in pack view and UCSC Genes in full view
This illustrates that you can use the 4th column in a BED formatted file to indicate any parameter
you like (e.g. statistical rank, up-regulation vs down-regulation, increased vs decreased DNA
methylation, a simple text label, etc.)
Your viewer window should look like this:

Next, edit the track name with a short text tag and the track description with a more detailed text tag
Go back to the custom track editing page (My Data > Custom Tracks > User Track)
Change the track name= in the Edit configuration window to "exonic seq" (keep the quotation marks, since the value contains a space)
Change the description= to "protein coding exonic sequence"
Navigate back to view the entire RHO gene
Notice that your custom annotation track reflects the track name that you entered
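These settings live on a single `track` line, which can also be pasted above the BED data so that one paste sets the name and description. A minimal sketch, assuming the UCSC track-line convention that values containing spaces are wrapped in double quotes; `track_header` is a hypothetical helper.

```python
def track_header(name, description):
    """Build a UCSC-style track line; values with spaces are double-quoted."""
    def quote(value):
        if '"' in value:
            raise ValueError("track values must not contain double quotes")
        return f'"{value}"' if " " in value else value
    return f"track name={quote(name)} description={quote(description)}"


header = track_header("exonic seq", "protein coding exonic sequence")
# First two records of the relabeled RHO track, for illustration.
bed_lines = [
    "chr3\t129247482\t129247937\tBIO480",
    "chr3\t129249719\t129249887\twith",
]
print("\n".join([header] + bed_lines))
# First printed line: track name="exonic seq" description="protein coding exonic sequence"
```

Values without spaces, such as the E8_retina name used later for the BAM track, come out unquoted.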

Lastly, let's edit the color of the custom track. Track color is defined digitally by Red, Green, Blue (RGB)
values between 0 and 255. The default color is black (RGB value of 0,0,0). Conversely, pure white has
RGB=255,255,255. Pure red=255,0,0, pure green=0,255,0 and pure blue=0,0,255. Other colors
are combinations of R, G, and B (e.g. 206,39,212 = pink). The R,G,B value can be entered in the Edit
configuration window after the track description as color=R,G,B (for example, color=0,0,0).
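Each channel must be an integer from 0 to 255. The sketch below (the helper name `color_setting` is hypothetical) checks an R,G,B triple and formats it as the `color=` setting. The teal value 0,128,128 is the standard web "teal" and is used here only as an example.

```python
def color_setting(red, green, blue):
    """Return a 'color=R,G,B' setting after checking each channel is 0-255."""
    for channel, value in (("red", red), ("green", green), ("blue", blue)):
        if not isinstance(value, int) or not 0 <= value <= 255:
            raise ValueError(f"{channel} must be an integer from 0 to 255, got {value!r}")
    return f"color={red},{green},{blue}"


print(color_setting(0, 0, 0))      # default black: color=0,0,0
print(color_setting(0, 128, 128))  # teal: color=0,128,128
```

Appended to the track line, this would give, for example, `track name="exonic seq" description="protein coding exonic sequence" color=0,128,128`.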

Navigate to the RGB Color Code website: (tinyurl.com/8pa5kvm)


Get the RGB color combination for a handsome teal (bluish green) or another color of your
liking from the RGB color codes chart
Go back to the custom track editing page (My Data > Custom Tracks > Name hyperlink)
Directly after the track description, type color=R,G,B, substituting your chosen RGB values


Hit submit, navigate back to the full RHO gene and view the custom track in pack view
Copy your BED data and a high-resolution image of your browser viewer window into your notebook
Your viewer window should now look like this, but with your color

*Make-up assignment: include a high-resolution image of this custom track in your make-up assignment in
addition to the posted assignment

II. Visualization of RNA-Seq TopHat Alignment Data as custom UCSC tracks

Nice job, that was fun! Next you will use these same steps to plot some actual genome-wide data (our
chicken RNA-Seq data).

Navigate to the Chicken 2011 galGal4 genome assembly


Select the add custom tracks option below the search window

Instead of pasting BED formatted data into the custom track window, we will simply copy/paste the URLs
of the websites where the TopHat RNA-Seq alignment data are stored. Remember that TopHat output is
a collection of ~30-60 million individual 150-300 bp sequencing reads (from FASTQ files) aligned to and
indexed against a reference genome. Collectively, these alignments are stored as Binary Alignment/Map
(BAM) files, a compressed binary format, and they are still enormous (~1-6 GB of data). BAM files are too
big to open or move around easily. Instead, they are typically hosted on a server and accessed using a URL.
The BAM files for our 6 RNA-Seq samples are hosted on the Discovery Environment of CyVerse, an
NSF-funded virtual organization. Your free DNA Subway account also gives you access to Discovery
Environment data storage and bioinformatics tools. Today, though, we will only need the URLs for 1
replicate of each sample, which I've stored on a class Google Sheet (tinyurl.com/z6bhr6u).
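The browser can usually recognize a pasted BAM URL on its own, but the equivalent explicit track line uses `type=bam` and `bigDataUrl=`. The sketch below builds that line and the index URL the browser also needs: a BAM custom track expects a `.bai` index file at the same location, named by appending `.bai` to the BAM file name. The URL is a made-up placeholder, not one of the class files, and the helper names are hypothetical.

```python
from urllib.parse import urlparse


def bam_track_line(bam_url, name, description):
    """Build an explicit UCSC BAM custom-track line for a hosted BAM file."""
    parsed = urlparse(bam_url)
    # This sketch accepts only http, https or ftp URLs ending in .bam.
    if parsed.scheme not in ("http", "https", "ftp"):
        raise ValueError("BAM files must be reachable over http, https or ftp")
    if not parsed.path.endswith(".bam"):
        raise ValueError("expected a URL ending in .bam")
    return f"track type=bam name={name} description={description} bigDataUrl={bam_url}"


def expected_index_url(bam_url):
    """The .bai index sits next to the BAM as <file>.bam.bai (assumes no query string)."""
    return bam_url + ".bai"


# Hypothetical placeholder URL (not a real class file).
url = "https://example.org/rnaseq/E8_retina_rep1.bam"
print(bam_track_line(url, "E8_retina", "E8_retina_RNA-Seq_rep1"))
print(expected_index_url(url))  # https://example.org/rnaseq/E8_retina_rep1.bam.bai
```

If a BAM track fails to load, a missing or misnamed index file is a common culprit worth checking first.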

Visualizing BAM files from RNA-Seq transcriptome experiments in a genome browser can be a
useful way to qualitatively assess differential expression at a particular locus. Mapped reads can be
compared between samples to determine whether the accumulated sequences are equal between samples
(no change in expression), higher in sample #1 than #2 (upregulated in #1) or vice versa
(downregulated in sample #1). Different isoforms can also be compared between samples using
BAM data. To build custom BAM alignment data tracks in the UCSC Genome Browser, we will simply
paste in the URL of the site where each BAM file is hosted.
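Comparing pile-ups between samples is only fair if you account for sequencing depth, since one library may simply contain more reads. A common way to do that, not described in this handout, is to scale counts to counts per million (CPM) before comparing. The sketch below does this and applies an arbitrary 2-fold cutoff (|log2 fold change| > 1) to echo the three outcomes above; the read counts are invented for illustration and are not our chicken data, and this is not a statistical test.

```python
import math


def cpm(locus_reads, total_mapped_reads):
    """Counts per million: reads at a locus scaled by library size."""
    return locus_reads / total_mapped_reads * 1_000_000


def compare(sample1, sample2, pseudocount=1.0):
    """Return (log2 fold change of sample1 over sample2, qualitative call)."""
    cpm1 = cpm(*sample1)
    cpm2 = cpm(*sample2)
    # The pseudocount avoids dividing by zero when a locus has no reads.
    lfc = math.log2((cpm1 + pseudocount) / (cpm2 + pseudocount))
    if lfc > 1:  # arbitrary 2-fold cutoff, for illustration only
        call = "higher in sample 1 (upregulated in #1)"
    elif lfc < -1:
        call = "lower in sample 1 (downregulated in #1)"
    else:
        call = "roughly equal (no clear change)"
    return lfc, call


# Hypothetical (reads at locus, total mapped reads) for two samples.
sample_a = (12_000, 40_000_000)
sample_b = (1_500, 30_000_000)
print(compare(sample_a, sample_b))
```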

Copy/paste the BAM URL from the E8 retina replicate #1 into the custom track window
Click into the custom track Name and change the track name and description
Do this by replacing the information in the Edit Configuration window with the following:

track name=E8_retina description=E8_retina_RNA-Seq_rep1

Hit submit. Your track Name & Description should reflect this change. If not, try again.


Hit go, navigate to the Rho gene and view only RefSeq genes in full view and your custom track
in dense view to get a basic visualization of RNA-Seq reads
Click the custom track hyperlink or the grey bar to the left of the custom track to reconfigure the
data into a BAM Density Plot

Change display mode to full and select display data as a density graph and hit submit
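A density graph is essentially read coverage: how many aligned reads overlap each position. The sketch below illustrates that idea with a difference array over a small window and a few invented read intervals (0-based, half-open, like BED). It is not how the browser computes its display, only an illustration of what the height of the graph represents.

```python
def coverage(reads, window_start, window_end):
    """Per-base read depth over [window_start, window_end) from (start, end) reads."""
    size = window_end - window_start
    diff = [0] * (size + 1)
    for start, end in reads:
        # Clip each read to the window; skip reads that fall entirely outside it.
        lo = max(start, window_start) - window_start
        hi = min(end, window_end) - window_start
        if lo < hi:
            diff[lo] += 1  # depth rises where a read begins
            diff[hi] -= 1  # and falls just past where it ends
    depth, running = [], 0
    for delta in diff[:size]:
        running += delta
        depth.append(running)
    return depth


# Hypothetical reads (start, end) in a 10-bp window starting at position 100.
reads = [(100, 104), (102, 106), (103, 108), (95, 101)]
print(coverage(reads, 100, 110))  # [2, 1, 2, 3, 2, 2, 1, 1, 0, 0]
```

Taller peaks in one sample's density track than another's at the same exon are what the qualitative comparison above looks for.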


Repeat these steps for the E18 retina and E18 cornea BAM data. If you like, you can change the colors
for the custom tracks. If all goes well, your view of the Rhodopsin gene should look like this (but with
cornea data included):

Save this session as "Gg [your initials] BAM Density" & copy the session URL to your notebook
Copy a high-resolution image of your browser window showing Rho and ±5 kb of flanking sequence into your notebook
Provide a brief summary and interpretation of the data shown in your window

