
Assessing the quality of raw NGS data

Present-day Next Generation Sequencing (NGS) technologies use entirely different methods to read
sequences, and consequently their reads vary in read length, run time, accuracy and cost. However,
they share two common aspects. First, most NGS platforms cannot read an entire genome (be it
bacterial or human) in a single pass. Instead, we get short reads that together cover the genome and
need to be assembled. Second, each base call carries a certain degree of uncertainty, which is
expressed as the quality of that base. A read is typically stored in FASTQ format, which includes at
least these two pieces of information: the read sequence and a quality value for each base. A single
read in FASTQ format occupies four lines: a header beginning with @, the base sequence, a separator
line beginning with +, and a quality string with one character per base.
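
As a minimal sketch of that structure, the Python snippet below parses one hypothetical FASTQ record and decodes its quality characters into numeric scores. The read name, bases and quality string are invented for illustration, and the Phred+33 encoding (score = ASCII code minus 33) is an assumption; some older platforms used a different offset.

```python
# Hypothetical four-line FASTQ record; the name, bases and qualities are made up.
record = """@read_1
GATTACAGATTACA
+
IIIIIIIIIIII5?"""


def parse_fastq_record(text):
    """Split a four-line FASTQ record into name, sequence and quality string."""
    header, sequence, separator, quality = text.strip().split("\n")
    if not header.startswith("@") or not separator.startswith("+"):
        raise ValueError("not a valid FASTQ record")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    return header[1:], sequence, quality


def decode_quality(quality, offset=33):
    """Convert quality characters to integer scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


name, sequence, quality = parse_fastq_record(record)
scores = decode_quality(quality)

# Print each base next to its quality score.
for base, score in zip(sequence, scores):
    print(base, score)
```

Each base is paired with exactly one quality character, which is why the parser rejects records whose sequence and quality lines differ in length.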

Read quality must be assessed before any downstream analysis. A convenient tool for this task is
FASTQC, which is available on the Galaxy platform. Galaxy is a web-based platform for interactive,
large-scale genome analysis.
Assessing read quality using FASTQC

i. Log in to https://usegalaxy.org/ (or register if you are using it for the first time).

ii. Click on Get Data in the menu on the left side and choose the Upload file from computer
option. This opens a dialog box. Upload your FASTQ file and click Start. You can close the
dialog box and continue working while your data is being uploaded. To view the raw reads and
additional information, click on the new entry in the history panel, which appears on the
right side.

iii. Find and click NGS: QC and manipulation, then choose the FASTQC option to generate a
quality report of the reads in an easy-to-read format. You can choose the Multiple datasets
mode to load multiple files. The file list is drawn from the Galaxy history (the panel on the
right, not to be confused with the browser history).

iv. Choose the appropriate files and click Execute. The job will now appear as an entry in the
history panel. You can view the results by clicking on the view (eye) icon of that entry.
Below is a typical FASTQC result.

The values show the quality at each base position across all the reads. The scores are PHRED
quality scores, which translate as follows:

PHRED quality score   Probability that the base is called wrong   Accuracy of the base call
20                    1 in 100                                     99%
30                    1 in 1,000                                   99.9%
40                    1 in 10,000                                  99.99%
50                    1 in 100,000                                 99.999%
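
The rows of this table follow from the standard definition of the PHRED score, Q = -10 * log10(P), where P is the probability that the base call is wrong. The short Python sketch below applies that relation in both directions; the function names are chosen here for illustration only.

```python
import math


def phred_to_error_probability(q):
    """Probability that a base call is wrong, from P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """PHRED score for a given error probability, from Q = -10 * log10(P)."""
    return -10 * math.log10(p)


# Print error rate and accuracy for the scores listed in the table.
for q in (20, 30, 40, 50):
    p = phred_to_error_probability(q)
    print(f"Q{q}: 1 in {round(1 / p):,} wrong, accuracy {100 * (1 - p):.3f}%")
```

Because the scale is logarithmic, every increase of 10 in the score corresponds to a tenfold drop in the error probability.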

The acceptable limit is a score of 30 or more. In the results shown, all scores are above 30,
which indicates that the sequence reads are reliable.
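
The same check can be sketched in Python for reads held in memory. The snippet below computes the mean score at each base position for a few hypothetical, equal-length quality strings and lists the positions that fall below 30. It assumes Phred+33 encoding and uses a plain mean for simplicity; it is not how FASTQC itself summarises the data.

```python
# Hypothetical Phred+33 quality strings for three reads of equal length.
qualities = ["IIIIIIII", "IIIIHHGF", "IIIIIF5+"]
THRESHOLD = 30  # acceptable limit stated in the text


def mean_quality_per_position(quality_strings, offset=33):
    """Mean quality score at each base position across all reads."""
    columns = zip(*quality_strings)  # one tuple of characters per position
    return [sum(ord(ch) - offset for ch in col) / len(col) for col in columns]


means = mean_quality_per_position(qualities)

# Positions are reported 1-based, as they would be on a quality plot.
low_positions = [i + 1 for i, m in enumerate(means) if m < THRESHOLD]

print("per-position means:", [round(m, 1) for m in means])
print("positions below threshold:", low_positions or "none")
```

In this made-up data only the last position drops below the limit, which mirrors the common pattern of quality declining towards the end of a read.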
