Variant evaluation
Inputs
-V  Original VCF file
-select  A selection expression
-sn/-sf/-se  Sample selection by (name / names in a file / expression)
Outputs
-o  The resulting VCF after selection criteria applied
Example: How many variants do I have in a cohort?
Task: Subset a VCF to a specific group of samples
Tool: SelectVariants
Data: Genotype VCF, sample list (file)
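The task above can be sketched in Python. This is a minimal illustration, not the tool's implementation: `select_variants_args` only assembles a GATK 3-style command line (the jar name, file names, and the `-T`/`-R` arguments are assumptions not shown on the slide), and `count_variant_records` answers "how many variants" by counting non-header lines in a VCF. The example VCF content is made up for illustration.

```python
import io


def select_variants_args(reference, vcf_in, sample_file, vcf_out,
                         jar="GenomeAnalysisTK.jar"):
    """Build (but do not run) a GATK 3-style SelectVariants command line.

    The jar name and all file names are hypothetical placeholders.
    """
    return [
        "java", "-jar", jar,
        "-T", "SelectVariants",
        "-R", reference,      # reference FASTA (assumed, not on the slide)
        "-V", vcf_in,         # original VCF file
        "-sf", sample_file,   # file listing the sample names to keep
        "-o", vcf_out,        # resulting VCF
    ]


def count_variant_records(vcf_lines):
    """Count data records: non-empty lines that do not start with '#'."""
    return sum(1 for line in vcf_lines
               if line.strip() and not line.startswith("#"))


# Tiny made-up VCF with two records, used only to exercise the counter.
EXAMPLE_VCF = (
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\n"
    "1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\n"
    "1\t200\t.\tC\tT\t60\tPASS\t.\tGT\t1/1\n"
)

args = select_variants_args("ref.fasta", "cohort.vcf",
                            "samples.list", "subset.vcf")
print(" ".join(args))
print(count_variant_records(io.StringIO(EXAMPLE_VCF)))  # 2
```

Building the argument list separately from running it keeps the sketch testable without a GATK installation; in practice the list could be passed to `subprocess.run`.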
Inputs
-V  (multiple) the VCFs to combine
Outputs
-o  A VCF with the sites, samples, and genotypes resulting from merging all of the input VCF files given
How many variants are present in both of my cohorts?
Task: Calculate which variants are present in both callset #1 and callset #2
Tool: CombineVariants
Data: Site or Genotype VCFs for each cohort
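The idea behind this task (which sites occur in both callsets) can be illustrated with a small Python sketch. It is a stand-in for the concept, not how CombineVariants works internally: sites are keyed by (CHROM, POS, REF, ALT) and the two key sets are intersected. The two callsets below are made-up examples, and multi-allelic ALT fields are compared as whole strings, which is a simplifying assumption.

```python
def site_keys(vcf_text):
    """Return the set of (CHROM, POS, REF, ALT) keys for all records."""
    keys = set()
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        keys.add((chrom, int(pos), ref, alt))
    return keys


# Made-up sites-only callsets for illustration.
CALLSET_1 = (
    "#CHROM\tPOS\tID\tREF\tALT\n"
    "1\t100\t.\tA\tG\n"
    "1\t200\t.\tC\tT\n"
    "2\t50\t.\tG\tA\n"
)
CALLSET_2 = (
    "#CHROM\tPOS\tID\tREF\tALT\n"
    "1\t200\t.\tC\tT\n"
    "2\t50\t.\tG\tA\n"
    "2\t75\t.\tT\tC\n"
)

shared = site_keys(CALLSET_1) & site_keys(CALLSET_2)
print(len(shared))     # 2
print(sorted(shared))  # [('1', 200, 'C', 'T'), ('2', 50, 'G', 'A')]
```

Set intersection is enough for a count; keeping genotypes and annotations from both inputs is what a merged VCF adds on top of this.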
Inputs
-eval  (multiple) the call set(s) to be evaluated
-comp  (multiple) the call set(s) to use as comparators
-D  dbSNP track
Other parameters of interest
-EV  (multiple) additional evaluation module(s) to use
-ST  (multiple) additional stratification(s) to use
Outputs
-o  A GATKReport text file containing tables of evaluation results
Output of VariantEval
Parsing format
Table name and description
Stratifications: eval set, comp set, select expressions, novelty
Tabulated results (counts in this case)
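A report with this structure can be read with a few lines of Python. The sketch below assumes a layout in which each table is introduced by a `#:GATKTable:<name>:<description>` line, followed by a whitespace-separated header row and data rows, with a blank line between tables; that layout and the example report (including its counts) are assumptions for illustration, so check them against a real report before relying on this.

```python
def parse_gatk_report(text):
    """Parse tables into {name: {"description": str, "rows": [dict, ...]}}."""
    tables = {}
    current = None   # table currently being filled
    header = None    # column names of the current table
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            current, header = None, None      # blank line ends a table
        elif stripped.startswith("#:GATKTable:"):
            if stripped.endswith(";"):
                continue                      # column-format line, not needed here
            parts = stripped.split(":", 3)
            name = parts[2]
            description = parts[3] if len(parts) > 3 else ""
            current = tables[name] = {"description": description, "rows": []}
            header = None
        elif stripped.startswith("#"):
            continue                          # other comment/header lines
        elif current is not None:
            cells = stripped.split()
            if header is None:
                header = cells                # first row names the columns
            else:
                current["rows"].append(dict(zip(header, cells)))
    return tables


# Made-up report; the numbers are illustrative only.
EXAMPLE_REPORT = """\
#:GATKReport.v1.1:1
#:GATKTable:6:2:%s:%s:%s:%s:%s:%d:;
#:GATKTable:CountVariants:Counts different classes of variants in the sample
CountVariants  CompRod  EvalRod  JexlExpression  Novelty  nSNPs
CountVariants  dbsnp    eval     none            known    90
CountVariants  dbsnp    eval     none            novel    10
"""

report = parse_gatk_report(EXAMPLE_REPORT)
for row in report["CountVariants"]["rows"]:
    print(row["Novelty"], row["nSNPs"])
```

Each row comes back as a dictionary keyed by column name, so the stratification columns (eval set, comp set, select expression, novelty) can be used to filter rows before reading the tabulated counts.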
Overfiltering? Contamination?
testCalls FIN and the matched 1000G comparison set: 12
Other metrics line up well, so calls are 4
Kinship -> degree of relation between samples (King / PLINK)
Pedigree -> reconstruct family structure (trios)
Sex -> coverage / clustering analysis over X and Y
Many projects discard samples with non-standard sex genotypes (e.g. X0, XXY)
Ethnicity inference -> PCA + clustering on subset of conserved sites (S. Purcell)
These methods developed for GWAS can be used for QC purposes, e.g. to check identity and verify supplied metadata, as well as adjust variant QC expectations
Pairwise kinship inference (King / PLINK)
[Figure: sample pairs grouped as Duplicates, Parent-Offspring, Siblings. Credit: Monkol Lek, 2014]
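The grouping of pairs into duplicates, parent-offspring, and siblings can be sketched as a simple classifier on the pairwise kinship coefficient. The kinship cutoffs below follow the inference ranges documented for KING (greater than 0.354 for duplicates or monozygotic twins, 0.177 to 0.354 for first degree, and so on). Splitting first-degree pairs by the proportion of sites with zero alleles shared (IBS0) is a common heuristic, but the `ibs0_cutoff` value here is a hypothetical placeholder, not a recommended setting.

```python
def relation_from_kinship(phi, ibs0=None, ibs0_cutoff=0.001):
    """Classify a sample pair from its kinship coefficient.

    phi:          estimated kinship coefficient for the pair
    ibs0:         optional proportion of sites sharing zero alleles
    ibs0_cutoff:  hypothetical threshold for separating parent-offspring
                  pairs (IBS0 near zero) from full siblings
    """
    if phi > 0.354:
        return "duplicate or monozygotic twin"
    if phi >= 0.177:
        if ibs0 is None:
            return "first degree"
        return "parent-offspring" if ibs0 <= ibs0_cutoff else "full siblings"
    if phi >= 0.0884:
        return "second degree"
    if phi >= 0.0442:
        return "third degree"
    return "unrelated"


print(relation_from_kinship(0.49))             # duplicate or monozygotic twin
print(relation_from_kinship(0.25, ibs0=0.0))   # parent-offspring
print(relation_from_kinship(0.25, ibs0=0.02))  # full siblings
print(relation_from_kinship(0.01))             # unrelated
```

For QC, unexpected "duplicate" or first-degree results between samples that the metadata lists as unrelated are the cases worth inspecting by hand.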
Ethnicity affects many variant call metrics
[Figure: panel "SNP", axis "count" (ticks 18K, 21K, 24K), by cohort: 1000G, ATV, Bipolar, BUP, ESP, Ottawa, NFBC, SCZ, T2D-GENES, GoT2D, SIGMA, TAT. Credit: Monkol Lek, 2014]
Older populations tend to display more heterogeneity
We are here in the Best Practices workflow
Variant evaluation talks
Further reading
http://www.broadinstitute.org/gatk/guide/best-practices
http://www.broadinstitute.org/gatk/guide/article?id=51
http://www.broadinstitute.org/gatk/guide/article?id=48
http://www.broadinstitute.org/gatk/guide/article?id=53
http://www.broadinstitute.org/gatk/guide/article?id=54
http://www.broadinstitute.org/gatk/gatkdocs/#VariantEvaluationandManipulationTools