
The Search for Cancer's Causes and Cures

Wade L. Schulz, MD, PhD


Yale University, Department of Laboratory Medicine

Cancer Statistics: An Improving Outlook?


[Figure: cancer incidence and mortality, rate per 100,000]

CC-BY-ND 4.0


Precision Medicine
Tailoring medical therapy to a particular patient's characteristics


Presentation to Precision Care

Images adapted from Servier Medical Art, CC-BY


When Cells Go Bad


Genetics in 60 Seconds


Searching for Mutations


Gels and Capillaries


Next Generation Sequencing


Massively Parallel


NGS: The Technology


Cost of Sequencing

[Figure: cost per genome, Sep-01 to May-14 (log scale, $1 to $100,000,000), plotted against Moore's Law]

Bases to Bytes
How big is the genome?

23 chromosomes, 21,000 genes
3,300,000,000 base pairs
3.3e9 bases × 2 bits/base = 825 MB/sequence
With metadata: 150 GB/sequence
3,000,000 variants/genome
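The 825 MB figure comes from packing each base into 2 bits, which fits four bases per byte. Below is a minimal Python sketch of that packing. It assumes an uppercase sequence containing only A, C, G, and T. Real reads also contain N and other IUPAC codes, which this sketch does not handle. The function names are illustrative and do not come from the original pipeline.

```python
# 2-bit code per nucleotide; N and other IUPAC codes are NOT handled here.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def packed_size_bytes(n_bases: int, bits_per_base: int = 2) -> float:
    """Bytes needed to store n_bases at a fixed number of bits per base."""
    return n_bases * bits_per_base / 8


def pack(seq: str) -> bytes:
    """Pack an A/C/G/T string into bytes, four bases per byte (first base in the high bits)."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        # Left-align a short final chunk so unpack() reads it the same way.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, n_bases: int) -> str:
    """Inverse of pack(); n_bases is needed because the last byte may be padded."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    return "".join(bases[:n_bases])


genome_bases = 3_300_000_000
print(packed_size_bytes(genome_bases) / 1e6, "MB")  # 825.0 MB
```

This counts only the bare sequence. Quality scores, read names, and alignment data are what push a stored sequencing run toward the much larger "with metadata" figure.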


What are the Problems?

Constantly evolving data schema
Ability to integrate diverse data silos
Rapidly increasing needs for data storage
Need for easy, flexible analysis


Why Elasticsearch?
It's great!
Rapid on-premise and cloud installations
Dynamic schema that supported clinical results and annotation data
Availability of client libraries for multiple languages (NEST, elasticsearch-py)
Tool availability (Kibana, Shield)
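A dynamic schema means Elasticsearch assigns a field type the first time it sees a new field. The Python sketch below approximates the default dynamic field mapping rules:

- booleans map to `boolean`;
- integers map to `long`;
- floating-point values map to `float`;
- strings map to `text` with a `keyword` sub-field;
- nulls add no field.

It is a simplified model for illustration, not the server's implementation. It ignores date detection, numeric detection, and dynamic templates.

```python
import json


def infer_field_mapping(value):
    """Approximate Elasticsearch's default dynamic mapping for one JSON value.

    Simplified model: ignores date detection, numeric detection, and dynamic
    templates. Returns None for null values (no field would be added).
    """
    if value is None:
        return None
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, str):
        return {"type": "text",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}}
    if isinstance(value, dict):
        return {"properties": infer_properties(value)}
    if isinstance(value, list):
        # Arrays take the type of their first non-null element.
        for item in value:
            mapping = infer_field_mapping(item)
            if mapping is not None:
                return mapping
        return None
    raise TypeError(f"unsupported JSON value: {value!r}")


def infer_properties(doc: dict) -> dict:
    """Infer a mapping for every non-null field of a document."""
    props = {}
    for key, value in doc.items():
        mapping = infer_field_mapping(value)
        if mapping is not None:
            props[key] = mapping
    return props


variant = {"chromosome": "chr7", "position": 148506396, "vaf": 48.749, "variantEffect": ""}
print(json.dumps(infer_properties(variant), indent=2))
```

This is why clinical results and annotation documents with different fields can share an index without a schema migration. The trade-off is that the first value seen fixes each field's type. An explicit mapping may still be worth declaring for fields such as `position`.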


Sequencing and Interpretation Pipeline

[Diagram: Gene Sequencing, Sequence Alignment, Quality Assurance, Variant Annotation, Clinical Interpretation {galileo}, Clinical Trial Eligibility {galileo/kepler}, Research Management {galileo/kepler}]


What's in a Variant?
Two aligned reads in SAM format (tab-separated):

60G6V:01053:03044	16	chr1	161383	0	16M	*	0	0	TTTGCCAGAAAGCAAG	)///7;;6*669:1:5	ZP:B:f,0.00279573,0.0054005,2.19516e-07	ZM:B:s,244,0,242,0,0,242,2,270,494,300,0,248,36,0,0,0,272,0,204,272,398,248,246,268,270,0,0,0,302,0,0,0,550,38,44,194,14,32,204,2,666,212,222,494,2,2,238,630,92,220,4,102,438,2,60,384,2,76,2,2,294,394,34	ZF:i:28	RG:Z:60G6V.	PG:Z:tmap	MD:Z:16	NM:i:0	AS:i:16	XA:Z:map4-1	XS:i:16

60G6V:00605:00113	0	chr1	415215	2	8M5I31M3S	*	0	0	CCAGCCTGGGTGCGTGACAGAGCAAGACTCCGTCTAAAAAGAAAGGT	B<A??8?@@9A?@DFCEBBBBBAA<BACBK;@?>98999'/;;'+'+	ZP:B:f,0.00288978,0.00437853,4.26593e-06	ZA:i:116	ZG:i:204	ZB:i:30	ZC:B:i,204,201,3	ZM:B:s,232,12,238,0,0,212,0,272,398,218,0,282,0,0,4,0,256,14,274,220,520,220,244,290,270,8,4,0,468,232,274,524,216,0,10,748,238,0,54,260,2,190,0,256,0,30,14,0,252,0,290,206,26,238,32,214,6,238,0,218,28,38,268,216,8,498,2,210,-2,238,270,32,222,436,-4,246,66,54,8,62,202,268,32,-8,238,76,4,986,20,226,8,660,32,24,378,6,174,224,146,264,260,30,136,160,256,20,20,418,234,62,18,12	ZF:i:28	RG:Z:60G6V.	PG:Z:tmap	MD:Z:35A3	NM:i:6	AS:i:20	XA:Z:map4-1	XS:i:19
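Each record above is a SAM alignment line. It has 11 mandatory tab-separated fields (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL), followed by optional `TAG:TYPE:VALUE` fields. Below is a minimal stdlib-only Python sketch of a parser for one such line. It skips header lines and keeps `A`, `Z`, and `H` values as strings. In practice a library such as pysam or the samtools suite would do this work.

```python
import re

MANDATORY = ("qname", "flag", "rname", "pos", "mapq", "cigar",
             "rnext", "pnext", "tlen", "seq", "qual")
INT_FIELDS = {"flag", "pos", "mapq", "pnext", "tlen"}
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_tag(field: str):
    """Parse an optional field such as 'NM:i:6' or 'ZP:B:f,0.1,0.2'."""
    tag, typ, value = field.split(":", 2)
    if typ == "i":
        return tag, int(value)
    if typ == "f":
        return tag, float(value)
    if typ == "B":
        # B arrays: a subtype letter, then comma-separated numbers.
        subtype, *items = value.split(",")
        cast = float if subtype == "f" else int
        return tag, [cast(x) for x in items]
    return tag, value  # A, Z, and H values are kept as strings


def parse_sam_line(line: str) -> dict:
    """Split one alignment line into mandatory fields plus a dict of tags."""
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(MANDATORY, fields[:11]))
    for name in INT_FIELDS:
        record[name] = int(record[name])
    record["tags"] = dict(parse_tag(f) for f in fields[11:])
    return record


def cigar_lengths(cigar: str):
    """Return (query_length, reference_length) consumed by a CIGAR string."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    query = sum(n for n, op in ops if op in "MIS=X")
    ref = sum(n for n, op in ops if op in "MDN=X")
    return query, ref


line = "\t".join(["60G6V:00605:00113", "0", "chr1", "415215", "2", "8M5I31M3S", "*", "0", "0",
                  "CCAGCCTGGGTGCGTGACAGAGCAAGACTCCGTCTAAAAAGAAAGGT",
                  "B<A??8?@@9A?@DFCEBBBBBAA<BACBK;@?>98999'/;;'+'+",
                  "MD:Z:35A3", "NM:i:6", "AS:i:20"])
rec = parse_sam_line(line)
print(rec["pos"], rec["tags"]["NM"], cigar_lengths(rec["cigar"]))  # 415215 6 (47, 39)
```

For the second read, the CIGAR string consumes 47 query bases, which matches the length of SEQ. It also covers 39 reference bases, the same span described by `MD:Z:35A3`. Five inserted bases plus one mismatch give the `NM:i:6` edit distance.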


What's in a Variant?
{
"chromosome": "chr7",
"position": 148506396,
"type": "snv",
"refAllele": "A",
"altAllele": "C",
"totalReads": 1998,
"forwardReads": 1038,
"forwardRefReads": 524,
"forwardAltReads": 514,
"reverseReads": 960,
"reverseRefReads": 500,
"reverseAltReads": 460,
"refReads": 1024,
"altReads": 974,
"vaf": 48.749,
"variantRegion": "intronic",
"variantEffect": "",
"snvEffect": "A>C",
"gene": "EZH2"
}

Variant location in genome
Nucleotide change
Sequencing statistics
Variant prevalence in specimen
Variant coding/protein effects
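The sequencing statistics in this document are internally consistent:

- the strand-level counts sum to the totals;
- `vaf` matches `altReads / totalReads × 100`, since 974 / 1998 ≈ 48.749%.

Below is a small Python sketch that derives those values from the per-strand counts. The per-strand alternate-allele fractions are an extra illustration, not fields in the original document.

```python
from dataclasses import dataclass


@dataclass
class StrandCounts:
    """Reference and alternate read counts observed on one strand."""
    ref: int
    alt: int

    @property
    def total(self) -> int:
        return self.ref + self.alt


def summarize(forward: StrandCounts, reverse: StrandCounts) -> dict:
    """Derive the aggregate read statistics from per-strand counts."""
    ref_reads = forward.ref + reverse.ref
    alt_reads = forward.alt + reverse.alt
    total = ref_reads + alt_reads
    return {
        "forwardReads": forward.total,
        "reverseReads": reverse.total,
        "refReads": ref_reads,
        "altReads": alt_reads,
        "totalReads": total,
        # Variant allele fraction as a percentage, rounded like the example document.
        "vaf": round(100 * alt_reads / total, 3),
        # Per-strand alt fractions; a large gap between them can hint at strand bias.
        "forwardAltFraction": forward.alt / forward.total,
        "reverseAltFraction": reverse.alt / reverse.total,
    }


stats = summarize(StrandCounts(ref=524, alt=514), StrandCounts(ref=500, alt=460))
print(stats["totalReads"], stats["vaf"])  # 1998 48.749
```

A VAF near 50% with balanced strands, as here, is the pattern one might expect from a heterozygous variant. Interpreting it still depends on tumor content and the clinical context.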


{Elastic} Searching for Meaning


[Diagram: Public Databases (OMIM, dbSNP, COSMIC, ClinVar) → Public Variant Data in Azure Elasticsearch; Sequencers → Private Variant Data in local SQL and Elasticsearch; both support Variant Analysis and Effect Prediction]
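Looking up a sequenced variant in a public annotation index amounts to an exact match on location and alleles. Below is a sketch, in Python, of an Elasticsearch `bool` query body that uses `filter` clauses for this. It makes two assumptions, neither of which describes the deployed schema:

- the annotation documents use the field names from the variant JSON shown earlier;
- string fields carry the default `.keyword` sub-field.

```python
import json


def variant_lookup_query(chromosome: str, position: int, ref: str, alt: str) -> dict:
    """Build a search body that matches one variant exactly.

    Field names follow the example variant document; the `.keyword`
    sub-fields assume Elasticsearch's default dynamic mapping for strings.
    """
    return {
        "query": {
            "bool": {
                # filter clauses skip relevance scoring and are cacheable
                "filter": [
                    {"term": {"chromosome.keyword": chromosome}},
                    {"term": {"position": position}},
                    {"term": {"refAllele.keyword": ref}},
                    {"term": {"altAllele.keyword": alt}},
                ]
            }
        }
    }


body = variant_lookup_query("chr7", 148506396, "A", "C")
print(json.dumps(body, indent=2))
```

Filter context fits here because the question is yes/no (is this variant annotated?) rather than a relevance ranking. How the body is sent depends on the client version. Older elasticsearch-py releases accepted a `body` argument; 8.x prefers keyword arguments such as `query=`.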


{Elastic} Searching for Meaning


[Diagram: Public Databases (OMIM, dbSNP, COSMIC, ClinVar) → Public Variant Data; Sequencers → Private Variant Data; an MVC Application (NEST) serves Variant Analysis and Effect Prediction]
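In this layout, the application layer combines public annotations with private results. The deck's application used NEST on .NET; the sketch below uses Python only to stay consistent with the other examples. It joins the two sources on chromosome, position, and alleles. The record values are placeholders, and the `source`/`note` fields are hypothetical.

```python
from typing import Iterable


def variant_key(doc: dict) -> tuple:
    """Identity of a variant: location plus reference and alternate alleles."""
    return (doc["chromosome"], doc["position"], doc["refAllele"], doc["altAllele"])


def annotate(private: Iterable[dict], public: Iterable[dict]) -> list:
    """Attach public annotations to each private variant call with the same key.

    Calls without a matching annotation are kept, with an empty list.
    """
    index = {}
    for ann in public:
        index.setdefault(variant_key(ann), []).append(ann)
    return [{**call, "annotations": index.get(variant_key(call), [])} for call in private]


# Placeholder records for illustration only (hypothetical annotation fields).
calls = [{"chromosome": "chr7", "position": 148506396, "refAllele": "A",
          "altAllele": "C", "gene": "EZH2", "vaf": 48.749}]
public = [{"chromosome": "chr7", "position": 148506396, "refAllele": "A",
           "altAllele": "C", "source": "example", "note": "placeholder annotation"}]
print(annotate(calls, public)[0]["annotations"][0]["source"])  # example
```

Keys only match if both sides represent variants the same way. For example, both sides would need to use the same reference build and left-normalized indels, which this sketch does not attempt.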


Kibana Drilldown
Rapid population stats
Physicians/researchers can quickly analyze data
Integration with the health record: demographics, laboratory testing, comorbidities, and treatment information
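Kibana's population-level views are built on Elasticsearch aggregations. Below is a Python sketch of a comparable request body that counts variant documents per gene. It also includes a helper that reads the bucket list from a terms-aggregation response. The field name `gene.keyword` and the aggregation name `by_gene` are assumptions, and the sample response holds dummy values.

```python
def gene_count_request(top_n: int = 10) -> dict:
    """Aggregation-only search body: count documents per gene."""
    return {
        "size": 0,  # return no hits, only aggregation results
        "aggs": {
            "by_gene": {"terms": {"field": "gene.keyword", "size": top_n}}
        },
    }


def buckets_to_counts(response: dict, agg_name: str = "by_gene") -> dict:
    """Convert terms-aggregation buckets into a {key: doc_count} dict."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {b["key"]: b["doc_count"] for b in buckets}


# Dummy response with the same shape Elasticsearch returns for a terms aggregation.
sample_response = {"aggregations": {"by_gene": {"buckets": [
    {"key": "GENE_A", "doc_count": 3}, {"key": "GENE_B", "doc_count": 1}]}}}
print(buckets_to_counts(sample_response))  # {'GENE_A': 3, 'GENE_B': 1}
```

Setting `size` to 0 keeps the response small when only population statistics are needed. Nested aggregations (for example, by gene and then by variant region) follow the same pattern.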


Service Integration

[Diagram components: Clinical Interpretation System, Variant Database, Web Service Interfaces, Third-Party Data Analysis Software, Predictive Algorithms (inset chart, scores from -3 to 3), Custom Validation Scripts, Quality Assurance]
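The deck does not show its custom validation scripts. As a hedged illustration only, below is a Python check of the internal consistency of a variant document like the JSON example above. It verifies that strand counts sum to the totals and that `vaf` matches the read counts. The rule set and the tolerance are assumptions.

```python
def validate_variant(doc: dict, vaf_tolerance: float = 0.01) -> list:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # Each total must equal the sum of its component counts.
    sums = [
        ("forwardReads", ("forwardRefReads", "forwardAltReads")),
        ("reverseReads", ("reverseRefReads", "reverseAltReads")),
        ("refReads", ("forwardRefReads", "reverseRefReads")),
        ("altReads", ("forwardAltReads", "reverseAltReads")),
        ("totalReads", ("forwardReads", "reverseReads")),
    ]
    for total_field, parts in sums:
        expected = sum(doc[p] for p in parts)
        if doc[total_field] != expected:
            problems.append(f"{total_field}={doc[total_field]} but parts sum to {expected}")
    # The reported VAF (a percentage) should agree with the read counts.
    if doc["totalReads"] > 0:
        vaf = 100 * doc["altReads"] / doc["totalReads"]
        if abs(vaf - doc["vaf"]) > vaf_tolerance:
            problems.append(f"vaf={doc['vaf']} but read counts give {vaf:.3f}")
    if doc["refAllele"] == doc["altAllele"]:
        problems.append("refAllele equals altAllele")
    return problems


example = {
    "refAllele": "A", "altAllele": "C",
    "totalReads": 1998, "forwardReads": 1038, "forwardRefReads": 524,
    "forwardAltReads": 514, "reverseReads": 960, "reverseRefReads": 500,
    "reverseAltReads": 460, "refReads": 1024, "altReads": 974, "vaf": 48.749,
}
print(validate_variant(example))  # []
```

Returning a list of messages, rather than raising on the first failure, lets a quality-assurance step report every problem with a record at once.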

Data Sharing

[Diagram: Clinical Interpretation System, Variant Database, Web Service Interfaces]


Conclusions
System statistics:
- Two Elasticsearch clusters
- Over 60 million variant annotations
- Nearly 10 million documents related to cancer-associated mutations
- Kibana and custom web applications using NEST for data visualization

Clinical implications:
- Genetic sequencing and clinical consultation complete within one week of biopsy
- Integrated multiple analysis pipelines for clinical interpretation and research applications
- Frequently identify patients eligible for clinical trials

Thank you!
Henry Rinder MD, Richard Torres MD, Christopher Tormey MD, Brian Smith MD, John Howe PhD,
Karl Hager PhD, Rodion Rathbone MD, Nathaniel Price, Alexa Siddon MD

Wade L. Schulz, MD, PhD


wade.schulz@yale.edu
http://www.wadeschulz.com

Many images adapted from Servier Medical Art, CC-BY

This work is licensed under the Creative Commons Attribution-NoDerivatives 4.0 International License.
To view a copy of this license, visit:
http://creativecommons.org/licenses/by-nd/4.0/

or send a letter to:


Creative Commons
PO Box 1866
Mountain View, CA 94042
USA
