
NCBI Entrez Digital

Tools and Utilities


Jonathan A. Kans, Ph.D.
Staff Scientist, NCBI
jkans@stanford.edu
1

Topics
Advanced Features of Entrez (to help separate the wheat from the chaff)
Programmatic Access with EUtils (automate repeatable multi-step queries)
EBot Generated Scripts (if you really don't want to write a program)

Comparative Analysis
Anatomy
Physiology
Biochemistry
Gene Sequences

Central Dogma of Molecular Biology
DNA (information)
transcription (polymerase)
mRNA
RNA (expression)
translation (ribosome)
CDS
Protein (function)

Genetic Diseases
Specific molecular defects explain disease
β-globin gene and protein sequences
...ATGGTGCATCTGACTCCTGAGGAGAAG...AAGTATCACTAA...
(M) V H L T P E E K ... K Y H (*)

Sickle-cell anemia variant


...ATGGTGCATCTGACTCCTGTGGAGAAG...AAGTATCACTAA...
(M) V H L T P V E K ... K Y H (*)
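The single-base change behind the variant can be located mechanically. The sketch below is an illustration only: `first_mismatch` is a hypothetical helper name, and it compares just the 27-base fragments shown on the slide, not full gene records.

```shell
#!/bin/sh
# first_mismatch (hypothetical helper): print the 1-based position and the
# base change at the first point where two sequences differ.
first_mismatch() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = (length(a) < length(b)) ? length(a) : length(b)
        for (i = 1; i <= n; i++) {
            if (substr(a, i, 1) != substr(b, i, 1)) {
                printf "%d %s>%s\n", i, substr(a, i, 1), substr(b, i, 1)
                exit
            }
        }
    }'
}

# Fragments copied from the slide (start codon onward).
normal=ATGGTGCATCTGACTCCTGAGGAGAAG
variant=ATGGTGCATCTGACTCCTGTGGAGAAG
first_mismatch "$normal" "$variant"   # prints: 20 A>T
```

Position 20 lies in the seventh codon (bases 19 to 21), where GAG becomes GTG, which matches the E to V change in the translations shown.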

Evolutionary Conservation
[Figure: evolutionary tree relating Bacteria, Yeast, Worm, Fly, Mouse, and Human; scale marks at 3000 M yr, 1000 M yr, and 500 M yr]

Human   638 RHACVEVQDEIAFIPNDVYFEKDKQMFHIITGPNMGGKSTYIRQTGVIVLMAQIGCFVPC 697
Yeast   657 RHPVLEMQDDISFISNDVTLESGKGDFLIITGPNMGGKSTYIRQVGVISLMAQIGCFVPC 716
E.coli  584 RHPVVEQVLNEPFIANPLNLSPQRR-MLIITGPNMGGKSTYMRQTALIALMAYIGSYVPA 642

Colon cancer gene sequence (DNA mismatch repair protein)

Design of Entrez
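[Diagram: three linked databases and the relationships that connect them]
MEDLINE: term frequency statistics
Nucleotide: nucleotide sequence similarity
Protein: amino acid sequence similarity
MEDLINE <-> Nucleotide: literature citations in sequence
MEDLINE <-> Protein: literature citations in sequence
Nucleotide <-> Protein: coding region features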

Entrez Databases

PubMed Search

PubMed Fields

10

Advanced Search

11

Field Abbreviations
Affiliation               [AFFL]
All Fields                [ALL]
Author                    [AUTH]
Author - Corporate        [COLN]
Author - First            [FAUT]
Author - Full             [FULL]
Author - Last             [LAUT]
Book                      [BOOK]
Date - Completion         [CDAT]
Date - Create             [CRDT]
Date - Entrez             [EDAT]
Date - MeSH               [MHDA]
Date - Modification       [MDAT]
Date - Publication        [PDAT]
EC/RN Number              [ECNO]
Editor                    [ED]
Filter                    [FILT]
Grant Number              [GRNT]
ISBN                      [ISBN]
Investigator              [INVR]
Investigator - Full       [FINV]
Issue                     [ISS]
Journal                   [JOUR]
Language                  [LANG]
Location ID               [LID]
MeSH Major Topic          [MAJR]
MeSH Subheading           [SUBH]
MeSH Terms                [MESH]
Pagination                [PAGE]
Pharmacological Action    [PAPX]
Publication Type          [PTYP]
Publisher                 [PUBN]
Publisher ID              [PID]
Secondary Source ID       [SI]
Supplementary Concept     [SUBS]
Text Word                 [WORD]
Title                     [TITL]
Title/Abstract            [TIAB]
Transliterated Title      [TT]
UID                       [UID]
Volume                    [VOL]

12

MeSH Categories
Anatomy
Organisms
Diseases
Chemicals and Drugs
Analytical, Diagnostic and Therapeutic Techniques and Equipment
Psychiatry and Psychology
Phenomena and Processes
Disciplines and Occupations
Anthropology, Education, Sociology and Social Phenomena
Technology, Industry, Agriculture
Humanities
Information Science
Named Groups
Health Care
Publication Characteristics
Geographicals

13

Organism Hierarchy
Eukaryota
  Alveolata
  Amoebozoa
  Animals
    Animal Population Groups
    Chordata
    Invertebrates
  Choanoflagellata
  Cryptophyta
  Diplomonadida
  Euglenozoa
  Fungi
  Haptophyta
  Mesomycetozoea
  Oxymonadida
  Parabasalidea
  Plants
  Retortamonadidae
  Rhizaria
  Stramenopiles
Archaea
Bacteria
Viruses
Other Forms

14

Useful Queries
humans [MESH]
pharmacokinetics [MESH]
chemically induced [SUBH]
all child [FILT]
loprovflybase [FILT]
randomized controlled trial [FILT]
clinical trial, phase ii [PTYP]
mammalia [ORGN]
mammalia [ORGN:noexp]
cds [FKEY]
lacz [GENE]
beta galactosidase [PROT]
biomol genomic [PROP]
dbxref flybase [PROP]
gbdiv phg [PROP]
src cultivar [PROP]
srcdb refseq validated [PROP]
150:200 [SLEN]

15

Structured Query

transposition [TITL] AND (protease OR peptidase) NOT humans [MESH]
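A structured query like the one above contains spaces, brackets, and parentheses, so it has to be percent-encoded before it can be passed as a URL argument. The following is a minimal sketch, assuming a POSIX shell with `sed`; `encode_term` is a hypothetical helper name and it only covers characters commonly seen in Entrez queries, not every reserved character.

```shell
#!/bin/sh
# encode_term (hypothetical helper): percent-encode a query string.
# '%' is handled first so the escapes added afterwards are not re-encoded.
encode_term() {
    printf '%s' "$1" | sed -e 's/%/%25/g' -e 's/ /%20/g' -e 's/&/%26/g' \
        -e 's/(/%28/g' -e 's/)/%29/g' -e 's/\[/%5B/g' -e 's/]/%5D/g' \
        -e 's/"/%22/g' -e 's/+/%2B/g' -e 's/,/%2C/g'
}

encode_term 'transposition [TITL] AND (protease OR peptidase) NOT humans [MESH]'
echo
```

The encoded string can then be appended after `term=` in a request URL.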


16

Using History

17

History Results

18

PubMed Record

19

Neighbor Hyperlink

20

Related Citations

21

Relevant Publication

22

Selecting Target

23

GenBank Record

24

Graphical View

25

LOCUS       HUMADH1CB    1400 bp    mRNA    PRI    15-JUN-1989
DEFINITION  Homo sapiens class I alcohol dehydrogenase (ADH1) alpha subunit
            mRNA, complete cds.
ACCESSION   M12271
KEYWORDS    alcohol dehydrogenase; dehydrogenase.
SOURCE      Human liver, cDNA to mRNA, clone pUCADH-alpha-15L.
  ORGANISM  Homo sapiens
            Eukaryota; Animalia; Metazoa; Chordata; Vertebrata; Mammalia;
            Theria; Eutheria; Primates; Haplorhini; Catarrhini; Hominidae;
            Homo; sapiens.
REFERENCE   1 (bases 1 to 1400)
  AUTHORS   Ikuta,T., Szeto,S. and Yoshida,A.
  TITLE     Three human alcohol dehydrogenase subunits: cDNA structure and
            molecular and evolutionary divergence
  JOURNAL   Proc. Natl. Acad. Sci. U.S.A. 83 (3), 634-638 (1986)
  STANDARD  full staff_review
COMMENT     A draft entry and printed copy of the sequence in [1] were
            kindly provided by A.Yoshida, 30-MAY-1986.
            The other human class I ADH1 alpha subunit sequence is found
            under accession M11307.
FEATURES             Location/Qualifiers
     mRNA            <1..1400
                     /note="ADH1 mRNA"
     CDS             16..1143
                     /note="alcohol dehydrogenase alpha subunit (EC 1.1.1.1)"
                     /map="'4q21' /hgml_locus_uid='LJ0082S'"
                     /gene="ADH1"
BASE COUNT      400 a    294 c    340 g    366 t
ORIGIN      52 bp upstream of PvuII site; chromosome 4q21.
        1 gaagacagaa tcaacatgag cacagcagga aaagtaatca aatgcaaagc agctgtgcta
       61 tgggagttaa agaaaccctt ttccattgag gaggtggagg ttgcacctcc taaggcccat
      121 gaagttcgta ttaagatggt ggctgtagga atctgtggca cagatgacca cgtggttagt
      181 ggtaccatgg tgaccccact tcctgtgatt ttaggccatg aggcagccgg catcgtggag
      241 agtgttggag aaggggtgac tacagtcaaa ccaggtgata aagtcatccc actcgctatt
      301 cctcagtgtg gaaaatgcag aatttgtaaa aacccggaga gcaactactg cttgaaaaac
      361 gatgtaagca atcctcaggg gaccctgcag gatggcacca gcaggttcac ctgcaggagg
      421 aagcccatcc accacttcct tggcatcagc accttctcac agtacacagt ggtggatgaa
      481 aatgcagtag ccaaaattga tgcagcctcg cctctagaga aagtctgtct cattggctgt
      541 ggattttcaa ctggttatgg gtctgcagtc aatgttgcca aggtcacccc aggctctacc
      601 tgtgctgtgt ttggcctggg aggggtcggc ctatctgcta ttatgggctg taaagcagct
      661 ggggcagcca gaatcattgc ggtggacatc aacaaggaca aatttgcaaa ggccaaagag
      721 ttgggggcca ctgaatgcat caaccctcaa gactacaaga aacccatcca ggaggtgcta

26

ENTRY           DEHUAA  #Type Protein
TITLE           Alcohol dehydrogenase alpha chain - Human #EC-number 1.1.1.1
DATE            28-Dec-1987 #Sequence 28-Dec-1987 #Text 30-Sep-1989
PLACEMENT       27.0  1.0  1.0  1.0  1.0
SOURCE          Homo sapiens #Common-name man
ACCESSION       A25428
REFERENCE       (Sequence translated from the mRNA sequence)
   #Authors     Ikuta T., Szeto S., Yoshida A.
   #Journal     Proc. Nat. Acad. Sci. USA (1986) 83:634-638
   #Title       Three human alcohol dehydrogenase subunits: cDNA
                structure and molecular and evolutionary
                divergence.
GENETIC
   #Map-position 4q21-q25
   #Name        ADH1
SUPERFAMILY
   #Name        alcohol dehydrogenase
KEYWORDS        oxidoreductase
SUMMARY         #Molecular-weight 39858 #Length 375 #Checksum 7545
SEQUENCE
            5        10        15        20        25        30
  1 M S T A G K V I K C K A A V L W E L K K P F S I E E V E V A
 31 P P K A H E V R I K M V A V G I C G T D D H V V S G T M V T
 61 P L P V I L G H E A A G I V E S V G E G V T T V K P G D K V
 91 I P L A I P Q C G K C R I C K N P E S N Y C L K N D V S N P
121 Q G T L Q D G T S R F T C R R K P I H H F L G I S T F S Q Y
151 T V V D E N A V A K I D A A S P L E K V C L I G C G F S T G
181 Y G S A V N V A K V T P G S T C A V F G L G G V G L S A I M
211 G C K A A G A A R I I A V D I N K D K F A K A K E L G A T E
241 C I N P Q D Y K K P I Q E V L K E M T D G G V D F S F E V I
271 G R L D T M M A S L L C C H E A C G T S V I V G V P P D S Q
301 N L S M N P M L L L T G R T W K G A I L G G F K S K E C V P
331 K L V A D F M A K K F S L D A L I T H V L P F E K I N E G F
361 D L L H S G K S I R T I L M F
///

27

Same Publication?
JOURNAL   Proc. Natl. Acad. Sci. U.S.A. 83 (3), 634-638 (1986)
#Journal  Proc. Nat. Acad. Sci. USA (1986) 83:634-638

28

Exponential Growth

29

Sequence Identifiers
Accession:  AH006997
GI Number:  6849043
Accn.Ver:   AH006997.2
FASTA:      >gi|6849043|gb|AH006997.2
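The FASTA definition line packs the other three identifiers into one pipe-delimited string. As a rough sketch, assuming the `>gi|number|gb|accession.version` layout shown on this slide (other layouts are not handled), the fields can be split in plain shell. `parse_defline` is a hypothetical helper name.

```shell
#!/bin/sh
# parse_defline (hypothetical helper): split a '>gi|...|gb|ACC.VER' line.
# The body runs in a subshell so the IFS and 'set -f' changes stay local.
parse_defline() (
    line=${1#'>'}        # drop the leading '>'
    IFS='|'
    set -f               # no filename expansion while splitting
    set -- $line         # $1=gi  $2=GI number  $3=gb  $4=accession.version
    printf 'gi=%s accession=%s version=%s\n' "$2" "${4%.*}" "${4##*.}"
)

parse_defline '>gi|6849043|gb|AH006997.2'
# prints: gi=6849043 accession=AH006997 version=2
```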

30

Sequence Assembly
NC_000022.9

join(gap(14430000),gi|89058412:1..647850,gap(150000),gi|29806588:1..3661581 )

NT_028395.3

NT_011519.10

join(gi|5931500:1..37693,gi|5931501:2273..41306 )

AP000522.1

GATCTGATAAGTCCCAGGAC

AP000523.1

TGGTATCCACCTGGGGCCTG

31

Features and Qualifiers


gene            1..417
                /gene="INS"
                /db_xref="GeneID:449570"
CDS             60..392
                /gene="INS"
                /codon_start=1
                /product="proinsulin precursor"
                /protein_id="NP_001008996.1"
                /translation="MALWMRLLPLL ... YQLENYCN"
sig_peptide     60..131
                /gene="INS"
mat_peptide     132..389
                /gene="INS"
                /product="Insulin"

32

Graphical Views

33

Translation Validation
DNA
...cgaaaagGTGGTAGTGTAGGAGACGGTGAAGctaaga...
/translation
- V V * E T V K
Protein
M V V L E T E K

SEQ_FEAT_StartCodon

SEQ_FEAT_InternalStop

SEQ_FEAT_MismatchAA

SEQ_FEAT_NotSpliceConsensusDonor

34

Alignments
Describe relationships between sequences
Can reflect evolutionary conservation, structural similarity, functional similarity
Can be generated algorithmically (e.g., BLAST) or manually

MRLTLLC-------EGEEGSELPLCASCGQRIELKYKPECYPDVKNSLHV
MRLTLLCCTWREERMGEEGSELPVCASCGQRLELKYKPECFPDVKNSIHA
MRLTCLCRTWREERMGEEGSEIPVCASCGQRIELKYKPE-----------

35

Original Databases
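[Diagram: three linked databases and the relationships that connect them]
MEDLINE: term frequency statistics
Nucleotide: nucleotide sequence similarity
Protein: amino acid sequence similarity
MEDLINE <-> Nucleotide: literature citations in sequence
MEDLINE <-> Protein: literature citations in sequence
Nucleotide <-> Protein: coding region features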
36

Discovery Space
PubMed
Publishers

PubMed
abstracts

Entrez
Complete Genomes
Genomes
Genome
Centers
MMDB

Taxon
Phylogeny

Nucleotide
sequences

3-D
Structure

Protein
sequences

37

Data Integration

38

Leveraging Resources
GenBank
RefSeq
Human Genome
Bacterial Genome
Virus Genome
MMDB
PubMed
UniGene(s)
LocusLink
OMIM
Taxonomy
GEO
PopSet
BLAST
Entrez
ePCR
Sequin

39

Entrez Utilities
EInfo
ESearch
ESummary
EFetch
ELink
EPost

40

EUtils Base URL


https://eutils.ncbi.nlm.nih.gov/entrez/eutils/program.fcgi?arguments

41

EUtils Arguments
db          pubmed | nucleotide | protein
term        transposition+AND+(protease+OR+peptidase)
id          172344,U54439.1

rettype     abstract | acc | seqid | gb | fasta | count
retmode     text | xml | asn.1
retstart
retmax

datetype    mdat | pdat | edat
reldate     60

dbfrom      pubmed | nucleotide | protein
cmd         neighbor
linkname    gene_snp_genegenotype

usehistory  y
WebEnv      NCID_1_216999436_130...086_61936294
query_key   1

version     2.0
tool
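Each request is the base URL, a program name, and a list of `name=value` arguments joined with `&`. The sketch below assembles such a URL without sending it. `eutils_url` and `EUTILS_BASE` are hypothetical names, and the `https` scheme is an assumption reflecting current NCBI practice rather than the slides.

```shell
#!/bin/sh
# Base URL (assumption: https form of the base URL shown in the slides).
EUTILS_BASE='https://eutils.ncbi.nlm.nih.gov/entrez/eutils'

# eutils_url (hypothetical helper): program name, then name=value pairs.
# Values are expected to be URL-encoded already.
eutils_url() {
    program=$1
    shift
    query=''
    for pair in "$@"; do
        query="${query:+$query&}$pair"   # add '&' only between pairs
    done
    printf '%s/%s.fcgi?%s\n' "$EUTILS_BASE" "$program" "$query"
}

eutils_url esearch db=pubmed term=transposition+immunity retmax=20
```

The printed URL can be handed to `curl -s` as in the later slides.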

42

rettype=abstract
1. Mol Microbiol. 2012 Feb;83(4):805-20.
Separate structural and functional domains of Tn4430 transposase
contribute to target immunity.
Lambin M, Nicolas E, Oger CA, Nguyen N, Prozzi D, Hallet B.
GSK Biologicals, Rue Flemming, 20, 1300 Wavre, Belgium.
bernard.hallet@uclouvain.be
Like other transposons of the Tn3 family, Tn4430 exhibits target
immunity, a process that prevents multiple insertions of the
transposon into the same DNA molecule. Immunity is conferred by
the terminal inverted repeats of the transposon and is specific
to each element of the family, indicating that the transposase
...
transposition. One class of mutations was found to stimulate
transposition, whereas other mutations appeared to reduce TnpA
activity. The data are discussed with respect to alternative
models in which TnpA acts as a specific determinant to both
establish and respond to immunity.
PMID: 22624153 [PubMed - indexed for MEDLINE]

43

rettype=medline
PMID- 22624153
OWN - NLM
STAT- MEDLINE
DA  - 20120523
DCOM- 20120529
IS  - 1365-2958 (Electronic)
IS  - 0950-382X (Linking)
VI  - 83
IP  - 4
DP  - 2012 Feb
TI  - Separate structural and functional domains of Tn4430 transposase
      contribute to target immunity.
PG  - 805-20
AB  - Like other transposons of the Tn3 family, Tn4430 exhibits target
      immunity, a process that prevents multiple insertions of the
      ...
AD  - GSK Biologicals, Rue Flemming, 20, 1300 Wavre, Belgium.
      bernard.hallet@uclouvain.be
...
AID - 10.1111/j.1365-2958.2012.07967.x [doi]
PST - ppublish
SO  - Mol Microbiol. 2012 Feb;83(4):805-20.

44

EInfo URLs
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=pubmed

45

curl Command in Terminal


curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi"

https://itservices.stanford.edu/service/sharedcomputing/loggingin

46

Entrez Databases
<?xml version="1.0"?>
<!DOCTYPE eInfoResult PUBLIC "-//NLM//DTD...>
<eInfoResult>
<DbList>
<DbName>pubmed</DbName>
<DbName>protein</DbName>
<DbName>nuccore</DbName>
<DbName>nucleotide</DbName>
<DbName>nucgss</DbName>
<DbName>nucest</DbName>
<DbName>structure</DbName>
<DbName>genome</DbName>
...

47

PubMed Fields
<?xml version="1.0"?>
<!DOCTYPE eInfoResult PUBLIC "-//NLM//DTD...>
<eInfoResult>
<DbInfo>
<DbName>pubmed</DbName>
<MenuName>PubMed</MenuName>
<Description>PubMed bibliographic record</Description>
<Count>22006701</Count>
<LastUpdate>2012/08/04 03:30</LastUpdate>
<FieldList>
...
<Field>
<Name>TIAB</Name>
<FullName>Title/Abstract</FullName>
<Description>Free text associated with Abstract/Title</Description>
<TermCount>38990504</TermCount>
<IsDate>N</IsDate>
<IsNumerical>N</IsNumerical>
<SingleToken>N</SingleToken>
<Hierarchy>N</Hierarchy>
<IsHidden>N</IsHidden>
</Field>
...

48

PubMed Links
<?xml version="1.0"?>
<!DOCTYPE eInfoResult PUBLIC "-//NLM//DTD...>
<eInfoResult>
<DbInfo>
<DbName>pubmed</DbName>
<MenuName>PubMed</MenuName>
...
<LinkList>
...
<Link>
<Name>pubmed_pubmed</Name>
<Menu>Related Citations</Menu>
<Description>Calculated set of PubMed ...</Description>
<DbTo>pubmed</DbTo>
</Link>
...
<Link>
<Name>pubmed_structure</Name>
<Menu>Structure Links</Menu>
<Description>Three-dimensional structure ...</Description>
<DbTo>structure</DbTo>
</Link>
...

49

ESearch URL
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=transposition+immunity

50

ESummary URL
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&version=2.0&id=2539356

51

EFetch URL
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&rettype=abstract&id=2539356

52

ELink URL
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&db=pubmed&cmd=neighbor&linkname=pubmed_pubmed&id=2539356

53

curl GET and POST


curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=transposition+immunity"
curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi" \
  -d "db=pubmed&id=22624153,22555593,22253773,21729108,..."

54

Cluttered Result
<?xml version="1.0" ?>
<!DOCTYPE eSearchResult PUBLIC "-//NLM//DTD eSearchResult, 11 May 2002//EN"
"http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eSearch_020511.dtd">
<eSearchResult><Count>94</Count><RetMax>20</RetMax><RetStart>0</
RetStart><IdList> <Id>22624153</Id> <Id>22555593</Id> <Id>22253773</Id>
<Id>21729108</Id> <Id>21695252</Id> <Id>21347312</Id> <Id>20603074</Id>
<Id>20481492</Id> <Id>20004590</Id> <Id>19464182</Id> <Id>19431236</Id>
<Id>19237527</Id> <Id>19188259</Id> <Id>19144000</Id> <Id>19120617</Id>
<Id>18931389</Id> <Id>18838147</Id> <Id>18396069</Id> <Id>17966893</Id>
<Id>17709741</Id> </IdList><TranslationSet><Translation> <From>immunity</
From> <To>"immunity"[MeSH Terms] OR "immunity"[All Fields]</To> </
Translation></TranslationSet><TranslationStack> <TermSet>
<Term>transposition[All Fields]</Term> <Field>All Fields</Field>
<Count>19362</Count> <Explode>Y</Explode> </TermSet> <TermSet>
<Term>"immunity"[MeSH Terms]</Term> <Field>MeSH Terms</Field>
<Count>252127</Count> <Explode>Y</Explode> </TermSet> <TermSet>
<Term>"immunity"[All Fields]</Term> <Field>All Fields</Field>
<Count>189033</Count> <Explode>Y</Explode> </TermSet> <OP>OR</OP>
<OP>GROUP</OP> <OP>AND</OP> <OP>GROUP</OP> </
TranslationStack><QueryTranslation>transposition[All Fields] AND
("immunity"[MeSH Terms] OR "immunity"[All Fields])</QueryTranslation></
eSearchResult>

55

Cleaned for Parsing


<?xml version="1.0"?>
<!DOCTYPE eSearchResult PUBLIC "-//NLM//DTD...>
<eSearchResult>
<Count>94</Count>
<RetMax>20</RetMax>
<RetStart>0</RetStart>
<IdList>
<Id>22624153</Id>
<Id>22555593</Id>
<Id>22253773</Id>
<Id>21729108</Id>
<Id>21695252</Id>
<Id>21347312</Id>
<Id>20603074</Id>
<Id>20481492</Id>
<Id>20004590</Id>
<Id>19464182</Id>
...

56

Reformat XML
...
<IdList> <Id>22624153</Id> <Id>22555593</Id> <Id>22253773</Id>
<Id>21729108</Id> <Id>21695252</Id> <Id>21347312</Id> <Id>20603074</Id>
...

xmllint --format ...


<IdList>
<Id>22624153</Id>
<Id>22555593</Id>
<Id>22253773</Id>
<Id>21729108</Id>
<Id>21695252</Id>
<Id>21347312</Id>
<Id>20603074</Id>
...

57

Extract ID Numbers
...
<IdList>
<Id>22624153</Id>
<Id>22555593</Id>
<Id>22253773</Id>
...

perl -nle 'print /(?<=<Id>).*?(?=<\/Id>)/g'

22624153
22555593
22253773
...

58

Remove Blank Lines


22624153
22555593
22253773
...

grep [0-9]

22624153
22555593
22253773
...

59

UNIX Pipes
curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi" \
  -d "db=pubmed&term=transposition+immunity" | \
xmllint --format - | \
perl -nle 'print /(?<=<Id>).*?(?=<\/Id>)/g' | \
grep '[0-9]'
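The pipeline above depends on `xmllint` to put one `<Id>` on each line and on a final `grep` to drop the blank lines that Perl prints for non-matching input. A possible variant, offered as a sketch rather than as the method used in the slides, does the extraction with only `tr` and `sed`, so it works on the cluttered single-line output as well. `extract_ids` is a hypothetical helper name.

```shell
#!/bin/sh
# extract_ids (hypothetical helper): read ESearch XML on stdin and print
# one UID per line.  Breaking the stream at every '<' leaves each ID on a
# line of the form 'Id>12345', which sed then reduces to the digits.
extract_ids() {
    tr '<' '\n' | sed -n 's/^Id>\([0-9][0-9]*\)$/\1/p'
}

# Sample input copied from the ESearch result shown earlier.
printf '<IdList> <Id>22624153</Id> <Id>22555593</Id>\n<Id>22253773</Id> </IdList>\n' |
    extract_ids
```

Lines such as `IdList>` or `/Id>` never match the pattern, so no blank-line cleanup is needed afterwards.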

60

Resulting List of IDs


22624153
22555593
22253773
21729108
21695252
21347312
20603074
20481492
20004590
19464182
...

61

UNIX Shell Script


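#!/bin/sh
# Usage: esrch.sh database "query" [days]
# Percent-encode the characters most likely to appear in an Entrez query.
encoded=$(echo "$2" | sed -e 's/ /%20/g' -e 's/\&/%26/g' -e 's/'\''/%27/g' \
  -e 's/(/%28/g' -e 's/)/%29/g' -e 's/,/%2c/g' -e 's/\[/%5b/g' -e 's/\]/%5d/g')
base='https://eutils.ncbi.nlm.nih.gov/entrez/eutils'
suffix="&retmode=xml&retmax=200"
# An optional third argument limits results to the last N days.
if [ -n "$3" ]; then
  suffix="$suffix&reldate=$3"
fi
res=$(curl -s "$base/esearch.fcgi?db=$1&term=$encoded$suffix")
# After formatting there is one <Id> per line; grep drops lines without digits.
flt=$(printf '%s\n' "$res" | xmllint --format - | \
  perl -nle 'print /(?<=<Id>).*?(?=<\/Id>)/g' | grep '[0-9]')
for uid in $flt
do
  echo "$uid"
done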

./esrch.sh pubmed "transposition immunity Tn3" 365


62

ESearch -> ESummary


#!/bin/sh
# Usage: script database "query"
encoded=$(echo "$2" | sed -e 's/ /%20/g' -e 's/\&/%26/g' -e 's/'\''/%27/g' \
  -e 's/(/%28/g' -e 's/)/%29/g' -e 's/,/%2c/g' -e 's/\[/%5b/g' -e 's/\]/%5d/g')
base='https://eutils.ncbi.nlm.nih.gov/entrez/eutils'
res=$(curl -s "$base/esearch.fcgi?db=$1&term=$encoded&retmode=xml&retmax=200")
flt=$(printf '%s\n' "$res" | xmllint --format - | \
  perl -nle 'print /(?<=<Id>).*?(?=<\/Id>)/g' | grep '[0-9]')
# Fetch and print one document summary per UID.
for uid in $flt
do
  res=$(curl -s "$base/esummary.fcgi?db=$1&version=2.0&id=$uid")
  sum=$(printf '%s\n' "$res" | xmllint --format -)
  echo "$sum"
done

63

ESearch -> IDs


#!/bin/sh
# Usage: esrch.sh database "query"
encoded=$(echo "$2" | sed -e 's/ /%20/g' -e 's/\&/%26/g' -e 's/'\''/%27/g' \
  -e 's/(/%28/g' -e 's/)/%29/g' -e 's/,/%2c/g' -e 's/\[/%5b/g' -e 's/\]/%5d/g')
base='https://eutils.ncbi.nlm.nih.gov/entrez/eutils'
res=$(curl -s "$base/esearch.fcgi?db=$1&term=$encoded&retmode=xml&retmax=200")
flt=$(printf '%s\n' "$res" | xmllint --format - | \
  perl -nle 'print /(?<=<Id>).*?(?=<\/Id>)/g' | grep '[0-9]')
# Print one UID per line so the list can be piped to another script.
for uid in $flt
do
  echo "$uid"
done

64

IDs -> ESummary


#!/bin/sh
# Usage: esmry.sh database   (reads one UID per line from standard input)
base='https://eutils.ncbi.nlm.nih.gov/entrez/eutils'
while read -r uid; do
  res=$(curl -s "$base/esummary.fcgi?db=$1&version=2.0&id=$uid")
  sum=$(printf '%s\n' "$res" | xmllint --format -)
  echo "$sum"
done

./esrch.sh pubmed "transposition immunity" | ./esmry.sh pubmed
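The loop above makes one ESummary request per UID. Because the `id` argument accepts a comma-separated list (as in the `id` example and the POST example earlier), the IDs could instead be joined and sent in a single request. A minimal sketch of the joining step follows; `join_ids` is a hypothetical helper name, and the request itself is left out.

```shell
#!/bin/sh
# join_ids (hypothetical helper): turn one-UID-per-line input on stdin
# into a single comma-separated string suitable for the id argument.
join_ids() {
    paste -s -d, -
}

# Sample UIDs copied from the result list shown earlier.
printf '22624153\n22555593\n22253773\n' | join_ids
# prints: 22624153,22555593,22253773
```

For long lists, the joined string would normally go in a POST body (`curl -d "db=pubmed&id=..."`) rather than in the URL.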

65

IDs -> E-Mail Notification


#!/bin/sh
# Usage: eping.sh "subject" address   (reads one UID per line from standard input)
while read -r uid; do
  echo "$uid" | mail -s "$1" "$2"
done

./esrch.sh pubmed "Competitor JQ [AUTH]" 30 | \
./eping.sh "Read this new publication" "myemail@myschool.edu"

66

Document Summaries
<eSummaryResult>
<DocumentSummarySet status="OK">
<DocumentSummary uid="22624153">
<PubDate>2012 Feb</PubDate>
<EPubDate/>
<Source>Mol Microbiol</Source>
<Authors>
<Author>
<Name>Lambin M</Name>
<AuthType>
Author
</AuthType>
<ClusterID>0</ClusterID>
</Author>
<Author>
<Name>Nicolas E</Name>
<AuthType>
Author
</AuthType>

67

Use History
curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=transposition+immunity&usehistory=y"

<eSearchResult>
<Count>94</Count>
<RetMax>20</RetMax>
<RetStart>0</RetStart>
<QueryKey>1</QueryKey>
<WebEnv>NCID_1_216310091_130.14.18.97_5555_1343867165_1026563511</WebEnv>
<IdList>
<Id>22624153</Id>
<Id>22555593</Id>
...

68

WebEnv and query_key


curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&version=2.0&query_key=1&WebEnv=NCID_1_216310091_130.14.18.97_5555_1343867165_1026563511"
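Copying the WebEnv string by hand is error-prone, so a script would normally pull both history values out of the ESearch response. The sketch below shows one way to do that with `sed`, assuming each element appears once in the response; `history_args` is a hypothetical helper name.

```shell
#!/bin/sh
# history_args (hypothetical helper): read ESearch XML on stdin and print
# the query_key and WebEnv arguments for a follow-up request.
# Returns non-zero if either element is missing.
history_args() {
    xml=$(cat)
    key=$(printf '%s\n' "$xml" | sed -n 's:.*<QueryKey>\([^<]*\)</QueryKey>.*:\1:p')
    web=$(printf '%s\n' "$xml" | sed -n 's:.*<WebEnv>\([^<]*\)</WebEnv>.*:\1:p')
    [ -n "$key" ] && [ -n "$web" ] || return 1
    printf 'query_key=%s&WebEnv=%s\n' "$key" "$web"
}

# Sample response fragment copied from the Use History slide.
printf '%s\n' '<eSearchResult>' '<Count>94</Count>' '<QueryKey>1</QueryKey>' \
    '<WebEnv>NCID_1_216310091_130.14.18.97_5555_1343867165_1026563511</WebEnv>' |
    history_args
```

The printed string can be appended to an ESummary or EFetch URL after `db=...&`.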

69

PERL Script
#!/usr/bin/perl
use LWP::Simple;
$dbase = shift or die "Must supply database on command line\n";
$query = shift or die "Must supply query on command line\n";
$base = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/';
$url = $base . "esearch.fcgi?db=$dbase&term=$query&retmax=0&usehistory=y";
$output = get($url);
$web = $1 if ($output =~ /<WebEnv>(\S+)<\/WebEnv>/);
$key = $1 if ($output =~ /<QueryKey>(\S+)<\/QueryKey>/);
$url = $base . "efetch.fcgi?db=$dbase&query_key=$key&WebEnv=$web";
$url .= "&rettype=fasta&retmode=text";
$data = get($url);
print "$data";
close (STDOUT);

./efaftch.pl nucleotide M65061+OR+U54469


70

ESearch -> XML


#!/usr/bin/perl
use LWP::Simple;
$dbase = shift or die "Must supply database on command line\n";
$query = shift or die "Must supply query on command line\n";
$days = shift || "";   # optional third argument: limit to the last N days
$base = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/';
$url = $base . "esearch.fcgi?db=$dbase&term=$query&retmax=0&usehistory=y";
if ( $days ne "" ) {
$url .= "&reldate=$days";
}
$output = get($url);
print "$output";
close (STDOUT);

71

XML -> EFetch [1]


#!/usr/bin/perl
use LWP::Simple;
$dbase = shift or die "Must supply database on command line\n";
$type = shift or die "Must supply rettype on command line\n";
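$base = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/';
# Scan the ESearch XML on standard input for the history keys and hit count.
while ($thisline = <STDIN>) {
  $thisline =~ s/\r//;
  $thisline =~ s/\n//;
  $web = $1 if ($thisline =~ /<WebEnv>(\S+)<\/WebEnv>/);
  $key = $1 if ($thisline =~ /<QueryKey>(\S+)<\/QueryKey>/);
  # Keep only the first <Count>; later ones are per-term counts.
  $num = $1 if (!defined($num) && $thisline =~ /<Count>(\d+)<\/Count>/);
}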
...

72

XML -> EFetch [2]


...
$start = 0;
$chunk = 500;
while ( $num > 0 ) {
  $url = $base . "efetch.fcgi?db=$dbase&query_key=$key&WebEnv=$web";
  $url .= "&retstart=$start&retmax=$chunk&rettype=$type&retmode=text";
  $data = get($url);
  print "$data";
  $start += $chunk;
  $num -= $chunk;
  sleep 1;
}
close (STDIN);
close (STDOUT);

./esrch.pl nucleotide 1322283 | ./eftch.pl nucleotide fasta
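The loop in the script walks through the result set in fixed-size chunks by advancing `retstart` while holding `retmax` constant. To make the arithmetic concrete, here is a small sketch that only prints the argument pairs a given hit count would produce; `chunk_ranges` is a hypothetical helper name and no requests are sent.

```shell
#!/bin/sh
# chunk_ranges (hypothetical helper): given a total hit count and a chunk
# size, print the retstart/retmax pair for each request in order.
chunk_ranges() {
    total=$1
    chunk=$2
    start=0
    while [ "$start" -lt "$total" ]; do
        printf 'retstart=%s&retmax=%s\n' "$start" "$chunk"
        start=$((start + chunk))
    done
}

chunk_ranges 1200 500
# prints:
#   retstart=0&retmax=500
#   retstart=500&retmax=500
#   retstart=1000&retmax=500
```

The last request asks for 500 records even though only 200 remain; the server simply returns what is left, which is also how the Perl loop behaves.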


73

EBot

74

Text Query

75

Second Step

76

Output Format

77

Generate Script

78

EBot Result
DEFINITION  alcohol dehydrogenase [Cyberlindnera jadinii].
ACCESSION   BAM34535
VERSION     BAM34535.1  GI:398298384
DBSOURCE    accession AB649224.1
KEYWORDS    .
SOURCE      Cyberlindnera jadinii
  ORGANISM  Cyberlindnera jadinii
            Eukaryota; Fungi; Dikarya; Ascomycota; Saccharomycotina;
            Saccharomycetes; Saccharomycetales; Phaffomycetaceae;
            Cyberlindnera.
REFERENCE   1
  AUTHORS   Tamakawa,H., Tomita,Y., Yokoyama,A., Konoeda,Y. and Yoshida,S.
...
FEATURES             Location/Qualifiers
     source          1..348
                     /organism="Cyberlindnera jadinii"
                     /strain="NBRC0988"
                     /db_xref="taxon:4903"
                     /note="anamorph: Candida utilis"
     Protein         1..348
                     /product="alcohol dehydrogenase"
     CDS             1..348
                     /gene="ADH1"
                     /coded_by="AB649224.1:1..1047"
ORIGIN
        1 msipktqkgv ifyenggple ykdipvptpk pneilvnvky sgvchtdlha wkgdwplpvk
       61 lplvgghega gvvvakgsev knfeigdyag ikwlngscms cefceksfea ncpkadlsgy
      121 thdgsfqqya tadavqaaki skgtdlaeia pilcagvtvy kalktadlep gewvaisgag
      181 gglgslaiqf akamglrvla idggddkkql cqelgaevfi dftktkdivk siqdatnggp
      241 hgvinvsvse kaieqsteyv rncgtvvlvg lpagavaraq vfaavvksis vkgsyvgnra
      301 dtreaidffe rglvkapiki vglselpevy klmeegkilg ryvvdtsk
//
LOCUS       EJF61282     496 aa    linear    PLN 12-JUL-2012
DEFINITION  alcohol dehydrogenase [Dichomitus squalens LYAD-421 SS1].
ACCESSION   EJF61282
VERSION     EJF61282.1  GI:395328892
DBSOURCE    accession JH719411.1
...

79

References
Entrez Programming Utilities Help
http://www.ncbi.nlm.nih.gov/books/NBK25501/

EBot
http://www.ncbi.nlm.nih.gov/Class/PowerTools/eutils/ebot/ebot.cgi

MeSH Browser
http://www.nlm.nih.gov/mesh/MBrowser.html

80
