
Processing XML: XPath, XQuery

Ramakrishnan & Gehrke, Chapter 24 / 27

320302 Databases & Web Applications (P. Baumann)

Why are we DBers interested?


It's data, stupid. That's us. Database issues:
How are we going to model XML?
Trees, graphs

How are we going to query XML?


XQuery

How are we going to store XML?


in a relational database? object-oriented? native?

How are we going to process XML efficiently?


many interesting research questions!


XML Revisited
From a data modelling viewpoint, what does XML offer?
Entities (ER!)
Attributes
    Single-valued, atomic
Relationships? Yes, but:
    Single-root trees only
    Unordered, no role names
    General graphs through id/idrefs, syntax only


Roadmap
XPath XQuery


Path Expressions: XPath

www.w3schools.com/xpath/

Basic concept: path = sequence of location steps


Axis: tree relationship between the nodes selected by the location step and the current node
    parent, child, self, descendant-or-self, attribute, ...
Node test: node type and expanded-name of the nodes selected by the location step
Zero or more predicates: further refine the set of nodes selected by the location step

General location step syntax: axisname::nodetest[predicate]
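To make the three parts of a location step concrete, here is a minimal Python sketch that splits a single step of the form axisname::nodetest[predicate] into axis, node test, and predicates. The helper name parse_step and the regular expression are illustrative assumptions, not part of XPath or of any library, and nested brackets inside predicates are not handled.

```python
import re

# Hypothetical helper (not a library API): splits ONE location step.
# Nested brackets inside a predicate are deliberately not supported.
STEP = re.compile(
    r"^(?:(?P<axis>[a-z-]+)::)?"       # optional axis name followed by ::
    r"(?P<test>[^\[\]]+)"              # node test, e.g. cd, *, node()
    r"(?P<preds>(?:\[[^\[\]]*\])*)$"   # zero or more [predicate] parts
)

def parse_step(step):
    m = STEP.match(step.strip())
    if m is None:
        raise ValueError(f"not a location step: {step!r}")
    axis = m.group("axis") or "child"  # child is the default axis in XPath
    predicates = re.findall(r"\[([^\[\]]*)\]", m.group("preds"))
    return axis, m.group("test").strip(), [p.strip() for p in predicates]

print(parse_step("child::cd[price<10.0]"))
print(parse_step("descendant-or-self::node()"))
```

A step without an explicit axis, such as cd[@country='UK'], is reported with the axis child, which matches the abbreviated syntax.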


Pattern Expressions
Locating nodes in an XML document
    pattern expression to identify nodes in the document
    path through the XML document: .../node1/node2/...
    pattern "selects" elements that match the path; the result is a (sub)tree
    all price elements of all cd elements of the catalog element: /catalog/cd/price

Example document:
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <catalog>
      <cd country="USA">
        <title>Empire Burlesque</title>
        <artist>Bob Dylan</artist>
        <price>10.90</price>
      </cd>
      <cd country="UK">
        <title>Hide your heart</title>
        <artist>Bonnie Tyler</artist>
        <price>9.90</price>
      </cd>
      <cd country="USA">
        <title>Greatest Hits</title>
        <artist>Dolly Parton</artist>
        <price>9.90</price>
      </cd>
    </catalog>
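The path /catalog/cd/price can be tried out with Python's standard xml.etree.ElementTree module, which implements only a subset of XPath. In this sketch the catalog document is inlined as a string, and the path is written relative to the root element (cd/price), because ElementTree does not accept absolute paths on an element.

```python
import xml.etree.ElementTree as ET

CATALOG = """<catalog>
  <cd country="USA"><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>
  <cd country="UK"><title>Hide your heart</title><artist>Bonnie Tyler</artist><price>9.90</price></cd>
  <cd country="USA"><title>Greatest Hits</title><artist>Dolly Parton</artist><price>9.90</price></cd>
</catalog>"""

root = ET.fromstring(CATALOG)          # root is the <catalog> element

# XPath /catalog/cd/price, expressed relative to the root element
price_elements = root.findall("cd/price")
prices = [p.text for p in price_elements]
print(prices)
```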

Paths
Absolute vs. relative vs. fitting:
    path starts with a slash ( / ): absolute path
    path starts with two slashes ( // ): all fitting elements, even if at different levels in the tree
    otherwise: path relative to current position
Relative addressing via an axis:
    defines a node set relative to the current node
    parent, child, self, ancestor, descendant, attribute, ...

Path Navigation Overview


[Figure: path navigation overview]
Not shown: attributes, namespaces


Examples



More Examples
Example tree (element names are schematic node numbers):
    <1> <2> <3/> <4/> </2> <5/> </1>

self({2}) = {2}
child({1}) = {2,5}
parent({3}) = {2}
descendant({1}) = {2,3,4,5}
descendant-or-self({1}) = {1,2,3,4,5}
ancestor({4}) = {1,2}
ancestor-or-self({4}) = {1,2,4}
following({3}) = {4,5}
preceding({4}) = {3}
following-sibling({4}) = {}
preceding-sibling({5}) = {2}
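The axis results listed above can be checked with a small Python model of the same five-node tree. This is a sketch under the assumption that the tree is represented by a plain child table; the function names mirror the axis names but are not taken from any XPath library.

```python
# Tree from the slide: node 1 has children 2 and 5; node 2 has children 3 and 4.
CHILDREN = {1: [2, 5], 2: [3, 4], 3: [], 4: [], 5: []}
PARENT = {c: p for p, cs in CHILDREN.items() for c in cs}

def child(n):
    return list(CHILDREN[n])

def parent(n):
    return [PARENT[n]] if n in PARENT else []

def descendant(n):
    out = []
    for c in CHILDREN[n]:              # pre-order walk gives document order
        out.append(c)
        out.extend(descendant(c))
    return out

def ancestor(n):
    out = []
    while n in PARENT:
        n = PARENT[n]
        out.append(n)
    return sorted(out)

DOC_ORDER = [1] + descendant(1)        # [1, 2, 3, 4, 5]

def following(n):
    # nodes after n in document order, excluding descendants of n
    skip = set(descendant(n))
    return [m for m in DOC_ORDER[DOC_ORDER.index(n) + 1:] if m not in skip]

def preceding(n):
    # nodes before n in document order, excluding ancestors of n
    skip = set(ancestor(n))
    return [m for m in DOC_ORDER[:DOC_ORDER.index(n)] if m not in skip]

def following_sibling(n):
    sibs = CHILDREN[PARENT[n]] if n in PARENT else []
    return sibs[sibs.index(n) + 1:] if n in sibs else []

def preceding_sibling(n):
    sibs = CHILDREN[PARENT[n]] if n in PARENT else []
    return sibs[:sibs.index(n)] if n in sibs else []

print(following(3), preceding(4), preceding_sibling(5))
```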


Wildcards
Use * to select unknown elements
    all child elements of all cd of catalog: /catalog/cd/*
    all price elements that are grandchildren of catalog: /catalog/*/price
    all price elements which have 2 ancestors: /*/*/price
    all elements: //*
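The wildcard paths can be approximated with Python's xml.etree.ElementTree, which supports * as a child wildcard. The following sketch inlines the catalog document and writes each path relative to the root element; for //* it uses iter(), since ElementTree's .//* would leave out the root element itself.

```python
import xml.etree.ElementTree as ET

CATALOG = """<catalog>
  <cd country="USA"><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>
  <cd country="UK"><title>Hide your heart</title><artist>Bonnie Tyler</artist><price>9.90</price></cd>
  <cd country="USA"><title>Greatest Hits</title><artist>Dolly Parton</artist><price>9.90</price></cd>
</catalog>"""

root = ET.fromstring(CATALOG)

cd_children = root.findall("cd/*")           # /catalog/cd/*
grandchild_prices = root.findall("*/price")  # /catalog/*/price and /*/*/price
all_elements = list(root.iter())             # //* (root element included)

print(len(cd_children), len(grandchild_prices), len(all_elements))
```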

Abbreviations
a/b/c
./child::a/child::b/child::c

a//@id
./child::a/descendant-or-self::node()/attribute::id

//a
root(.)/descendant-or-self::node()/child::a

a/text()
./child::a/child::text()
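The four abbreviation pairs above follow a mechanical rewriting, which the Python sketch below imitates. The function expand is a hypothetical helper covering only the forms shown here (plain names, @attr, text(), leading / and //, inner //); it splits on slashes and therefore does not cope with predicates that contain a slash.

```python
def expand(path):
    """Expand an abbreviated XPath (only the forms shown on this slide)."""
    if path.startswith("//"):
        prefix, rest = "root(.)/descendant-or-self::node()/", path[2:]
    elif path.startswith("/"):
        prefix, rest = "root(.)/", path[1:]
    else:
        prefix, rest = "./", path
    # an inner // stands for a descendant-or-self::node() step
    rest = rest.replace("//", "/descendant-or-self::node()/")
    steps = []
    for step in rest.split("/"):
        if "::" in step:                   # already carries an explicit axis
            steps.append(step)
        elif step.startswith("@"):         # @name abbreviates attribute::name
            steps.append("attribute::" + step[1:])
        else:                              # child is the default axis
            steps.append("child::" + step)
    return prefix + "/".join(steps)

for p in ["a/b/c", "a//@id", "//a", "a/text()"]:
    print(p, "=>", expand(p))
```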


Branch Selection
Selecting branches from subtree: "[...]"
    first cd child of catalog: /catalog/cd[1]
        equivalently: /catalog/cd[ position() = 1 ]
    last cd child of catalog: /catalog/cd[ last() ]
        Note: There is no function named first()
    all cd elements of catalog that have a price element: /catalog/cd[ price ]
    all cd elements of catalog that have a price with value of 10.90: /catalog/cd[ price=10.90 ]
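Positional and existence predicates are among the XPath features that Python's xml.etree.ElementTree supports, so the branch selections can be sketched directly. The catalog is inlined, paths are relative to the root element, and the value test is written as a string comparison (price='10.90') because ElementTree compares text, not numbers.

```python
import xml.etree.ElementTree as ET

CATALOG = """<catalog>
  <cd country="USA"><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>
  <cd country="UK"><title>Hide your heart</title><artist>Bonnie Tyler</artist><price>9.90</price></cd>
  <cd country="USA"><title>Greatest Hits</title><artist>Dolly Parton</artist><price>9.90</price></cd>
</catalog>"""

root = ET.fromstring(CATALOG)

first_cd = root.findall("cd[1]")               # /catalog/cd[1]
last_cd = root.findall("cd[last()]")           # /catalog/cd[ last() ]
with_price = root.findall("cd[price]")         # /catalog/cd[ price ]
price_1090 = root.findall("cd[price='10.90']") # text comparison in ElementTree

print(first_cd[0].findtext("title"), "|", last_cd[0].findtext("title"))
```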

Multiple Paths
Selecting several paths: | operator
    all title, artist elements: /catalog/cd/title | /catalog/cd/artist
    all the title and artist elements in the document: //title | //artist
    all title, artist, price elements: //title | //artist | //price
    all title elements of cd of catalog, and all artist elements: /catalog/cd/title | //artist
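Python's xml.etree.ElementTree has no | operator, so the sketch below emulates the union //title | //artist by walking the inlined catalog in document order and keeping the elements whose tag is in the wanted set. This is an approximation of the union semantics for this simple case, not a general implementation.

```python
import xml.etree.ElementTree as ET

CATALOG = """<catalog>
  <cd country="USA"><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>
  <cd country="UK"><title>Hide your heart</title><artist>Bonnie Tyler</artist><price>9.90</price></cd>
  <cd country="USA"><title>Greatest Hits</title><artist>Dolly Parton</artist><price>9.90</price></cd>
</catalog>"""

root = ET.fromstring(CATALOG)

# Emulates //title | //artist: iter() visits elements in document order
wanted = {"title", "artist"}
union = [e for e in root.iter() if e.tag in wanted]

print([e.text for e in union])
```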

Attributes
Selecting attributes: prefix attributes with @
    all attributes named country: //@country
    all cd elements which have an attribute named country: //cd[@country]
    all cd elements with an attribute named country with value 'UK': //cd[@country='UK']
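Attribute predicates can be sketched with Python's xml.etree.ElementTree on the inlined catalog. ElementTree returns elements only, so the attribute selection //@country is emulated here by reading the attribute values from the elements that carry it.

```python
import xml.etree.ElementTree as ET

CATALOG = """<catalog>
  <cd country="USA"><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>
  <cd country="UK"><title>Hide your heart</title><artist>Bonnie Tyler</artist><price>9.90</price></cd>
  <cd country="USA"><title>Greatest Hits</title><artist>Dolly Parton</artist><price>9.90</price></cd>
</catalog>"""

root = ET.fromstring(CATALOG)

# //@country, emulated: ElementTree cannot return attribute nodes
countries = [e.get("country") for e in root.iter() if "country" in e.attrib]

with_country = root.findall(".//cd[@country]")   # //cd[@country]
uk_cds = root.findall(".//cd[@country='UK']")    # //cd[@country='UK']

print(countries, [cd.findtext("title") for cd in uk_cds])
```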

Predicates
Predicates, operators, functions as usual
    all CDs with price below 10.0: /catalog/cd[ price<10.0 ]
    all CDs with country "UK" and price below 10.0: /catalog/cd[ @country="UK" ][ price<10.0 ]
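Numeric comparisons such as price<10.0 are outside the XPath subset of Python's xml.etree.ElementTree, so this sketch combines an ElementTree path with an ordinary Python filter. The catalog is inlined and prices are converted with float(), which is an assumption about the data being well-formed numbers.

```python
import xml.etree.ElementTree as ET

CATALOG = """<catalog>
  <cd country="USA"><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>
  <cd country="UK"><title>Hide your heart</title><artist>Bonnie Tyler</artist><price>9.90</price></cd>
  <cd country="USA"><title>Greatest Hits</title><artist>Dolly Parton</artist><price>9.90</price></cd>
</catalog>"""

root = ET.fromstring(CATALOG)

def price_below(cd, limit):
    # numeric test done in Python; ElementTree has no < operator
    return float(cd.findtext("price")) < limit

# all CDs with price below 10.0
cheap = [cd.findtext("title") for cd in root.findall("cd") if price_below(cd, 10.0)]

# all CDs with country "UK" and price below 10.0
cheap_uk = [cd.findtext("title")
            for cd in root.findall("cd[@country='UK']") if price_below(cd, 10.0)]

print(cheap, cheap_uk)
```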

Document Order


Synopsis: Core XPath Syntax


Incomplete!

xpath-expr ::= step slash ... slash finalstep
             | slash step slash ... slash finalstep
slash      ::= "/" | "//"
step       ::= [ axisname "::" ] node-step [ "[" predicate "]" ]
finalstep  ::= [ axisname "::" ] node-final [ "[" predicate "]" ]
axisname   ::= "child" | "descendant" | "parent" | "ancestor" | ...
node-step  ::= node-name | "*"
node-final ::= node-name | "*" | "@" attr-name
predicate  ::= some boolean expression over nodes and attributes

Quoted symbols are literals, all else is to be substituted (i.e., nonterminals); unquoted [ ] mark optional parts


Roadmap
XPath XQuery


XQuery
XQuery: retrieving information from XML data
    XQuery = XML Query
    Built on XPath
XQuery is to XML what SQL is to tables
Allows extracting information from XML structures
    Stored in a file or in a database
Major DBMS vendors support XQuery
See also www.w3c.org/XML/Query, www.w3schools.com (material borrowed)



XQuery Introductory Example


Find all book titles published after 1995:
    FOR $x IN document("bib.xml")/bib/book
    WHERE $x/year > 1995
    RETURN $x/title

Result:
    <title> abc </title>
    <title> def </title>
    <title> ghi </title>
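For readers more familiar with general-purpose languages, the FOR / WHERE / RETURN pattern reads much like a filtered loop. The Python sketch below is an analogue, not XQuery: it uses xml.etree.ElementTree and a made-up three-book document in place of bib.xml, whose actual contents are not given here.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for bib.xml (the real file is not shown in the slides)
BIB = """<bib>
  <book><title>abc</title><year>1998</year></book>
  <book><title>old</title><year>1990</year></book>
  <book><title>def</title><year>2001</year></book>
</bib>"""

bib = ET.fromstring(BIB)

titles = [
    x.findtext("title")                  # RETURN $x/title
    for x in bib.findall("book")         # FOR $x IN .../bib/book
    if int(x.findtext("year")) > 1995    # WHERE $x/year > 1995
]
print(titles)
```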

FOR and LET


FOR $x IN expr
    binds $x to each value in the list expr in turn
    Binds node variables: iteration

    FOR $x IN document("bib.xml")/bib/book
    RETURN <result> $x </result>

    Returns:
    <result> <book>...</book> </result>
    <result> <book>...</book> </result>
    ...

LET $x = expr
    binds $x to the entire list expr
    Defines variable; binds collection variables: one value
    Useful for common subexpressions and for aggregations

    LET $x = document("bib.xml")/bib/book
    RETURN <result> $x </result>

    Returns:
    <result> <book>...</book> <book>...</book> ... </result>
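The difference between the two bindings can be mimicked in Python: a FOR-style binding produces one result per book, while a LET-style binding wraps the whole sequence once. This sketch uses xml.etree.ElementTree and a made-up two-book document standing in for bib.xml; it is an analogue of the semantics, not XQuery itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for bib.xml
BIB = """<bib>
  <book><title>abc</title></book>
  <book><title>def</title></book>
</bib>"""

books = ET.fromstring(BIB).findall("book")

# FOR-style: $x is bound to each book in turn, one <result> per binding
for_results = []
for x in books:
    result = ET.Element("result")
    result.append(x)
    for_results.append(result)

# LET-style: $x is bound once to the entire sequence, a single <result>
let_result = ET.Element("result")
let_result.extend(books)

print(len(for_results), "results vs. 1 result with", len(let_result), "books")
```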

A More Complex Example

"For each author of a book by Morgan Kaufmann, list all books she published":
    FOR $a IN distinct(document("bib.xml")/bib/book[publisher="Morgan Kaufmann"]/author)
    RETURN <result>
               $a,
               FOR $t IN /bib/book[author=$a]/title
               RETURN $t
           </result>

distinct = function that eliminates duplicates

Result:
    <result>
        <author>Jones</author>
        <title> abc </title>
        <title> def </title>
    </result>
    <result>
        <author> Smith </author>
        <title> ghi </title>
    </result>



Aggregates
count = (aggregate) function that returns the number of elements
    <big_publishers>
        FOR $p IN distinct(document("bib.xml")//publisher)
        LET $b = document("bib.xml")//book[publisher = $p]
        WHERE count($b) > 100
        RETURN $p
    </big_publishers>

Result:
    <big_publishers>
        <publisher>Morgan Kaufmann</publisher>
        <publisher>Wiley</publisher>
    </big_publishers>

How to obtain that?
    <num_big_publishers>120</num_big_publishers>
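The FOR / LET / WHERE count pattern can be paraphrased in Python as grouping books per distinct publisher and keeping the groups above a threshold. In this sketch the document, the function name big_publishers, and the threshold parameter are assumptions made for illustration (the sample is far too small for a threshold of 100).

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for bib.xml
BIB = """<bib>
  <book><title>a</title><publisher>Morgan Kaufmann</publisher></book>
  <book><title>b</title><publisher>Wiley</publisher></book>
  <book><title>c</title><publisher>Morgan Kaufmann</publisher></book>
  <book><title>d</title><publisher>Small Press</publisher></book>
  <book><title>e</title><publisher>Wiley</publisher></book>
</bib>"""

def big_publishers(bib, threshold):
    # FOR $p IN distinct(...//publisher): distinct names, first-seen order
    names = []
    for p in bib.iter("publisher"):
        if p.text not in names:
            names.append(p.text)
    result = []
    for name in names:
        # LET $b = ...//book[publisher = $p]: the whole group as one value
        books = [b for b in bib.iter("book") if b.findtext("publisher") == name]
        if len(books) > threshold:       # WHERE count($b) > threshold
            result.append(name)          # RETURN $p
    return result

bib = ET.fromstring(BIB)
big = big_publishers(bib, threshold=1)
print(big, "count:", len(big))
```

Counting the returned sequence, as len(big) does here, is the Python counterpart of applying an aggregate to the result of a whole query.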

Another Aggregate Example


Find books whose price is larger than average:
    LET $a = avg(document("bib.xml")/bib/book/price)
    FOR $b IN document("bib.xml")/bib/book
    WHERE $b/price > $a
    RETURN $b

Result:
    <book> abc </book>
    <book> def </book>
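The same two-phase idea, computing one aggregate value first and then iterating, can be sketched in Python. The document below is a made-up stand-in for bib.xml, and prices are assumed to be plain numbers.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for bib.xml
BIB = """<bib>
  <book><title>abc</title><price>10</price></book>
  <book><title>def</title><price>20</price></book>
  <book><title>ghi</title><price>60</price></book>
</bib>"""

bib = ET.fromstring(BIB)
books = bib.findall("book")

# LET $a = avg(.../bib/book/price): one value for the whole sequence
prices = [float(b.findtext("price")) for b in books]
average = sum(prices) / len(prices)

# FOR $b IN ... WHERE $b/price > $a RETURN $b
above_average = [b for b in books if float(b.findtext("price")) > average]

print(average, [b.findtext("title") for b in above_average])
```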


Sorting
<publisher_list>
    FOR $p IN distinct(document("bib.xml")//publisher)
    ORDERBY $p
    RETURN
        <publisher>
            <name> $p/text() </name> ,
            FOR $b IN document("bib.xml")//book[publisher = $p]
            ORDERBY $b/price DESCENDING
            RETURN <book> $b/title , $b/price </book>
        </publisher>
</publisher_list>
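As a rough Python analogue of the nested, sorted query: publishers are sorted by name, and within each publisher the books are sorted by descending price. The document is a made-up stand-in for bib.xml, and the result is built as plain tuples rather than XML elements to keep the sketch short.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for bib.xml
BIB = """<bib>
  <book><title>abc</title><publisher>Wiley</publisher><price>30</price></book>
  <book><title>def</title><publisher>Morgan Kaufmann</publisher><price>20</price></book>
  <book><title>ghi</title><publisher>Wiley</publisher><price>50</price></book>
</bib>"""

bib = ET.fromstring(BIB)

publisher_list = []
# FOR $p IN distinct(...//publisher) ORDERBY $p
for name in sorted({p.text for p in bib.iter("publisher")}):
    # FOR $b IN ...//book[publisher = $p] ORDERBY $b/price DESCENDING
    books = [b for b in bib.iter("book") if b.findtext("publisher") == name]
    books.sort(key=lambda b: float(b.findtext("price")), reverse=True)
    publisher_list.append(
        (name, [(b.findtext("title"), b.findtext("price")) for b in books])
    )

print(publisher_list)
```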


If-Then-Else

FOR $h IN //holding
ORDERBY $h/title
RETURN
    <holding>
        $h/title,
        IF $h/@type = "Journal"
        THEN $h/editor
        ELSE $h/author
    </holding>

<editor> abc </editor>
<author> def </author>


Quantifiers (SOME & EVERY)


FOR $b IN //book
WHERE SOME $p IN $b//para SATISFIES
          contains($p, "sailing") AND contains($p, "windsurfing")
RETURN $b/title

FOR $b IN //book
WHERE EVERY $p IN $b//para SATISFIES
          contains($p, "sailing")
RETURN $b/title
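SOME and EVERY correspond closely to Python's any() and all(), including the edge case that EVERY holds trivially when a book has no para elements. The sketch below uses xml.etree.ElementTree on a made-up document; the book titles and paragraph texts are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document
DOC = """<library>
  <book><title>Both</title><para>sailing and windsurfing</para><para>more sailing</para></book>
  <book><title>Split</title><para>sailing</para><para>windsurfing</para></book>
  <book><title>Other</title><para>cooking</para></book>
  <book><title>Empty</title></book>
</library>"""

root = ET.fromstring(DOC)

def contains(para, word):
    return word in (para.text or "")

# WHERE SOME $p IN $b//para SATISFIES contains(...) AND contains(...)
some_titles = [
    b.findtext("title") for b in root.iter("book")
    if any(contains(p, "sailing") and contains(p, "windsurfing")
           for p in b.iter("para"))
]

# WHERE EVERY $p IN $b//para SATISFIES contains($p, "sailing")
every_titles = [
    b.findtext("title") for b in root.iter("book")
    if all(contains(p, "sailing") for p in b.iter("para"))
]

print(some_titles, every_titles)
```

The book without paragraphs appears in the EVERY result but not in the SOME result, since a universal condition over an empty sequence is true while an existential one is false.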

Summary: General Query Structure


FOR-LET-WHERE-ORDERBY-RETURN = FLWOR ("flower")
XPath 2.0 supports a restricted form as well (for ... return only)!
    But not the further "advanced" stuff of XQuery

Processing pipeline:
    XML doc
    -> FOR/LET clauses -> list of tuples
    -> WHERE clause -> list of tuples
    -> ORDERBY/RETURN clause -> instance of XQuery data model


Summary: XML Family (Excerpt)

[Figure: XML family diagram; an arrow means "uses concepts of". Standards shown: XML (2nd edition), DTD, Namespaces, DOM, XML Schema, XSLT, XHTML, SOAP, XQuery, XPath, XPointer, XLink]
