What is XML?

XML is a way of adding intelligence to your documents. It lets you identify each element using
meaningful tags, and it lets you add information ("metadata") about each element.
XML is very much a part of the future of the Web and part of the future for all electronic information.
XML is a syntax for marking up data, and it works with many other technologies to display and
process information. It looks and feels very much like HTML.
XML isn't going to replace everything else you've already learned; it complements it and extends
it.
XML isn't going to change the way your Web pages look. You'll still need to use CSS --
Cascading Style Sheets -- (with XML) to define font colors, or JavaScript (again with XML) to
make your images fly around. Yet XML will change the way you and others read documents, and
it will change the way documents are filed and stored. It's a new technology, and you certainly
don't need to use it in order to build a great Web site -- but you will want to be aware of it as you
look at the Web of the future.
What's the Fuss About?
XML lets you make documents smarter, more portable, and more powerful -- that's the promise
of XML, and that's what all the fuss is about.
XML allows you to use your own tags to define parts of a document. You can do this because
XML is a descriptive, not a procedural, language. That is, XML describes what something is
rather than performing an action.
For example, take a look at the front page of a newspaper. You'll see different font sizes, different
sections, and columns.
If you were to create a Web page for that newspaper -- using the same formatting and styles -- you
would use tags such as <H1> and <font color="red"> to define the size and color of a
large headline, or <i> to italicize a word such as a byline in order to distinguish it from the rest
of the text.
But just try to write tags that actually explain that you've got a Headline and that the words "John
Smith" make up a byline. HTML won't know what you're talking about if you create tags such as
<Headline> or <byline> or <advertisement>.
XML, with help from other technologies such as CSS, understands what the elements are and
how to display them.
That means in the future, when you're searching on the web for, say, a Barbie doll for your
niece's birthday, you'll get Barbie the DOLL instead of some other type of Barbie, because the
Barbie doll page might be marked up like this:
<DOLL>Barbie</DOLL>
Pretty cool, huh?
XML documents can be moved to any format on any platform -- without the elements losing their
meaning. That means you can publish the same information to a web browser, a PDA, or a
network-enabled bread machine, and each device would use the information appropriately.
The most important thing to remember about XML, though, is that it doesn't stand alone. It needs
other technologies like CSS in order for you to see its results.
If all of this seems like a pain and you don't want to mess with XML, it's OK. You don't need it to
make a great web page. But you never know when organization will come in handy.
Where Did XML Come From?
XML is a simplified version of SGML and a cousin of HTML. It was developed by members of the
W3C and released as a recommendation by the W3C in February 1998.
SGML, the parent of XML, is an international standard that has been in use as a markup
language, primarily for technical documentation and government applications, since the early
1980s. It was developed to standardize the production process for large document sets. Think:
Medical records. Company databases. Aircraft parts catalogs. Other really huge documents.
Marking up documents in SGML allows information to be passed from one system to the next
without losing information. With databases marked up in SGML, you can see what Widget A is all
about and go check to see if Widget A is in stock.
Early on, people thought that SGML would be useful for the Web. In fact, HTML is really a very
basic application of SGML! But HTML quickly became used for visual layout, so a group of
people returned to the basics, determined to create something that had the strengths of SGML
without being so difficult to implement -- and had the ease of use of HTML but with more
structural power. The result was XML.
The design goals of XML, taken from the XML Specification, are:
XML shall be straightforwardly usable over the Internet.
XML shall support a wide variety of applications.
XML shall be compatible with SGML.
It shall be easy to write programs which process XML documents.
The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
XML documents should be human-legible and reasonably clear.
The XML design should be prepared quickly.
The design of XML shall be formal and concise.
XML documents shall be easy to create.
Terseness in XML markup is of minimal importance.
In other words, XML is easy to create, easy to read, and designed for use over the Internet. What
more could a Web designer ask for?
What Does XML Look Like?
If you've ever used HTML, XML is going to look very familiar!
When you view the source of a document written in XML, the first thing you'll see is the XML
declaration, which looks like this:
<?xml version="1.0"?>
Then, in the body of the document, you'll see a lot of tags. The tags look familiar at first -- they
start with the usual less-than sign and end with the usual greater-than sign, like this:
<name>
But then you'll notice that the tags might not be quite the names you've come to expect! You'll
see tags that seem to be made-up tag names. Tags like <dogchow> and <badcars> and
<species>. In fact, if you view the source of an XML document, you'll see tags surrounding lots
of words, maybe every word in the document. These tags define exactly what the content is. And
the creator of the document had the power to create his or her own specific set of tags.
Suppose you're looking at a Web page marked up in XML on The Canterbury Tales, by Chaucer.
You're looking specifically at lines 282-286 of "The Physician's Tale." The document source for
that section might look like this:
<?xml version="1.0"?>
<CANTERBURY-TALES>
<SECTION name="physician">
The Physician's Tale
<LINE number="282">
That no man woot therof but God and he.
</LINE>
<LINE number="283">
For be he lewed man, or ellis lered,
</LINE>
<LINE number="284">
He noot how soone that he shal been afered.
</LINE>
<LINE number="285">
Therfore I rede yow this conseil take --
</LINE>
<LINE number="286">
Forsaketh synne, er synne yow forsake.
</LINE>
</SECTION>
</CANTERBURY-TALES>
The tags simply define that:
1) This document is the Canterbury Tales.
2) This section is the Physician's Tale.
3) Each line of the Physician's Tale is defined.
4) Each line ends, and the Physician's Tale and The Canterbury Tales end.
If the entire document were marked up such as this, you could easily jump to a certain line or
section. The entire document is annotated for easy reference and searching, and instead of
viewing the entire document, users could request only specific sections of a document -- simply by
calling the specific tags they want. Oh, and we don't recommend that you manually type out each
line in the Canterbury Tales. Get a computer to count the lines for you.
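
To make "jump to a certain line" concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The SOURCE string is a shortened copy of the sample above, and find_line is a hypothetical helper name made up for this illustration; a real site might serve such requests quite differently.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the sample document (two of the five lines).
SOURCE = """<?xml version="1.0"?>
<CANTERBURY-TALES>
  <SECTION name="physician">
    <LINE number="282">That no man woot therof but God and he.</LINE>
    <LINE number="286">Forsaketh synne, er synne yow forsake.</LINE>
  </SECTION>
</CANTERBURY-TALES>"""


def find_line(xml_text, section, number):
    """Return the text of one LINE element, or None if it is not there."""
    root = ET.fromstring(xml_text)
    # Select by the attributes the markup already carries.
    path = f"./SECTION[@name='{section}']/LINE[@number='{number}']"
    line = root.find(path)
    return line.text.strip() if line is not None else None


print(find_line(SOURCE, "physician", 286))
```

Because the section name and line number live in the markup itself, the lookup needs no knowledge of how the page is displayed.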
XML Versus HTML
HTML and XML are cousins. They draw off the same inspiration, SGML. They both identify
elements in your page. They both use a very similar syntax. If you are familiar with HTML, XML
will also feel familiar.
The big difference between HTML and XML is that HTML has evolved into a markup language
that describes the look, feel, and action of a Web page. An <H1> is a headline that is displayed in
a certain size, for example.
In contrast, XML doesn't describe how a page looks, how it acts, or what it does. XML describes
what the words in a document ARE. This is a critical distinction! While HTML combines structure
and display, XML separates them. This means that XML documents are more portable and can
be used in many different types of applications.
In the near future we'll see both XML and HTML documents. Eventually, XML will probably
replace HTML, or HTML will become an application of XML. But that doesn't mean you should
toss out everything you know! In many ways XML builds on HTML, and if you know HTML, XML
will be easier to work with.
Valid and Well-Formed XML
You'll sometimes hear an XML document referred to as a "valid" XML document or a "well-
formed" XML document. This distinction touches on one of the nice things about XML.
When you used SGML, you had to create something called a Document Type Definition (DTD for
short) in order to make the SGML document useful. DTDs were fairly complex and required a lot of
work to create. They were one of the roadblocks to widespread use of SGML.
With XML, you have an option. You can make a well-formed XML document by simply following
the XML syntax rules. You don't have to create a separate DTD if you don't want to.
If you do create a set of rules -- a DTD -- and make your document conform to those rules, it is
considered a valid XML document.
DTDs describe the structure of your document. We'll be discussing DTDs in detail later on. Right
now, all you need to know is that the main difference between valid and well-formed XML is that
valid XML refers to and conforms to a DTD, while XML that is merely well-formed doesn't have to.
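
As a rough illustration of what "well-formed" means in practice, the sketch below asks Python's standard xml.etree.ElementTree parser whether a string follows the syntax rules. The function name is_well_formed is made up for this example, and the check says nothing about validity, because this parser does not validate against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<DOLL>Barbie</DOLL>"))  # start tag and end tag match
print(is_well_formed("<DOLL>Barbie"))         # end tag is missing
```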
Structure
XML applies structure to documents. Documents are sets of related information.
The term structure seems to bring some unpleasant imagery with it, especially for creative souls
who want to make this medium work in new and innovative ways. But when one is dealing with
publishing, the term structure is quite positive. It is the way we put a skeleton behind information
so that the pieces of information work together and make sense as a whole.
There are two key principles behind a structured model:
1. Each part -- or element -- has a relationship with other elements. This series of relationships
defines the structure.
2. The meaning of the element is separate from its visual appearance.
Documents
We can't really talk about structure without first talking a bit about documents. Document is
another of those terms that conjures up somewhat negative images; one tends to picture "dusty
stacks of documents" or "attorney's documents" or "document processing." But in this case, a
document is simply a collection of related information.
For example, this page is a document. Your favorite 'zine is a set of documents. Your intranet is
probably comprised of hundreds, if not thousands, of documents.
Sometimes documents are created as a single unit. Sometimes they are built on demand, pulling
pieces from a database and assembling them into a document as the reader requests. In both cases,
structure makes the document easier to create, maintain, and display.
Document Structure
The document structure defines the elements which make up a document, the information you
want to collect about those elements, and the relationship those elements have to each other.
You use XML to mark up the document, following the structure you have decided upon.
By treating a document as a collection of elements, you free it from the constraints of time, place,
and presentation format. You can move the structured document from a word processor to a PDA
to a web browser. The structure is intact on each; you just alter the display characteristics for
each device.
The document structure is called the document tree. The main trunk of the tree is the parent. All
the branches and leaves are children. Document trees are usually visually represented as a
hierarchical chart.
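
The parent-and-children idea can be seen by walking a small document. This is only a sketch: the book markup and the outline function are invented for the illustration, and it uses Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET


def outline(element, depth=0):
    """Return the element names as indented lines, parents before children."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


# Hypothetical markup: one trunk (book) with branches and leaves below it.
root = ET.fromstring(
    "<book><title>King Lear</title><act><scene/><scene/></act></book>"
)
print("\n".join(outline(root)))
```

The indentation in the printed outline is the hierarchical chart in text form.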
Structure vs. Format
The most important thing to remember about a structured document is that it is defined by the
elements it contains, not by how it looks.
Structure says that an element is a paragraph. Format says to display the paragraph in 12 point
Times.
Structure says the element is a book title. Format says to display the book title in green, bold body
text.
Structure says the element is a social security number. Format says to hide and not display the
social security number.
Learning to separate structure from format is critical in making good use of XML.
Metadata
Metadata is data about data. A key use of XML is to collect and work with metadata.
At its most basic level, XML is a metadata language. That is, it is a way of assigning information
to pieces of data. The most obvious use of this is to identify a piece of data as a certain structural
element. But this is just the beginning.
XML is about much more than marking up documents for use in a web browser. XML is really
about adding layers of information to your data, so that the data can be processed, used, and
transferred between applications.
Metadata in HTML
If you've built a website, you've almost certainly worked with metadata. The keyword and
description meta tags are simple uses of metadata. With these meta tags, you can assign the
document as a whole information about the general type of content it contains. This information
doesn't display in a web browser, but it does display in search engine results.
Another use of meta tags is to store information such as creator name and creation date. Some
servers are structured to work with these meta tags, allowing you to sort by creation date or
display based on creator name.
Going Further with XML
XML takes this basic idea much further. With XML, you can describe where you found your data;
you can quantify, qualify, and further define it. You can then use this metadata to validate
information, perform searches, set display constraints, or process other data.
Here are just a few examples:
XML initiatives are under way which will allow for digital signature verification and validated
form submission. This could make it possible for forms with signatures to be submitted online
and be legally binding.
XML initiatives are under way to help catalog web content. Using metadata, the web can be
indexed better and searched more effectively.
XML is being used to transfer data, based on factors such as date entered, between unlike
databases. The metadata is both a means to find the correct data bits and a common
language of transfer between databases which do not speak each other's specific language.
The RDF Proposal
One W3C-blessed use of metadata which you may have heard about is a proposal called the
Resource Description Framework, or RDF. RDF is an application of XML for making metadata
machine-processable. It allows applications to exchange information about data automatically.
This has implications in indexing, content rating, intellectual ownership, e-commerce, and
privacy, among other things. The W3C says:
RDF with digital signature will be the key to building the "Web of
Trust" for electronic commerce, collaboration, and other
applications.
Display Issues
XML alone will not display a page. You must use a formatting technology such as CSS or XSL to display
XML-tagged documents in a Web browser.
XML is about separating structure and format. An XML document doesn't know anything about
how to display itself. It relies on other technologies for this.
Although XML does not deal with form, it contains a great deal of information about the document
and its elements. This, when combined with style tools, gives you a whole new strength and
flexibility in displaying your documents, without having to maintain multiple copies of the
document.
XSL
Extensible Stylesheet Language, XSL, is the future of XML display. It is an XML-based
language for expressing stylesheets.
With XSL, you can make context-sensitive display decisions. For example, you could
automatically display the document one way in a Web browser and another on a PDA.
XSL can also transform XML into HTML so that older browsers can view XML documents.
CSS
Cascading Style Sheets, CSS-1 and CSS-2, are the current way to display XML documents in a
Web browser. CSS is a means of assigning display values to page elements.
If you are going to be working with XML and you will be concerned with displaying pages, learn
CSS. The CSS Reference Guide contains a guide to the CSS-1 properties.
Behaviors
Behaviors are a non-standard IE5 technique that lets you do some interesting display actions
with XML tags. They combine scripting and CSS in a component file. This component can be
attached to a particular tag and used in many different documents. The Behaviors Library
shows some of the things you can do with this technique.
The DOM
The Document Object Model lets you address, change, and manipulate any individual portion of the Web
page.
The phrase "document object model" means that you treat your document as a collection of
individual objects, rather than a single solid unit. The W3C DOM is the set of rules for doing this
in a standard way in a Web browser with HTML and XML files.
O is for Object
In an object-oriented approach, the program or the document is made up of many smaller
components, called objects. The smaller components can be re-arranged, added to, or removed
dynamically.
The idea of objects has become quite popular in both software and documents. The
programming language Java and the scripting language JavaScript each has an object-oriented
philosophy at its core. The adoption of the standard DOM enables Web pages to share that
object approach, too.
With an object model, you manage the small pieces, combining them and reusing them as it
makes sense -- instead of writing one huge application program or one huge document. You
might think of an object approach as being a little like a collection of Lego blocks ... different
pieces do different things, but you can combine and recombine them into many different finished
projects.
Each object type acts as a template. You can use an instance of the same object over and over
again. For example, you might have multiple instances of the <canine> element in a document.
All the objects share the same name, canine, and work the same way, but each one represents
its own set of data and can be addressed individually.
The API
It isn't enough to merely know that an object is an object. You also need to know how to talk to
that object and give it commands. That's where the API comes in.
API stands for Application Programming Interface. An API is a set of rules that describes how
you can access and manipulate an object. The DOM specification describes the API for HTML
and XML documents.
The DOM, by providing a standard API, defines the naming conventions, programming models,
and other rules for communicating with an object in an HTML or XML page.
Getting from XML to Objects
In an XML document, each element is actually an object -- it has a name, and it has attributes that
describe it.
The browser, combined with a stylesheet, displays each of the XML elements/objects in a web
page. Because they are objects, you can address and change them individually.
Ah, but just knowing that every piece is an object isn't enough. You need to have a set of rules,
an API, to describe how to address those objects when they are placed in a web page. That's
where the DOM comes in.
The DOM does three things -- you might think of it as explaining the "who, what, and how" of the
web page.
1. First, it describes who -- which objects are in a web page, and how are XML objects represented
there?
2. Second, it defines what -- what can these objects do, and how do they work with others?
3. Third, it defines how -- how can these objects be addressed?
The DOM is the translator, the interface that lets all the pieces be represented properly, talk to
each other, and communicate with scripts and other action tools.
It is XML that lets you add and identify data, but it is the DOM that lets the script manipulate and
display that data on command in the web browser window.
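
The same DOM calls a browser script would use are also available outside the browser. The sketch below uses Python's standard xml.dom.minidom module rather than a browser, and the books markup and the stock attribute are invented for the illustration; it only shows the idea of addressing one element object and changing it.

```python
from xml.dom.minidom import parseString

# Hypothetical document with two instances of the same element type.
doc = parseString(
    "<books><title>King Lear</title><title>Hamlet</title></books>"
)

# Who: find the objects by element name.
titles = doc.getElementsByTagName("title")

# How: address one instance individually and change only that one.
first = titles[0]
first.setAttribute("stock", "out")
first.firstChild.data = "King Lear (out of stock)"

print(len(titles))
print(first.toxml())
print(titles[1].toxml())
```

Both title elements share a name and behave the same way, yet only the first one was changed.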
Pulling It All Together
You'll typically be working with four technologies that combine to create an interactive Web page: XML (or
HTML), a scripting language, CSS, and the DOM. Here is how they relate:
XML identifies data. For example: "King Lear" is a title element.
CSS stores information about display values for elements and delivers the information to
the browser. For example: Titles are displayed in 18 point black courier type.
The script "talks" to the objects and sends messages to and from the browser about the
objects. Typically these are "change your display" or "do this" messages based on user
actions or other variables. For example: If a particular title is out of stock, display it in red.
The DOM provides the common interface through which various scripts and objects talk to
one another and display in the Web browser.
The browser displays the results to the end user.
If any of these pieces are missing, you can't create a dynamically-changing presentation of your
document.
Element
An element is the basic building block of HTML and XML documents.
Elements are identified by a tag. The tag consists of angle brackets and content, and looks like
this:
<AUTHOR>Thadius J. Frog</AUTHOR>
In HTML, you use a pre-defined set of elements. In XML, you create your own set of elements.
Attribute
Attributes are like adjectives, in that they further describe elements. Each attribute has a name
and a value.
Attributes are entered as part of the tag, like this:
<AUTHOR dob="1874">Thadius J. Frog</AUTHOR>
Tag
You use a tag to identify a piece of data by element name.
Tags usually appear in pairs, surrounding the data. The opening tag contains the element name.
The closing tag contains a slash and the element's name, like this:
<AUTHOR>Thadius J. Frog</AUTHOR>
Attribute Value
Attributes contain an attribute value. The value might be a number, a word, or a URL.
Attribute values follow the attribute name and an equal sign, like this:
<AUTHOR dob="1874">Thadius J. Frog</AUTHOR>
In XML, attribute values are always surrounded by quotation marks.
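
To see how a program views these pieces, the sketch below parses the AUTHOR example with Python's standard xml.etree.ElementTree module and reads back the element name, the attribute, and the content. The helper name parses is made up for the illustration.

```python
import xml.etree.ElementTree as ET

author = ET.fromstring('<AUTHOR dob="1874">Thadius J. Frog</AUTHOR>')

print(author.tag)     # the element name taken from the tag
print(author.attrib)  # attribute names mapped to their values
print(author.text)    # the data between the opening and closing tags


def parses(xml_text):
    """Return True if the parser accepts the text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# The same tag without quotation marks around the value is rejected.
unquoted_ok = parses("<AUTHOR dob=1874>Thadius J. Frog</AUTHOR>")
print(unquoted_ok)
```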
Declaration
You begin an XML file with an XML declaration. The declaration states that this is an XML file.
The xml declaration looks like this:
<?xml version="1.0"?>
DTD
Document Type Definition. The DTD defines the elements, attributes, and relationships between
elements for an XML document.
A DTD is a way to check that the document is structured correctly, but you do not need to use
one in order to use XML.
The XML Document
An XML file is a plain text file (ASCII or, more generally, Unicode text such as UTF-8) with XML
markup tags. It has a .xml extension, like this:
booklist.xml
Inside an XML File
An XML file contains three basic parts:
1. A declaration that announces that this is an XML file;
2. An optional definition about the type of document it is and what DTD it follows;
3. Content marked up with XML tags.
Types of XML Documents
There are two types of XML documents: well-formed or valid. The only difference between the
two is that one uses a DTD and the other doesn't.
Well-formed
Well-formed documents conform with XML syntax. They contain text and XML tags. Everything
is entered correctly. They do not, however, refer to a DTD.
Valid
Valid documents not only conform to XML syntax but they also are error checked against a
Document Type Definition (DTD). A DTD is a set of rules outlining which tags are allowed, what
values those tags may contain, and how the tags relate to each other.
Typically, you'll use a valid document when you have documents that require error checking, that
use an enforced structure, or are part of a company- or industry-wide environment in which many
documents need to follow the same guidelines.
DTDs
A Document Type Definition (DTD) is a set of rules that defines the elements, element attributes
and attribute values, and the relationship between elements in a document.
When your XML document is processed, it is compared to its associated DTD to be sure it is
structured correctly and all tags are used in the proper manner. This comparison process is
called validation and is performed by a tool called a parser.
Remember, you don't need to have a DTD to create an XML document; you only need a DTD for
a valid XML document.
Here are a few reasons you'd want to use a DTD:
Your document is part of a larger document set and you want to ensure that the whole set
follows the same rules.
Your document must contain a specific set of data and you want to ensure that all required
data has been included.
Your document is used across your industry and needs to match other industry-specific
documents.
You want to be able to error check your document for accuracy of tag use.
Deciding on a DTD
Using a DTD doesn't necessarily mean you have to create one from scratch. There are a number
of existing DTDs, with more being added every day.
Shared DTDs
As XML becomes widespread, your industry association or company is likely to have one or
more published DTDs that you can use and link to. These DTDs define tags for elements that are
commonly used in your applications. You don't need to recreate these DTDs -- you just point to
them in your doctype tag in your XML file, and follow their rules when you create your XML
document.
Some of these DTDs may be public DTDs, like the HTML DTD. Others may belong to your
company. If you are interested in using a DTD, ask around and see if there is a good match that
already exists.
Create Your Own DTD
Another option is to create your own DTD. The DTD can be very simple and basic, or it can be
large and complex. The DTD will be a reflection of the needs of your document.
It is perfectly acceptable to have a DTD with just four or five basic elements if that is what your
document needs. Don't feel that creating a DTD necessarily needs to be a huge undertaking.
However, if your documents are complex, do plan on setting aside time -- several days or several
weeks -- to understand the document and the document elements and create a solid DTD that
will really work for you over time.
Make an Internal DTD
You can insert DTD data within your DOCTYPE definition. If you've worked with CSS styles, you
can think of this as being a little like putting style data into your file header. DTDs inserted this
way are used in that specific XML document. This might be the approach to take if you want to
validate the use of a small number of tags in a single document or to make elements that will be
used only for one document.
Remember, the primary use for a DTD is to validate that the tags you enter in your XML
document are entered as specified in the DTD. It is an error-checking process that ensures your
data conforms to a set of rules.
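
Here is a small sketch of what an internal DTD looks like and how a program can see it. The booklist document and its two element rules are invented for the illustration. The code uses Python's standard xml.dom.minidom module, which reads the DOCTYPE but does not validate the document against it; checking the rules would take a validating parser, which is outside this sketch.

```python
from xml.dom.minidom import parseString

# Hypothetical document: the DTD rules sit inside the DOCTYPE brackets.
SOURCE = """<?xml version="1.0"?>
<!DOCTYPE booklist [
  <!ELEMENT booklist (title+)>
  <!ELEMENT title (#PCDATA)>
]>
<booklist><title>King Lear</title></booklist>"""

doc = parseString(SOURCE)

print(doc.doctype.name)            # the type of document named in the DOCTYPE
print(doc.doctype.internalSubset)  # the rules written between the brackets
```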
XML Syntax
Tagging an XML document is, in many ways, similar to tagging an HTML document. Here are
some of the most important guidelines to follow.
Rule #1: Remember the XML declaration
This declaration goes at the beginning of the file and alerts the browser or other processing tools
that this document contains XML tags. The declaration looks like this:
<?xml version="1.0" encoding="UTF-8" standalone="yes/no"?>
The attributes must appear in this order: version, then encoding, then standalone.
You can leave out the encoding attribute and the processor will use the UTF-8 default.
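
A quick way to see what a processor makes of the declaration is to parse one and read the values back. This sketch uses Python's standard xml.dom.minidom module; the note element is a made-up placeholder. It also shows that the parser rejects a declaration whose encoding and standalone attributes are swapped.

```python
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError

GOOD = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?><note>hi</note>'
# Same declaration, but with encoding and standalone in the wrong order.
BAD = '<?xml version="1.0" standalone="yes" encoding="UTF-8"?><note>hi</note>'

doc = parseString(GOOD)
print(doc.version, doc.encoding, bool(doc.standalone))

try:
    parseString(BAD)
    bad_parsed = True
except ExpatError:
    bad_parsed = False
print(bad_parsed)
```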
Rule #2: Do what the DTD instructs
If you are creating a valid XML file, one that is checked against a DTD, make sure you know
what tags are part of the DTD and use them appropriately in your document. Understand what
each does and when to use it. Know what the allowable values are for each. Follow those rules.
The XML document will then validate against the specified DTD.
Rule #3: Watch your capitalization
XML is case-sensitive. <P> is not the same as <p>. Be consistent in how you define element
names. For example, use ALL CAPS, or use Initial Caps, or use all lowercase. It is very easy to
create mis-matching case errors.
Also, make sure starting and ending tags use matching capitalization, too. If you start a
paragraph with the <P> tag, you must end it with the </P> tag, not a </p>.
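
The capitalization rule is easy to check with any XML parser. This is a minimal sketch with Python's standard xml.etree.ElementTree module; the helper name tags_match is made up for the example.

```python
import xml.etree.ElementTree as ET


def tags_match(xml_text):
    """Return True if the parser accepts the text, False if it reports an error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(tags_match("<P>Some text</P>"))  # same case in both tags
print(tags_match("<P>Some text</p>"))  # end tag differs only by case
```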
Rule #4: Quote attribute values
In HTML there is some confusion over when to enclose attribute values in quotes. In XML the
rule is simple: enclose all attribute values in quotes, like this:
<NAME dob="1950">Ben Johnson</NAME>
Rule #5: Close all tags
In XML you must close all tags. This means that paragraphs must have corresponding end
paragraph tags. Anchor names must have corresponding anchor end tags. A strict interpretation
of HTML says we should have been doing this all along, but in reality, most of us haven't.
Rule #6: Close Empty tags, too
In HTML, empty tags, such as <br> or <img>, do not close. In XML, empty tags do close. You
can close them either by adding a separate close tag (</tagname>) or by combining the open
and close tags into one tag. You create the open/close tag by adding a slash, /, to the end of the
tag, like this:
<br/>
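
Both ways of closing an empty tag give a parser the same thing: one element with no content. The sketch below checks that with Python's standard xml.etree.ElementTree module; the line element and the helper name parses are invented for the illustration.

```python
import xml.etree.ElementTree as ET


def parses(xml_text):
    """Return True if the parser accepts the text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


combined = ET.fromstring("<line>one<br/>two</line>")       # open/close in one tag
separate = ET.fromstring("<line>one<br></br>two</line>")   # separate close tag

for root in (combined, separate):
    br = root.find("br")
    # An empty element has no text of its own; "two" follows it as its tail.
    print(br.tag, br.text, br.tail)

# The HTML habit of leaving the tag open is not accepted.
print(parses("<line>one<br>two</line>"))
```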
Examples
This table shows some common HTML tags and how they would be treated in XML.
Tag | Comment | End-Tag
<P> | Technically, in HTML you're supposed to close this tag. In XML, it's essential to close it. | </P>
<ELEMENT> | All elements in XML must have a start-tag and an end-tag. | </ELEMENT>
<LI> | This tag must be closed in XML in order to ensure a well-formed XML document. | </LI>
<META name="keywords" content="XML, SGML, HTML"> | META tags are considered empty elements in XML, and they must close. | <META name="keywords" content="XML, SGML, HTML"/>
<BR> | Break tags are considered empty elements. | <BR/>
<IMG src="coolpictures.html"> | This is an empty element tag. | <IMG src="coolpictures.html"/>
Copyright © 1998-99
Well-formed XML
A document that conforms to the XML syntax rules is called "well-formed." If all your tags are
correctly formed and follow XML guidelines, then your document is considered a well-formed
XML document. That's one of the nice things about XML -- you don't need to have a DTD in order
to use it.
Begin the Well-formed Document
To begin a well-formed document, type the XML declaration:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
If you are embedding XML, it will go after the <HTML> and <HEAD> tags, and before any
JavaScript.
If you are creating an XML-only document, it will be the first thing in the file.
Version
You must include the version attribute for the XML declaration. The version is currently "1.0."
Defining the version lets the browser know that the document that follows is an XML document,
using XML 1.0 structure and syntax.
Encoding
Next, declare the encoding of the document. In this case, the encoding is UTF-8, which is the
default encoding for XML. You can leave off this attribute and the processor will default to UTF-8.
If you include it, it must come before the standalone attribute.
Standalone
Finally, declare that the document "stands alone." The application that is processing
this document then knows that it doesn't need to look for a DTD and validate the XML tags.
Remember the Root Element
After the declaration, enter the tag for the root element of your document. This is the top-most
element, under which all elements are grouped.
Follow XML Syntax
Now enter the rest of your content. Remember to follow XML syntax:
Remember that capitalization matters;
Quote all attribute values;
Close all tags;
Remember to close empty tags, too, like this:
<br/>
Pretty easy, isn't it? That's all there is to it!
Valid XML
A valid document conforms to the XML syntax rules and follows the guidelines of a Document
Type Definition (DTD).
The process of comparing the XML document to the DTD is called validation. This process is
performed by a tool called a parser.
Begin the Valid XML Document
To begin a valid document, type the XML declaration:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
If you are embedding XML, it will go after the <HTML> and <HEAD> tags, and before any
JavaScript.
If you are creating an XML-only document, it will be the first thing in the file.
Version
You must include the version attribute for the XML declaration. The version is currently "1.0."
Defining the version lets the browser know that the document that follows is an XML document,
using XML 1.0 structure and syntax.
Encoding
Next, declare the encoding of the document. You can leave off this attribute and the processor
will default to UTF-8. If you include it, it must come before the standalone attribute.
Standalone
The standalone="no" attribute signals that the document relies on an external DTD, so a
validating processor should look for that DTD and validate the XML tags against it.
Create a DOCTYPE Definition
The second item in a valid XML document is the DOCTYPE definition. This identifies the type
of document and DTD in use.
If you look at HTML source files you'll often see a !DOCTYPE definition, especially if the file was
created by a WYSIWYG tool. The DOCTYPE definition points to an HTML DTD.
In a valid XML file, !DOCTYPE tells the program that is processing your XML file two things: the
name of the type of document, and the name and location of the DTD against which to validate
the file's contents.
The DOCTYPE definition looks like this:
<!DOCTYPE type-of-doc SYSTEM/PUBLIC "dtd-name">
!DOCTYPE
This says that you are defining the DOCTYPE.
type-of-doc
This is the name of the type of document contained in this file. It must match the name of the
root element, and typically it is also the same name as the DTD.
SYSTEM/PUBLIC
SYSTEM tells the processor to look for the private DTD at the following location. PUBLIC tells the
processor to look for a public DTD at the following location.
"dtd-name"
The URL after SYSTEM or PUBLIC is the name of the DTD file. By convention, DTD files end with
the extension .dtd.
If you want, instead of pointing to an external DTD, you could place the DTD information within
the DOCTYPE definition, making it local to your XML document. You should do this only if you
want to define a few simple elements and you want them permanently attached to a particular
document.
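The three forms of the DOCTYPE definition might look like the sketches below. They are alternatives, so only one of them would appear in a real document. The document type booklist, the file name booklist.dtd, the public identifier and the example.com URL are all hypothetical.

```xml
<!-- Alternative 1: a private DTD, located by the file name after SYSTEM -->
<!DOCTYPE booklist SYSTEM "booklist.dtd">

<!-- Alternative 2: a public DTD; PUBLIC takes a public identifier
     followed by a location the processor can fall back on -->
<!DOCTYPE booklist PUBLIC "-//Example//DTD Booklist 1.0//EN"
  "http://www.example.com/dtd/booklist.dtd">

<!-- Alternative 3: the DTD information placed inside the DOCTYPE itself -->
<!DOCTYPE booklist [
  <!ELEMENT booklist (#PCDATA)>
]>
```

In each case the name after !DOCTYPE is the name the root element of the document would need to carry.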
Remember the Root Element
After the declaration, enter the tag for the root element of your document. This is the top-most
element, under which all elements are grouped.
Follow XML Syntax
Now enter the rest of your content. Remember to follow XML syntax:
Remember that capitalization matters;
Quote all attribute values;
Close all tags;
Remember to close empty tags too, like this:
<br/>
Elements
Elements are the basic building blocks of XML (and HTML, for that matter). Each element is a
piece of data identified by a tag. The tag contains the name of the element and any of its
attributes, like this:
<AUTHOR dob="1864">Thadius J. Frog</AUTHOR>
Thadius J. Frog is now identified as an author element. This particular author element has a date
of birth (dob) attribute value of 1864.
Choose Your Own
XML is an extensible markup language. This means you create a set of elements that work for
your content, and that you'll be able to use consistently within the document.
Whether you use a DTD or not, you'll still want to sit down and write a list of the element names
that you will be using in your document. XML is case-sensitive, so as you're thinking about the
element names, be sure to think about how you capitalize them also.
Select names that are both easy to remember and easy to type. Ideally, your tags should have
some inherent meaning too. This makes them easier to use. For example, if you want to identify
"last name" as an element, consider naming the element something like "last-name" or
"surname."
Be consistent in your use of names. It is easier to apply one set of general rules to 20 different
tags than it is to remember eight discrete tags that follow no particular pattern. For example, if
your document is a listing of classes, you could use these elements:
<list-of-classes>
<name-class>
<instructor-name>
<Sec>
<TIME>
<descprt>
But you're just asking for confusion!
There's a mix of capitalization. There's a mix of abbreviations and full words. In one case the
word "name" is the first part of the tag; in the other it is the second part of the tag. No single
rule helps you remember this set of names.
Wouldn't names like this be easier to use?
<classlist>
<class>
<section>
<time>
<instructor>
<description>
These names are all lowercase, full words, no plurals: an easy set of criteria to remember.
Focus on Structure, Not Format
One of the goals of using XML is to separate structure ("this is an author") from format ("display
this in 10 point Helvetica"). Elements remain identified as elements no matter what platform you
move the data to. An XML document is completely interpretable.
When you think about elements, think about the role they play and the data they contain. Don't
think about how the elements will look on the page. Appearance is handled separately.
You are using elements to identify data within your document as playing a certain role or
belonging to a certain category of data.
Displaying Elements
You can use any tag name you want, as long as you follow proper XML syntax. Of course, those
tags alone won't do anything. They will just sit there quietly marking up your data.
After your data is marked up, you'll use style sheets or other processing tools to display the XML
document. You can control the display based on information contained in the elements.
Using Elements
In a well-formed XML document you can insert any element tag you want, as long as you follow
proper syntax.
In a valid XML document, only the elements which are specified in the DTD will pass muster. If
you randomly add other elements, their use will be flagged as an error.
When you use elements in an XML document, you must follow standard XML syntax:
The element name surrounds the data which it defines. For example:
<chapter-head>Tying Knots</chapter-head>.
All elements, including empty elements, must end. This means having an open and close tag
for regular elements, and a tag that closes with a slash for empty elements.
The element name is case sensitive: <AUTHOR> is not the same as <author>.
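The three syntax rules above can be seen side by side in a short sketch. The chapter and illustration elements and the file name are made up for this example; the incorrect forms are kept inside a comment so the listing itself stays well-formed.

```xml
<!-- Well-formed: names match exactly and the empty element is closed -->
<chapter>
  <chapter-head>Tying Knots</chapter-head>
  <illustration file="bowline.gif"/>
</chapter>

<!-- NOT well-formed:
     <chapter-head>Tying Knots</Chapter-Head>   capitalization does not match
     <illustration file="bowline.gif">          empty element is never closed
-->
```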
DTDs and Elements
One of the ways to define and codify all your elements is to create a DTD. A DTD defines what the
allowable elements are, what their attributes (if any) are, and what their relationship is to other
elements.
By validating your XML document against a DTD, you can test to be sure that elements in the
document are being used correctly.
Attributes
Attributes provide additional information about elements.
You use elements and attributes all the time in HTML. For example, in HTML a tag such as
<H1 align="center"> includes an element (H1), an attribute (align) and an attribute value
(center).
In HTML, attributes allow you to specify additional information about your elements. Often this
information is formatting-related, such as align or size. In XML, attributes allow you to specify
additional data about an element, but the intent is that it is not formatting-related. It is instead
additional data about that particular element.
Let's say, for example, you're creating documents about late 20th century popular music. In your
DTD, you've created an element called <SONG> which identifies each musical title. You have
music that falls into different decade categories: the 60's, the 70's and the 80's. You can give
the song element an attribute called era. Now you'll be able to know from what era each song
dates.
By using an attribute you can identify different versions of the same song: "I've Got You Babe"
from the 1960s and "I've Got You Babe" from the 1980s. Later on, you can use this data to
display all 70s songs in green or to sort the displayed titles by era.
You would use the attribute like this:
<SONG era="60s">I've Got You Babe</SONG>
<SONG era="70s">Billy Don't Be a Hero</SONG>
<SONG era="80s">I've Got You Babe</SONG>
"I've Got You Babe" is identified as a "song" element with an "era" attribute value of "60s". "Billy
Don't Be A Hero" is identified as a "song" element with an "era" attribute value of "70s". "I've Got
You Babe" is identified as a "song" element with an "era" attribute value of "80s".
Attributes and their allowable values are created in your DTD when you specify elements. They
are specified through an attribute list. Like element names, attribute names are case-sensitive,
so be aware of your use of capitalization when you select and use attribute names.
One other important thing to remember about attributes in XML tags is that the attribute values
must always be contained inside quotes. In HTML it's a mixed bag, but in XML the rule is easy to
remember: quote all attribute values.
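Here is one way the era attribute could be declared in a DTD, shown with the DTD placed inside the document so the sketch is self-contained. The SONGLIST root element, the list of allowed values and the choice to make era required are assumptions made for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE SONGLIST [
  <!ELEMENT SONGLIST (SONG+)>
  <!ELEMENT SONG (#PCDATA)>
  <!-- era must be one of the listed values and must always be supplied -->
  <!ATTLIST SONG era (60s | 70s | 80s) #REQUIRED>
]>
<SONGLIST>
  <SONG era="60s">I've Got You Babe</SONG>
  <SONG era="70s">Billy Don't Be a Hero</SONG>
  <SONG era="80s">I've Got You Babe</SONG>
</SONGLIST>
```

With this attribute list, a validating parser would flag a SONG that has no era attribute or one whose value is not in the list.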
Comments
Comments are a way to add your own notes to an XML document. The browser and the XML
processors will ignore anything inside comments.
You aren't going to remember what you were thinking three months later when you return to edit
the document, so don't be afraid to add comments as reminders or as markers of work that you
have done.
To create a comment:
1. Type a less than sign followed by an exclamation point and two dashes, like this:
<!--
2. Type the text you want inside the comment. Be sure the text DOES NOT contain two dashes!
<!--This defines a listing of books
3. Now close the comment with two dashes and a greater than sign:
<!--This defines a listing of books-->
CDATA
CDATA stands for "character data." Character data are letters, numbers and other symbols that
are used exactly as they are typed. They are not parsed or processed or treated as if they have
any special meaning.
You can create a CDATA section within your XML document. A CDATA section is a handy way to
show code examples or to use characters such as < that would otherwise take on a special
meaning. You can use CDATA instead of using a series of &lt;, for example.
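To illustrate the trade-off, the sketch below writes the same sample text twice: once escaped character by character and once inside a CDATA section. The EXAMPLES wrapper element is hypothetical.

```xml
<EXAMPLES>
  <!-- Escaped: each special character is replaced by an entity reference -->
  <EXAMPLE>&lt;NAME common="freddy"&gt;Sir Fredrick&lt;/NAME&gt;</EXAMPLE>
  <!-- CDATA section: the text is typed exactly as it should appear -->
  <EXAMPLE><![CDATA[<NAME common="freddy">Sir Fredrick</NAME>]]></EXAMPLE>
</EXAMPLES>
```

Both EXAMPLE elements carry the same character data. The one restriction on a CDATA section is that its text cannot contain the closing sequence of two square brackets followed by a greater than sign.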
To create a CDATA section:
1. At the place in the document where you want the CDATA section to appear, begin a CDATA
definition with the less than sign and an exclamation point.
<!
2. Type an open square bracket and the letters CDATA.
<![CDATA
3. Type another open square bracket.
<![CDATA[
4. Now type the CDATA itself. In this example we are typing some sample code.
<![CDATA[<NAME common="freddy" breed="springer-spaniel">Sir Fredrick
of Ledyard's End</NAME>
5. End the section with two closing square brackets and a greater than symbol.
<![CDATA[<NAME common="freddy" breed="springer-spaniel">Sir Fredrick
of Ledyard's End</NAME>]]>
Here is how a CDATA section looks within document content (to be displayed in a browser, the
document would also need to be linked to a stylesheet):
<HEAD1>
Entering a Kennel Club Member
</HEAD1>
<DESCRIPTION>
Enter the member by the name on his or her papers. Use the NAME tag. The NAME
tag has two attributes. Common (all in lowercase, please!) is the dog's call
name. Breed (also in all lowercase) is the dog's breed. Please see the breed
reference guide for acceptable breeds. Your entry should look something like
this:
</DESCRIPTION>
<EXAMPLE>
<![CDATA[<NAME common="freddy" breed="springer-spaniel">Sir Fredrick of
Ledyard's End</NAME>]]>
</EXAMPLE>
Namespaces
Namespaces are a way of using elements from more than one DTD within the same XML
document.
Sometimes you may be working with material that draws on several sets of element tags. For
example, you might have an online store selling tropical fish and you'd like to use the
<SOURCE> tag to identify both the geographic location from which each species comes and the
wholesaler from whom you buy it. Namespaces are a way to do this.
An XML namespace is a collection of names, identified by a URI reference, which are used in
XML documents as element types and attribute names. In practice, namespaces let you match a
tag you are using with a particular set of tags.
In the beginning of your document (or at the start of a particular element of your document) you
identify the namespaces you'll be using and where the tag information is located. Then, when you
use the tag to identify an element in your document, you precede it with the appropriate
namespace prefix.
Declaring Namespaces
At the beginning of your document, you'll want to identify the namespaces you are using in your
document. This process is called declaring the namespace. In this example, you are creating a
namespace prefix called "SALES." The URI for SALES is the mythical http://fishworld.org/schema:
<document xmlns:SALES='http://fishworld.org/schema'>
Using Namespaces
When you use the tag to create the element that is defined in one of the namespaces, the
namespace prefix is the first part of both the opening and the closing tag, like this:
<SALES:SOURCE>Fish-o-Rama Wholesalers and Suppliers to the Trade</SALES:SOURCE>
When you use your own tag, you just use the tag name, like this:
<SOURCE>Mexico, Central America</SOURCE>
In January 1999, Namespaces became a W3C Recommendation.
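Put together, a document for the tropical fish store might be sketched like this. The CATALOG, FISH and NAME elements and the sample species are invented for the example; only the SALES prefix and its URI come from the text above.

```xml
<CATALOG xmlns:SALES="http://fishworld.org/schema">
  <FISH>
    <NAME>Neon Tetra</NAME>
    <!-- No prefix: the store's own SOURCE element (geographic origin) -->
    <SOURCE>South America</SOURCE>
    <!-- SALES prefix: the SOURCE element from the SALES namespace (wholesaler) -->
    <SALES:SOURCE>Fish-o-Rama Wholesalers and Suppliers to the Trade</SALES:SOURCE>
  </FISH>
</CATALOG>
```

Because the prefix is declared on the root element, it can be used on any element inside CATALOG. The prefix is case-sensitive, like every other XML name.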
XML Entities
An entity is a short cut to a set of information.
When you use an entity, it "expands" to its full meaning, but you need only type the shorter entity
name during data entry. You might think of an entity as being a bit like a macro: it is a set of
information that can be used by calling one name.
XML defines two types of entities. The general entity, which we'll talk about here, is used in XML
documents. The parameter entity is used in DTDs. General entities are easy to spot: they begin
with the ampersand and end with the semicolon, like this:
&entity-name;
Uses for Entities
Entities are a way to make entering and managing data easier.
You've probably already used entities without calling them that. If you've ever entered the
characters &lt; to create the < symbol, you've used an entity. This keystroke combination is a
standard, predefined entity in both HTML and XML that lets you access a particular ASCII
character without having to memorize the character set number.
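For reference, XML predefines five entities, and any other character can be reached by its number. The sketch below uses hypothetical NOTE and LINE elements to show each one in use.

```xml
<NOTE>
  <!-- The five entities that every XML processor recognizes -->
  <LINE>&lt; is the less than sign</LINE>
  <LINE>&gt; is the greater than sign</LINE>
  <LINE>&amp; is the ampersand</LINE>
  <LINE>&apos; is the apostrophe</LINE>
  <LINE>&quot; is the double quote</LINE>
  <!-- A numeric character reference: number 169 is the copyright sign -->
  <LINE>&#169; 1999</LINE>
</NOTE>
```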
Here are a few reasons you might want to define and use entities:
Entities save typing. Suppose you have a paragraph, like a copyright notice, that you use in
every single document. You could type that notice over and over again. Or you could use an
entity to call it forth in place.
Entities can reduce errors. By the 101st time you type that copyright notice, it is likely your
poor fingers will be so tired you'll make an error and set your copyright for 1989 instead of
1999. Using an entity can reduce the potential for these types of errors.
Entities are easy to update. When it is time to update that copyright notice, with an entity you can
make the change in one place and be done with it. Without an entity, you'd be searching and
replacing throughout your document set.
Entities can act as placeholders for TBD information. Maybe legal hasn't quite finalized
what they want that copyright notice to say. That doesn't have to stop production: you can
use an entity, and when the final wording comes down, the entity will automatically display the
new, corrected version in all your documents.
You can get quite creative with the use of entities and even have documents that are
constructed entirely from entities. Here's an example:
You want to create different documents, each containing a set of bios for members of your staff.
You'll have an executive set, a set for each product line, a set for six different regions around the
world ... subsets of the same content appear in each.
One approach you could take is creating 10 or 12 separate flat files with the appropriate
biography information in each. But an easier way is to create a small file for each bio, then call
each into the executive page, the European page, the Flying Toys Division page and so on via an
entity.
Here's how the content code for your Flying Toys Division Page might look. Upon display, the
entities would expand and you'd see the full bios of each person. If you needed to change the
bios, you could do it in one place. If the product manager changed, all your pages would be
automatically updated with the new person.
<HEAD>The Faces Behind Flying Toys!</HEAD>
<BIO>&bio-ft-div-head;</BIO>
<BIO>&bio-ft-prod-mgr;</BIO>
<BIO>&bio-ft-designer;</BIO>
<BIO>&bio-ft-lead-engineer;</BIO>
Defining Entities
You can define entities in your local document as part of the DOCTYPE definition. You can also
link to external files that contain the entity data. This, too, is done through the DOCTYPE
definition. A third option is to define the entities in your external DTD.
Use a local definition when the entity is being used only in this one particular file. Use a linked
external file when the entity is being used in many document sets.
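For the staff bio pages described earlier, the linked external file approach might be sketched as follows. The DIVISIONPAGE root element and the bios/ file paths are assumptions; each file is presumed to hold the text of one biography.

```xml
<?xml version="1.0"?>
<!DOCTYPE DIVISIONPAGE [
  <!-- Each entity points to one small file that holds a single bio -->
  <!ENTITY bio-ft-div-head SYSTEM "bios/ft-div-head.xml">
  <!ENTITY bio-ft-prod-mgr SYSTEM "bios/ft-prod-mgr.xml">
]>
<DIVISIONPAGE>
  <HEAD>The Faces Behind Flying Toys!</HEAD>
  <BIO>&bio-ft-div-head;</BIO>
  <BIO>&bio-ft-prod-mgr;</BIO>
</DIVISIONPAGE>
```

Other pages can declare the same entities and reuse the same files, so a change to one bio file shows up everywhere. Note that a non-validating parser is not required to fetch external entities, so whether the bios actually expand depends on the processor in use.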
To define an entity:
1. Start your DOCTYPE definition as usual, with the name of your document type
(PRESSRELEASE in this example), like this:
<!DOCTYPE PRESSRELEASE
2. Now mark that you are defining some data by entering a square bracket:
<!DOCTYPE PRESSRELEASE [
3. Start the entity definition with a less than sign, an exclamation mark and the phrase ENTITY,
all in caps:
<!DOCTYPE PRESSRELEASE [
<!ENTITY
4. Type the name of the entity. Type it using the capitalization that you will use when calling it
later on.
<!DOCTYPE PRESSRELEASE [
<!ENTITY copyright
5. If you are defining the entity locally, type the value of the entity surrounded by quotes, and
then close the entity definition with a greater than sign.
<!DOCTYPE PRESSRELEASE [
<!ENTITY copyright "Copyright 2000, As The World Spins Corp. All
rights reserved. Please do not copy or use without authorization. For
authorization contact legal@worldspins.com.">
6. If you are defining an entity in an external ASCII text file, put in a pointer to the external file,
then close the entity definition with a greater than sign.
<!DOCTYPE PRESSRELEASE [
<!ENTITY copyright SYSTEM
"http://www.worldspins.com/legal/copyright.xml">
7. Create all your entity definitions. When you are done, close the DOCTYPE definition with a
square bracket and a greater than sign.
<!DOCTYPE PRESSRELEASE [
<!ENTITY copyright "Copyright 2000, As The World Spins Corp. All
rights reserved. Please do not copy or use without authorization. For
authorization contact legal@worldspins.com.">
<!ENTITY trademark SYSTEM
"http://www.worldspins.com/legal/trademark.xml">
]>
Using Entities
To use an entity in your document, just call it by name. The name begins with an & and ends with
a semi-colon.
Here is a complete example (to be displayed, the document would also need to be linked to
a style sheet):
<?xml version="1.0"?>
<!DOCTYPE PRESSRELEASE [
<!ENTITY copyright "Copyright 2000, As The World Spins Corp. All rights
reserved. Please do not copy or use without authorization. For authorization
contact legal@worldspins.com.">
<!ENTITY trademark SYSTEM "http://www.worldspins.com/legal/trademark.xml">
]>
<PRESSRELEASE>
<HEAD>Mini-globe revolutionizes keychain industry</HEAD>
<LEAD>
Today, As The World Spins introduces a new approach to key chains. With the new
MINI-GLOBE, keys can be kept inside a chain, called for upon demand, and stored
safely. Never more will consumers lose a key or stand at a door flipping
through a stack of keys seeking the right one.
</LEAD>
<LEGAL>
&trademark;
&copyright;
</LEGAL>
</PRESSRELEASE>
XML DTDs: Introduction
Valid XML documents follow a set of rules defined in an associated DTD. This Document Type Definition
defines elements, attributes and relationships between elements.
DTDs are saved in an ASCII text file with the extension .dtd, like this:
mypage.dtd
When your XML document is processed, it is compared to its associated DTD to be sure it is
structured correctly and all tags are used in the proper manner. This comparison process is
called validation and is performed by a tool called a parser.
Remember, you don't need to have a DTD to create an XML document; you only need a DTD for
a valid XML document.
Before You Begin
There are a handful of terms you'll be hearing as you work with an XML DTD. Take a couple of
minutes to become familiar with them before you begin. Each term is defined below.
Schema
A schema is a description of the rules for data.
A schema does two things:
1. It defines the elements in a data set and their relationship to each other.
2. It defines the content that can be contained in each element.
DTDs are a schema for XML documents.
DTD
Document Type Definition. The DTD defines the elements, attributes and relationships between
elements for an XML document.
A DTD is a way to check that the document is structured correctly, but you do not need to use
one in order to use XML.
Document Tree
A document tree is the representation of the hierarchy of elements in a document.
A document tree has one root element. All other elements are part of this top-level element. The
first element tag in your XML document is always the root element.
Root Element
The root element is the top-most element in the hierarchy. All other elements in a document are
children (or descendants) of this element.
In an XML file, the first element tag is the root element's tag.
In the DTD, the root element is the first element you should define.
Parent Element
A parent element is an element which contains other elements. The other elements are called
children.
For example, a list is a parent. The list items are children.
A parent element is sometimes referred to as a branch element. Each branch sprouts off the
tree; from the branch hang other branches and individual leaves. The branches and leaves
"belong" to the parent branch.
Child Element
The child element is a subset of the parent element.
An element may be both a parent and a child at the same time. For example, the list element
may be a child of the root element. At the same time, it is the parent of the list item element.
If a child element is the outer-most element in the hierarchy and does not contain any other
elements, it is sometimes called a leaf element.
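The terms above can be mapped onto a small document. In this sketch the element names booklist, list and item are hypothetical, and the indentation mirrors the shape of the document tree.

```xml
<!-- booklist is the root element and the parent of list -->
<booklist>
  <!-- list is a child of booklist and the parent of each item -->
  <list>
    <!-- each item contains no other elements, so it is a leaf -->
    <item>XML in a Week</item>
    <item>Tying Knots</item>
  </list>
</booklist>
```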
Parser
A parser is a software tool that checks to be sure a document follows a particular syntax.
XML parsers come in two varieties:
A non-validating parser checks a document to be sure XML syntax rules are followed and
builds a document tree from the element tags.
A validating parser checks the syntax, builds the tree, and compares the use of element tags
to be sure they conform with the rules specified in the document's associated DTD.
Parsers can be either external programs or part of the editing tool or browsing tool.
The XML Reference section includes a list of some of the XML parsers.
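The difference between the two kinds of parser shows up with a document like the sketch below, which is well-formed but not valid. The NICKNAME element is a deliberate mistake added for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE EMPLOYEE [
  <!ELEMENT EMPLOYEE (FIRST, LAST)>
  <!ELEMENT FIRST (#PCDATA)>
  <!ELEMENT LAST (#PCDATA)>
]>
<EMPLOYEE>
  <FIRST>Thadius</FIRST>
  <LAST>Frog</LAST>
  <!-- NICKNAME is properly opened and closed, but the DTD does not declare it -->
  <NICKNAME>Thad</NICKNAME>
</EMPLOYEE>
```

A non-validating parser would accept this document, because every syntax rule is followed. A validating parser would report an error, because NICKNAME is neither declared nor allowed inside EMPLOYEE.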
DTD Contents
A DTD is a way to ensure that an XML document uses elements correctly. It contains a set of
rules. When your XML document is processed, it is compared to its associated DTD to be sure it
is structured correctly and all tags are used in the proper manner.
A DTD:
Always contains rules that define elements.
Always contains rules that define the relationship between elements.
May contain rules that define attributes for elements, although not all elements have
attributes.
May contain rules that define entities.
May contain rules that define notations.
Finding a DTD
Using a DTD doesn't necessarily mean you have to create one from scratch. There are a number
of existing DTDs, with more being added every day.
Shared DTDs
As XML becomes widespread, your industry association or company is likely to have one or
more published DTDs that you can use and link to. These DTDs define tags for elements that are
commonly used in your applications. You don't need to recreate these DTDs: you just point to
them in the DOCTYPE definition in your XML file, and follow their rules when you create your XML
document.
Some of these DTDs may be public DTDs, like the HTML DTD. Others may belong to your
company. If you are interested in using a DTD, ask around and see if there is a good match that
already exists.
Create Your Own External DTD
Another option is to create your own DTD. The DTD can be very simple and basic, or it can be
large and complex. The DTD will be a reflection of the needs of your document.
It is perfectly acceptable to have a DTD with just four or five basic elements if that is what your
document needs. Don't feel that creating a DTD necessarily needs to be a huge undertaking.
However, if your documents are complex, do plan on setting aside time (several days or several
weeks) to understand the document and the document elements and create a solid DTD that
will really work for you over time. Remember, you'll be able to use this DTD with many individual
documents, so it is worth the time to think it through and craft it well.
Create Your Own Internal DTDs
You can insert DTD data within your DOCTYPE definition in an individual XML document. If
you've worked with CSS styles, you can think of this as being a little like putting style data into
your file header.
DTDs inserted this way are used in that specific XML document only. This might be the approach
to take if you want to validate the use of a small number of tags in a single document or to make
elements that will be used only for one document.
Internal DTDs
You can insert DTD data within your doctype declaration. This type of DTD is used only by the
one specific XML document that contains it.
This is a very simple example of DTD data within the doctype declaration:
<!DOCTYPE books [
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ENTITY copyright "Copyright 1999, Flying Toys Inc., all rights reserved.">
]>
External DTDs
DTDs are stored as ASCII text files with the extension .dtd. Each file includes a series of
element definitions, attribute lists, entity definitions and notation definitions. Unlike an internal
DTD, the declarations in an external file are not wrapped in a DOCTYPE definition; the DOCTYPE
stays in the XML document that points to the file. Here's an example; this might be the DTD for a
set of documents about books:
<!--This defines a listing of books-->
<!ELEMENT booklist (title, author)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ATTLIST title binding (paper|cloth|hard) "paper">
<!ENTITY copyright "Copyright 1999, Flying Toys Inc., all rights reserved.">
The ATTLIST line gives title an attribute (named binding here for illustration) that can be paper,
cloth or hard, with paper as the default.
DTDs can be much more complex than this example, and they typically are, but this gives you
a sense of what they can do. It's just a matter of structuring your data and figuring out the "parts"
of your content.
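A document that uses an external DTD like the one above might be sketched as follows. It assumes a file named booklist.dtd in the same directory that declares booklist as a title followed by an author; the file name and the sample content are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- Assumes booklist.dtd sits next to this file and declares
     booklist as (title, author), both holding text -->
<!DOCTYPE booklist SYSTEM "booklist.dtd">
<booklist>
  <title>Tying Knots</title>
  <author>Thadius J. Frog</author>
</booklist>
```

The name after !DOCTYPE matches the root element, and standalone is set to "no" because the rules live in a separate file.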
Reading a DTD
Even if you don't plan to build a DTD from scratch, it is helpful to know how to read one and to
understand the document it is describing.
From reading a DTD, you should be able to compile a list of elements and their attributes and how
and when to use them. You should also be able to compile a list of entities that you can use
within the document.
Some people find it helpful to actually sketch out a document tree as they go through the DTD to
visualize the structure of the document.
Check List
Here's a list of things to look for as you go through a DTD:
Read the Comments
Note the Basic Elements
Read the Element Declaration
Look for Parent/Child Relationships
Read Attribute Lists
Find Attribute Names for Each Element
Determine Attribute Value Types
See the Attribute's Default
Read Entity Declarations
Read the Comments
Read the comments! Comments can tell you a lot about the DTD, how to use it and what to be
aware of when using it.
Most DTD authors will include information that you should know before using the DTD. This might
range from use restrictions to how-to information.
Comments look like this:
<!-- Here's a comment -->
Note the Basic Elements
Look through the DTD and identify the element names that comprise the document. Note how
they are capitalized. You might want to develop a reference sheet of elements that you can
make notes on as you work your way through the DTD.
Elements begin like this:
<!ELEMENT
The text immediately after the element declaration is the element's name.
Read the Element Declaration
Each element declaration provides the name of the element and the content which it contains.
Sometimes the content is text. Other times it is other elements, arranged in a certain order or used
a certain number of times.
These element declarations describe an EMPLOYEE element that contains three child elements,
each of which contains text:
<!ELEMENT EMPLOYEE (FIRST, MI, LAST)>
<!ELEMENT FIRST (#PCDATA)>
<!ELEMENT MI (#PCDATA)>
<!ELEMENT LAST (#PCDATA)>
Look for Parent/Child Relationships
The element rules build a hierarchy of elements, describing how one element is related to another.
An element that is contained within another is considered a child of the element in which it is
contained. Use these relationships to sketch out your document tree.
The parent/child relationship is defined in the content type portion of the element definition. If the
content type is another element, then those elements are children of the element whose definition
you are reading. For example, FIRST, MI and LAST are children of EMPLOYEE:
<!ELEMENT EMPLOYEE (FIRST, MI, LAST)>
The DTD can require that the child elements be used in a certain order, or that they be used
one, none or many times. It can also group elements to create more detailed rules.
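To see the rule and a matching document together, here is a small sketch with the EMPLOYEE declarations placed inside the document. The sample name is reused from the author example earlier in the text.

```xml
<?xml version="1.0"?>
<!DOCTYPE EMPLOYEE [
  <!-- FIRST, MI and LAST must each appear once, in this order -->
  <!ELEMENT EMPLOYEE (FIRST, MI, LAST)>
  <!ELEMENT FIRST (#PCDATA)>
  <!ELEMENT MI (#PCDATA)>
  <!ELEMENT LAST (#PCDATA)>
]>
<EMPLOYEE>
  <FIRST>Thadius</FIRST>
  <MI>J</MI>
  <LAST>Frog</LAST>
</EMPLOYEE>
```

Swapping the order of the children, or leaving one out, would make the document invalid against this DTD.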
Read Attribute Lists
After element definitions, you may see attribute lists. An attribute list begins like this:
<!ATTLIST
Each attribute list defines the attributes for an element. Many attributes may be defined in one
ATTLIST.
The ATTLIST is structured like this:
<!ATTLIST element-name attribute-name attribute-type default-data>
See Which Element the Attribute Defines
Right after the ATTLIST declaration is the name of an element. This is the element that the
attribute list applies to. For example, this ATTLIST defines attributes for the COMMENT element:
<!ATTLIST COMMENT attribute-name attribute-type default-data>
Find Attribute Names for Each Element
Following the element name is the name of the first attribute declared in this list. This name is the
attribute name you type into the element tag in the XML file. For example, this ATTLIST defines
the attribute "category" for the element COMMENT.
<!ATTLIST COMMENT category attribute-type default-data>
Add the attribute information to the element reference list you are building.
Determine Attribute Value Types
Attributes can be one of several different types. The attribute-type describes the type of value
that the attribute may contain. For example, this ATTLIST says that the "category" attribute for
the element COMMENT contains one of four values: red, green, blue or other.
<!ATTLIST COMMENT category (red | green | blue | other) default-data>
See the Attribute's Default
The final part of the ATTLIST is the default value of the attribute. The default value has a strong
effect on how the attribute is used and what values it might have if you don't use it in the XML
tag. You can make the value required (#REQUIRED) or optional (#IMPLIED). Or you can
provide a default value that will be used automatically if the attribute is not entered.
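The three kinds of default can be compared in one attribute list. In this sketch the category attribute comes from the text above, while the author and date attributes and the choice of "other" as the default value are assumptions added for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE COMMENT [
  <!ELEMENT COMMENT (#PCDATA)>
  <!-- category: one of four values; "other" is used when it is left out -->
  <!-- author: must always be supplied -->
  <!-- date: may be supplied or left out; no default is filled in -->
  <!ATTLIST COMMENT
    category (red | green | blue | other) "other"
    author   CDATA #REQUIRED
    date     CDATA #IMPLIED>
]>
<COMMENT author="T. Frog">Check the figures in section two.</COMMENT>
```

Here the COMMENT element supplies only the required author attribute. A validating parser treats category as "other" and simply reports no value for date.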
Read Entity Declarations
Along with element and attribute definitions, you may also see entity definitions. Typically these
will appear in a group, often at the beginning of the DTD, and usually with explanatory comments.
An entity definition begins like this:
<!ENTITY
After the declaration is the entity's name and the contents of the entity. The contents may be text
or it may be a pointer to another external file. For example, this defines two entities, one called
"copyright" and one called "trademark." Copyright is defined within the definition, while trademark
points to another file.
<!ENTITY copyright "Copyright 2000, As The World Spins Corp. All rights
reserved. Please do not copy or use without authorization. For authorization
contact legal@worldspins.com.">
<!ENTITY trademark SYSTEM "http://www.worldspins.com/legal/trademark.xml">
Making Elements
Elements are the basic building blocks of XML. You define elements in a DTD; you use them in a
document. A basic element definition looks like this:
<!ELEMENT DESCRIPTION (#PCDATA | DEFINITION)*>
Element Declaration
Each element begins with an element declaration, <!ELEMENT. This announces that you are
defining an element.
Element Name
After the declaration is the element's name. The way the name appears in the element definition is
exactly the way it must be used in the XML document. Capitalization counts!
Element Rule
After the name comes a rule that describes what the element can contain. Through this
description, the elements take on hierarchical relationships with each other.
Although the basic bits of the rules are simple, they can be grouped and combined to create quite
complex definitions.
The sections below summarize the element rule definitions.
Contents
Elements can contain text, other elements, a combination of text and other elements, or they may
be empty.
Text. Elements can contain textual data.
Other Elements. Elements can contain only other specified elements and no text. The contained
elements are called children of the containing element. The containing element is the parent of the
child elements.
Combination. Elements can contain a mix of textual data and other specific elements.
Empty. Empty elements get their value from their attributes. An empty element will typically have
at least one attribute. In HTML, the IMG tag is a good example of an empty element. It gets its
value from the src attribute.
Number of Occurrences
You can specify the number of times a child element is used within its parent.
Once and only once: The element listed by itself indicates that it can be used once and only
once:
DTD definition:
<!ELEMENT EVENTLIST (EVENT)>
Used in document:
<EVENTLIST>
<EVENT>Balsa Wood Flyer Days</EVENT>
</EVENTLIST>
At least once, or many times: The element followed by a plus sign indicates that this element
can be used many times within the parent:
DTD definition:
<!ELEMENT EVENTLIST (EVENT+)>
Used in document:
<EVENTLIST>
<EVENT>Balsa Wood Flyer Days</EVENT>
<EVENT>Sundays in the Park</EVENT>
<EVENT>Teach Your Child to Fly</EVENT>
</EVENTLIST>
Once or not at all: The element followed by a question mark indicates that this element can be
used either one time or not at all:
DTD definition:
<!ELEMENT EVENT (LOCATION, SPONSOR?)>
Used in document:
<EVENT>
<LOCATION>West Bay Ballpark</LOCATION>
</EVENT>
or
<EVENT>
<LOCATION>West Bay Ballpark</LOCATION>
<SPONSOR>Flying Toys</SPONSOR>
</EVENT>
Once, not at all, or as many times as you want: The element followed by an asterisk indicates
that this element can be used as many times as needed.
DTD definition:
<!ELEMENT EVENT (LOCATION*, EVENT-NAME)>
Used in document:
<EVENT>
<LOCATION>West Bay Ballpark</LOCATION>
<LOCATION>North Side Park</LOCATION>
<EVENT-NAME>Sundays in the Park</EVENT-NAME>
</EVENT>
or
<EVENT>
<EVENT-NAME>Sundays in the Park</EVENT-NAME>
</EVENT>
Order
You can specify the order in which child elements appear.
Specific order: Child elements can be defined to be used in a specific order. The comma (,)
separates elements that are listed in a specific order. For example, you could set a rule that
creates an EVENTLIST. In the list, you must always use the EVENT element followed by the
SPONSOR element.
DTD definition:
<!ELEMENT EVENTLIST (EVENT, SPONSOR)>
Used in document:
<EVENTLIST>
<EVENT>Balsa Wood Plane Days</EVENT>
<SPONSOR>Flying Toys</SPONSOR>
</EVENTLIST>
Either Or: You can define child elements so that one or another can be used. The bar (|)
separates either-or choices.
DTD definition:
<!ELEMENT EVENT (EVENT-NAME | SPONSOR)>
Used in document:
<EVENT>
<EVENT-NAME>Balsa Wood Plane Days</EVENT-NAME>
</EVENT>
or
<EVENT>
<SPONSOR>Flying Toys</SPONSOR>
</EVENT>
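One way to get a feel for the occurrence and order symbols is to notice how closely they resemble regular expressions. The Python sketch below is only an illustration built on that resemblance; it is not how a validating parser actually works. The helper names model_to_regex and children_match are made up for this example, and the sketch understands only element names, commas, bars, parentheses, and the ?, + and * symbols.

```python
import re


def model_to_regex(model):
    """Translate a simple DTD content model into a compiled regular expression.

    Every child element is written as its name plus one space, so the
    children EVENT, SPONSOR become the string "EVENT SPONSOR ".
    """
    tokens = re.findall(r"[A-Za-z_][\w.\-]*|[(),|?+*]", model)
    parts = []
    for tok in tokens:
        if tok == ",":
            continue  # a comma means "in this order": plain concatenation
        if tok == "(":
            parts.append("(?:")
        elif tok in ")|?+*":
            parts.append(tok)  # these symbols mean the same thing in a regex
        else:
            parts.append("(?:" + re.escape(tok) + " )")
    return re.compile("".join(parts))


def children_match(model, children):
    """Return True if the list of child element names satisfies the model."""
    sequence = "".join(name + " " for name in children)
    return model_to_regex(model).fullmatch(sequence) is not None


# Two LOCATION elements followed by one EVENT-NAME
print(children_match("(LOCATION*, EVENT-NAME)", ["LOCATION", "LOCATION", "EVENT-NAME"]))
# SPONSOR used twice where the rule allows it once at most
print(children_match("(LOCATION, SPONSOR?)", ["LOCATION", "SPONSOR", "SPONSOR"]))
```

A real DTD processor also handles #PCDATA, EMPTY, and ANY, which this sketch leaves out on purpose.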
Groups
Groups can be used to create complex rules that combine elements and different usage options.
For example, when groups are combined with a "use many times" symbol, you can create a rule
that allows multiple uses of elements, either in any order or as repeated sets. For example,
here the element EVENTLIST can contain multiple sets of EVENT and SPONSOR groups:
DTD definition:
<!ELEMENT EVENTLIST (EVENT, SPONSOR)*>
Used in document:
<EVENTLIST>
<EVENT>Balsa Wood Plane Days</EVENT>
<SPONSOR>Flying Toys</SPONSOR>
<EVENT>Sundays in the Park</EVENT>
<SPONSOR>Deer Island Recreation Department</SPONSOR>
</EVENTLIST>
Here the EVENTLIST can contain either the EVENT element or the SPONSOR element, but this
either-or group can be used many times.
DTD definition:
<!ELEMENT EVENTLIST (EVENT | SPONSOR)*>
Used in document:
<EVENTLIST>
<EVENT>Balsa Wood Plane Days</EVENT>
<SPONSOR>Flying Toys</SPONSOR>
<SPONSOR>Deer Island Recreation Department</SPONSOR>
</EVENTLIST>
Hints for Element Names
Select names that are both easy to remember and easy to type.
Give your tags some inherent meaning. For example, if you want to identify "last name" as an
element, consider naming the element something like "last-name" or "surname."
Use names that are consistent with current processes. If people call "social security number"
SSN, create an element called SSN. Don't create an unfamiliar "socsecnum" element.
Be consistent in your use of names. It is easier to apply one set of general rules to 20 different
tags than it is to remember eight discrete tags that follow no particular pattern.
Attribute Lists
Elements can have attributes, which describe the element in more detail. When you create an
element in your DTD, you can also create an attribute list for the element.
Attribute lists define the name, data type, and default value (if any) of each attribute associated
with an element.
In this very simple example, we're adding some attributes to the title element from our book list.
We want to be able to specify the edition date and whether the book is paperback or hardcover.
<!--This defines a listing of books-->
<!DOCTYPE booklist [
<!ELEMENT booklist (title, author)>
<!ELEMENT title (#PCDATA)>
<!ATTLIST title
edition CDATA #REQUIRED
type (paper|cloth|hard) "paper">
<!ELEMENT author (#PCDATA)>
]>
Here's how you'd use these attributes in an XML file: Notice the use of the edition attribute in
each title tag. Notice how one title tag also uses the type attribute to indicate that this book is a
hardcover title.
Attribute Types
Attributes can have one of seven different types of data, but the two most common are:
CDATA. Character data. This allows the attribute value to be textual data. You use it like this:
<!ATTLIST edition date CDATA #IMPLIED>
Pre-defined values. You can list a set of specific values that the attribute can have. The value
set is enclosed in parentheses and each value is separated with a vertical bar, like this:
<!ATTLIST edition type (paper|hard|cloth) #IMPLIED>
In both declarations the final keyword is the default setting, which every attribute declaration
needs. The options are described next.
Default Values
You can specify a default value for the attribute, or make the attribute required or optional. The
default value has a strong effect on how the attribute is used and what values it might have if
you don't use it in the XML tag.
#REQUIRED: the attribute must have a value every time the element is listed. You specify that
an attribute is required like this:
<!ATTLIST edition date CDATA #REQUIRED>
#IMPLIED: the processor ignores this attribute unless it is used as part of the element. It does not
assume any default value.
#FIXED value: an attribute is not required for the element, but if it occurs, it must have the
specified value. If it is left out, the processor supplies that value. For example, if the new
attribute is used, it must have the value of "yes":
<!ATTLIST edition new CDATA #FIXED "yes">
"value": a value in quotes provides a default value for that attribute. If the attribute is not
included in the element, the processing program assumes that this is the attribute's value. For
example, this gives the type attribute a default value of "hard":
<!ATTLIST edition type (paper|cloth|hard) "hard">
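To watch a default value being filled in, you can hand a small document to a parser and look at the attributes it reports. This is a minimal sketch using the expat parser that ships with Python; the book titles and the collect_attributes helper are invented for the example. Keep in mind that expat does not validate, so it will not complain about a missing #REQUIRED attribute; it only supplies declared defaults.

```python
from xml.parsers import expat

# Hypothetical book list; the DTD sits in the internal subset
DOC = """<?xml version="1.0"?>
<!DOCTYPE booklist [
<!ELEMENT booklist (title+)>
<!ELEMENT title (#PCDATA)>
<!ATTLIST title
  edition CDATA #REQUIRED
  type (paper|cloth|hard) "paper">
]>
<booklist>
  <title edition="1998">XML Basics</title>
  <title edition="1999" type="hard">XML in Depth</title>
</booklist>"""


def collect_attributes(document):
    """Return (element name, attributes) pairs as reported by expat."""
    seen = []
    parser = expat.ParserCreate()
    # By default expat reports attributes from the tag plus declared defaults
    parser.StartElementHandler = lambda name, attrs: seen.append((name, dict(attrs)))
    parser.Parse(document, True)
    return seen


for name, attrs in collect_attributes(DOC):
    if name == "title":
        print(attrs)
```

The first title never mentions type, yet the parser should report it with the declared default; the second title keeps the value typed into the tag.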
Entities
An entity is a shortcut to a set of information.
When you use an entity, it "expands" to its full meaning, but you need only type the shorter entity
name during data entry. You might think of an entity as being a bit like a macro: it is a set of
information that can be used by calling one name.
XML defines two types of entities.
The general entity is one that you define in a DTD and use in a document. General entities
are easy to spot. They are defined with the entity declaration, <!ENTITY, and when they are used,
they begin with the ampersand and end with the semicolon, like this:
&entity-name;
The parameter entity is one that you define and use within a DTD. The content of a
parameter entity may be either included in the DTD or stored in an external file. In addition,
parameter entities must be parsed; they cannot be unparsed. That is, they must contain textual
data that is processed, rather than a GIF or other non-textual data type.
It too is defined with an entity declaration, but it is called with a percent sign, like this:
%info;
Defining a General Entity
To define an entity:
1. Start the entity definition with a less-than sign, an exclamation mark, and the phrase ENTITY,
all in caps:
<!ENTITY
2. Type the name of the entity. Type it using the capitalization that you will use when calling it
later on.
<!ENTITY copyright
3. If you are defining the entity locally, type the value of the entity surrounded by quotes and
then close the entity definition with a greater-than sign.
<!ENTITY copyright "Copyright 2000, As The World Spins Corp. All
rights reserved. Please do not copy or use without authorization. For
authorization contact legal@worldspins.com.">
4. If you are defining an entity in an external ASCII text file, put in a pointer to the external file,
then close the entity definition with a greater-than sign.
<!ENTITY copyright SYSTEM
"http://www.worldspins.com/legal/copyright.xml">
Using a General Entity
You won't be using a general entity in a DTD. You will only be defining it there. You will be using it
in an XML file, where it is called by typing an ampersand, the entity name, and a semicolon:
&entity-name;
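If you'd like to see an entity expand, a few lines of Python will do it. This is a minimal sketch that assumes the entity is defined locally, inside the document's own DTD; the notice element is made up for the example, and the copyright text is shortened from the definition above.

```python
import xml.etree.ElementTree as ET

# The entity is defined in the internal DTD and called inside the document
DOC = """<!DOCTYPE notice [
<!ENTITY copyright "Copyright 2000, As The World Spins Corp. All rights reserved.">
]>
<notice>&copyright;</notice>"""

notice = ET.fromstring(DOC)
print(notice.text)  # the parser hands back the expanded text, not the entity name

# Calling an entity that was never defined is an error
try:
    ET.fromstring("<notice>&trademark;</notice>")
except ET.ParseError as err:
    print("could not parse:", err)
```

Entities that point to an external file are a different matter; whether a parser fetches them depends on the parser and its settings, so this sketch leaves them alone.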
Defining a Parameter Entity
To declare a parameter entity:
1. Type the entity declaration:
<!ENTITY
2. Type a space, followed by a percent sign. It is important to remember the space!
<!ENTITY %
3. Type another space, followed by the name of the entity:
<!ENTITY % info
4. Type the value of the entity, surrounded by quotation marks:
<!ENTITY % info "name CDATA #REQUIRED gender (m | f) #REQUIRED color
(red | fawn | merle | black | other) #REQUIRED"
5. End the declaration with a greater-than sign.
<!ENTITY % info "name CDATA #REQUIRED gender (m | f) #REQUIRED color
(red | fawn | merle | black | other) #REQUIRED">
One thing to notice about entities in a DTD is that when they are defined, there is a space
between the percent sign and the entity name, but when the entity is used, there is no space
between the percent sign and the entity name.
Using a Parameter Entity
It is quite simple to use a parameter entity. Simply enter the entity name, preceded by a percent
sign and followed by a semicolon, like this:
<!ELEMENT HOUND (NAME)>
<!ATTLIST HOUND %info;>
<!ELEMENT WORKING (NAME)>
<!ATTLIST WORKING %info;>
<!ELEMENT COMPANION (NAME)>
<!ATTLIST COMPANION %info;>
When the DTD is processed, the entity will be expanded. In this example, %info; will be replaced
with a set of attribute data which was defined in the info entity declaration.
Again, remember that when a parameter entity is defined, there is a space between the percent
sign and the entity name, but when the entity is used, there is no space between the percent sign
and the entity name.
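Expansion is easiest to picture as plain text substitution. The Python sketch below imitates it with two regular expressions. It is a simplified illustration, not a DTD processor: the function name expand_parameter_entities is made up, it handles only locally defined parameter entities with double-quoted values, and the info entity is shortened to keep the example small. As far as the XML 1.0 rules go, a reference such as %info; inside another declaration belongs in an external DTD file rather than in a DTD embedded in the document.

```python
import re

# <!ENTITY % name "value">  (note the space after the percent sign)
PE_DECL = re.compile(r'<!ENTITY\s+%\s+([\w.\-]+)\s+"([^"]*)"\s*>')
# %name;  (no space when the entity is used)
PE_REF = re.compile(r"%([\w.\-]+);")


def expand_parameter_entities(dtd_text):
    """Replace each %name; reference with the text declared for name."""
    values = dict(PE_DECL.findall(dtd_text))
    without_decls = PE_DECL.sub("", dtd_text)
    # An undeclared name raises KeyError, much as a real processor would object
    return PE_REF.sub(lambda m: values[m.group(1)], without_decls).strip()


DTD = """<!ENTITY % info "name CDATA #REQUIRED gender (m | f) #REQUIRED">
<!ELEMENT HOUND (NAME)>
<!ATTLIST HOUND %info;>
<!ELEMENT WORKING (NAME)>
<!ATTLIST WORKING %info;>"""

print(expand_parameter_entities(DTD))
```

Both ATTLIST declarations end up with the same attribute data, which is the point of defining it once.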
XML Parsers
Parsing is the process of checking the syntax of your document and creating the "tree structure."
If you are using a validating parser, the process will also compare the XML file to its DTD.
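To make the "tree structure" concrete, here is a minimal Python sketch that parses a small event list and prints the resulting tree, one element per line. The outline helper is made up for this example. The parser used here checks syntax only; it does not compare the document to a DTD.

```python
import xml.etree.ElementTree as ET

DOC = """<EVENTLIST>
  <EVENT>Balsa Wood Flyer Days</EVENT>
  <EVENT>Sundays in the Park</EVENT>
</EVENTLIST>"""


def outline(element, depth=0):
    """Return the tree as indented lines, one line per element."""
    text = (element.text or "").strip()
    lines = ["  " * depth + element.tag + (": " + text if text else "")]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(DOC)  # raises ET.ParseError if the syntax is wrong
print("\n".join(outline(root)))
```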
On-line Parsers
There are a number of online parsers. To use these, you typically type in the URI of your file and
tell the process to begin.
Online validating parser from the W3C
The W3C offers an online parser. Type the URL of the file into the form and the XML file is
both parsed and validated.
Validating Parser from Brown University Scholarly Technology Group
This is the most easily accessible and understandable presentation of the online parsers.
Downloadable Parsers
There are many parsers that you can download and run on your local machine. Most of these
require you to have either a Windows or UNIX machine. They are written in a variety of
languages; this is a cross section of some of the many which are available.
James Clark's expat parser
James Clark is almost a brand in the SGML/XML world. His rendition of an XML parser is
widely used.
Java-based Validating XML Parser
From IBM's AlphaWorks group, this parser claims to be 100% pure Java.
Microsoft XML Parser in C++
A parser from Microsoft.
XML Parser written in Python
This is a validating parser.
XML Parser written in JavaScript.
This parser is non-validating and checks XML syntax only.
SiRPAC Simple RDF Parser and Compiler
From the W3C.
XML Syntax
Tagging an XML document is, in many ways, similar to tagging an HTML document. Here are
some of the most important guidelines to follow.
Rule #1: Remember the XML declaration
This declaration goes at the beginning of the file and alerts the browser or other processing tools
that this document contains XML tags. The declaration looks like this:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
The standalone value is either "yes" or "no," and the attributes must appear in the order shown.
You can leave out the encoding attribute and the processor will use the UTF-8 default.
Rule #2: Do what the DTD instructs
If you are creating a valid XML file, one that is checked against a DTD, make sure you know
what tags are part of the DTD and use them appropriately in your document. Understand what
each does and when to use it. Know what the allowable values are for each. Follow those rules,
and the XML document will validate against the specified DTD.
Rule #3: Watch your capitalization
XML is case-sensitive. <P> is not the same as <p>. Be consistent in how you define element
names. For example, use ALL CAPS, or use Initial Caps, or use all lowercase. It is very easy to
create mismatched case errors.
Also, make sure starting and ending tags use matching capitalization, too. If you start a
paragraph with the <P> tag, you must end it with the </P> tag, not a </p>.
Rule #4: Quote attribute values
In HTML there is some confusion over when to enclose attribute values in quotes. In XML the
rule is simple: enclose all attribute values in quotes, like this:
<NAME dob="1950">Ben Johnson</NAME>
Rule #5: Close all tags
In XML you must close all tags. This means that paragraphs must have corresponding end
paragraph tags. Anchor names must have corresponding anchor end tags. A strict interpretation
of HTML says we should have been doing this all along, but in reality, most of us haven't.
Rule #6: Close Empty tags, too
In HTML, empty tags, such as <br> or <img>, do not close. In XML, empty tags do close. You
can close them either by adding a separate close tag (</tagname>) or by combining the open
and close tags into one tag. You create the open/close tag by adding a slash, /, to the end of the
tag, like this:
<br/>
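A quick way to see rules #3 through #6 in action is to feed a parser some fragments that break them. This Python sketch is a hedged illustration; the is_well_formed helper and the sample fragments are made up for the example, and the parser checks syntax only, not a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if the parser accepts the fragment, False otherwise."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


SAMPLES = {
    "mismatched case": "<P>text</p>",
    "unquoted attribute": "<NAME dob=1950>Ben Johnson</NAME>",
    "unclosed tag": "<LIST><LI>one</LIST>",
    "empty tag left open": "<P>line<br></P>",
    "empty tag closed with a slash": "<P>line<br/></P>",
    "empty tag with a separate close tag": "<P>line<br></br></P>",
}

for label, fragment in SAMPLES.items():
    print(label, "->", is_well_formed(fragment))
```

Only the last two fragments follow the rules, so only those two should be accepted.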
Examples
This table shows some common HTML tags and how they would be treated in XML.
Tag: <P>
Comment: Technically, in HTML you're supposed to close this tag. In XML it's essential to close it.
End-Tag: </P>
Tag: <ELEMENT>
Comment: All Elements in XML must have a Start-tag and an end-tag.
End-Tag: </ELEMENT>
Tag: <LI>
Comment: This tag must be closed in XML in order to ensure a Well-Formed XML document.
End-Tag: </LI>
Tag: <META name="keywords" content="XML, SGML, HTML">
Comment: META tags are considered empty elements in XML and they must close.
End-Tag: <META name="keywords" content="XML, SGML, HTML"/>
Tag: <BR>
Comment: Break tags are considered empty elements.
End-Tag: <BR/>
Tag: <IMG src="coolpictures.html">
Comment: This is an empty element tag.
End-Tag: <IMG src="coolpictures.html"/>
Element and Attribute Rules
The first table contains the basic guidelines for creating element rules in an XML DTD.
The second contains attribute value types.
The third contains attribute default options.
Element Rules:
Symbol: #PCDATA
Meaning: Contains parsed character data, or text.
Example: <!ELEMENT POW (#PCDATA)>
The POW element contains textual data.
Symbol: #PCDATA | element-name
Meaning: Contains text and another element. #PCDATA is always listed first in a rule.
Example: <!ELEMENT POW (#PCDATA | NAME)*>
The POW element can contain text, NAME elements, or both.
Symbol: , (comma)
Meaning: Use in this order
Example: <!ELEMENT POW (NAME, RANK, SERIAL)>
The POW element must contain the NAME element, followed by the RANK element, followed by the
SERIAL element.
Symbol: | (bar)
Meaning: Use either or
Example: <!ELEMENT POW (NAME | RANK | SERIAL)>
The POW element must contain either the NAME element, or the RANK element, or the SERIAL
element.
Symbol: name (by itself)
Meaning: Use one time only
Example: <!ELEMENT POW (NAME)>
The POW element must contain the NAME element, used exactly one time.
Symbol: name?
Meaning: Use either once or not at all
Example: <!ELEMENT POW (NAME, RANK?, SERIAL?)>
The POW element must contain the NAME element used exactly once, followed by one or no RANK
element, and one or no SERIAL element.
Symbol: name+
Meaning: Use either once or many times
Example: <!ELEMENT POW (NAME+, RANK?, SERIAL)>
The POW element must contain at least one, but maybe more, NAME elements, followed by one or
no RANK element, and exactly one SERIAL element.
Symbol: name*
Meaning: Use once, use many times, or don't use it at all.
Example: <!ELEMENT POW (NAME*, RANK?, SERIAL)>
The POW element must contain one, many, or no NAME elements, followed by one or no RANK
element, and exactly one SERIAL element.
Symbol: ( )
Meaning: Indicates groups; may be nested.
Example: <!ELEMENT POW (#PCDATA | NAME)*>
The POW element contains any number of uses of either or both text and the NAME element.
Example: <!ELEMENT POW ((NAME*, RANK?, SERIAL)* | COMMENT)>
The POW element may contain any number of instances of the group that contains one, many, or
no NAME elements, followed by one or no RANK element, and exactly one SERIAL element. OR it
may contain one COMMENT element.
Example: <!ELEMENT POW (NAME | RANK)+>
The POW element must contain a NAME or RANK element. The NAME or RANK option may appear
once or may be repeated many times.
Attribute Values:
Type: CDATA
Meaning: Character data, text.
Example: <!ATTLIST COMMENT category CDATA #REQUIRED>
The COMMENT element has an attribute named category. This attribute contains letters, numbers,
or punctuation symbols.
Type: NMTOKEN
Meaning: Name token, text with some restrictions. The value contains numbers and letters.
However, it cannot begin with the letters "xml" and the only symbols it can contain are _ - . and :.
Example: <!ATTLIST COMMENT category NMTOKEN #REQUIRED>
The COMMENT element has an attribute named category. This attribute contains a name token.
Type: (value-1 | value-2 | value-3) value list
Meaning: A value list provides a set of acceptable options for the attribute to contain. In
general, you should always include "other" as one of the options.
Example: <!ATTLIST COMMENT category (red | green | blue | other) "other">
The COMMENT element has an attribute named category. The category can be "red," "green,"
"blue," or "other." The default value is "other."
Type: ID
Meaning: The keyword ID means that this attribute has an ID value that identifies this particular
element.
Example: <!ATTLIST COMMENT category ID #IMPLIED>
The COMMENT element has an attribute named category. The category will contain an ID value. ID
and IDREF work together to create cross-references.
Type: IDREF
Meaning: The keyword IDREF means that this attribute has an ID reference value that points to
another instance's ID value.
Example: <!ATTLIST COMMENT category IDREF #IMPLIED>
The COMMENT element has an attribute named category. The category will contain an IDREF
value. ID and IDREF work together to let you cross-reference elements.
Type: ENTITY
Meaning: The keyword ENTITY means that this attribute's value is an entity. An entity is a value
that has been defined elsewhere in the DTD to have a particular meaning.
Example: <!ATTLIST COMMENT category ENTITY #IMPLIED>
The COMMENT element has an attribute named category. The category will contain an entity name
rather than text.
Type: NOTATION
Meaning: The keyword NOTATION means that this attribute's value is a notation. A notation is
a description of how information should be processed. You could set up a notation that allows
only numbers to be used for the value, for example.
Example: <!ATTLIST COMMENT category NOTATION #IMPLIED>
The COMMENT element has an attribute named category. The category attribute will contain a
notation name.
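The ID and IDREF pairing is easier to picture with a small check. In this Python sketch the attribute names label and see are hypothetical; assume a DTD declares label as ID and see as IDREF. The parser used here does not read attribute types, so the helper simply looks for references that point at nothing, which is the sort of problem a validating parser would flag.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: "label" plays the ID role, "see" the IDREF role
DOC = """<NOTES>
  <COMMENT label="c1">First remark</COMMENT>
  <COMMENT label="c2" see="c1">Follow-up to the first remark</COMMENT>
  <COMMENT label="c3" see="c9">Points at nothing</COMMENT>
</NOTES>"""


def dangling_references(root, id_attr="label", ref_attr="see"):
    """Return reference values that match no ID value in the document."""
    ids = {el.get(id_attr) for el in root.iter() if el.get(id_attr) is not None}
    refs = [el.get(ref_attr) for el in root.iter() if el.get(ref_attr) is not None]
    return [ref for ref in refs if ref not in ids]


print(dangling_references(ET.fromstring(DOC)))
```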
Attribute Default Options:
Type: #REQUIRED
Meaning: The attribute must always be included when the element is used.
Example: <!ATTLIST COMMENT category CDATA #REQUIRED>
The COMMENT element has an attribute named category. This attribute contains letters, numbers,
or punctuation symbols. The attribute must always be used with the element. If you omit the
attribute, a validating parser will give you an error message.
Type: #IMPLIED
Meaning: The attribute is optional. If you see the keyword #IMPLIED, you know that this
attribute will be ignored unless it is included in the element tag. It won't take on any default
values.
Example: <!ATTLIST COMMENT category CDATA #IMPLIED>
The COMMENT element has an attribute named category. You may use the attribute or omit the
attribute, as the instance requires.
Type: #FIXED
Meaning: The attribute is optional, but if it is used, it must always have a certain value. If you
see the keyword #FIXED, you know that this attribute will always have the specified value when
it is entered.
Example: <!ATTLIST COMMENT confirm CDATA #FIXED "yes">
The COMMENT element has an attribute named confirm. If it is used, its value will be "yes." If it
is not used, the processor supplies the value "yes" for it.
Type: "value"
Meaning: A value in quotes is the default value of this attribute. If you don't enter the attribute
in the element tag, the processor will assume the attribute has this default value.
Example: <!ATTLIST COMMENT category (red|green|blue|other) "other">
The COMMENT element has an attribute named category. If you don't use the attribute in the
element tag, the attribute will automatically receive the value "other."
Interaction Between Components
XML, CSS, script, the DOM, and the browser work together to let you create interactive presentations of
your content.
Copyright © 1998-99
DevX.com Inc.
Introduction to Behaviors
Behaviors are an enhancement to Internet Explorer 5 that allow designers to add scripting elements
without having to do the scripting needed to make them work. Behaviors are also a way in which scripters
can write a script once and turn it over to designers for use whenever needed.
So what can behaviors do? By using XML, we can link behaviors to any element in a Web page and
manipulate that element. We can, for example, copy that element's text into a pullquote area on the page.
We could offer a way to magnify small type on a page. Many of the everyday things we do with scripting
can be transferred to behaviors, and by combining them with XML, we can have greatly enhanced Web
pages that will work down the browser food chain with no ill effects.
At the left you will find links to several behaviors created here at Project Cool. Each link will take you to a
page that not only demonstrates the behavior, but also shows you just how simple they are to implement.
We've divided our behaviors into two categories:
fx - Special Effects behaviors don't necessarily add value, but do add eye-catching special effects that can
make your page stand out if used appropriately.
publishing - These behaviors can add value and utility to pages of text content. They make your pages much
more usable for the viewer, or add new ways to get them involved in the text.
So what are you waiting for? Click one of the links to the left and start exploring what you can do with
behaviors and XML.
Copyright © 1998
Earthquake!
This behavior falls into the realm of special effects. It's really not useful, but it could help provoke mood on a
website. To see it in action, just run your mouse over the headline.
While it would probably be easy to implement this in the document directly, we've chosen to use it as a
behavior. Part of the beauty of behaviors is that they allow a designer to take pre-written code and effects
and insert them into a webpage without having to be a programmer. By having effects like Earthquake
available as behaviors, a designer can build up an astonishing repertoire of web display tools without
needing to learn JavaScript.
Earthquake is set up via XML, so you'll need to create an appropriate namespace before you can use it. We're doing it
as XML so that older browsers aren't affected adversely. It also lets us define a brand new tag. The namespace is set
up in the <html> tag on your webpage. Here's the one we're using on this page:
<html xmlns:fx>
The next step is to define the XML tag we'll be using. This is done in the specific media type. In this case the
behavior will apply to the screen, so we'll place its CSS properties there, and we associate it with our
namespace by prefixing the namespace to the declaration. Our declaration looks like this:
<style>
<!--
@media screen {
fx\:EARTHQUAKE { behavior:url(earthquake.htc) }
}
-->
</style>
As you can see, the only part that is needed is the behavior property. It must point to the behavior file,
earthquake.htc. You can download the earthquake.htc file here. Once you have it, just make sure it's on
your server and that the url is specified properly in your CSS.
All that's left then is to place the XML tags around the item you wish to trigger the earthquake behavior.
Earthquake will be triggered when someone runs their mouse over the item. The tagging is very simple, and
looks like this:
<fx:EARTHQUAKE>Shake it baby!</fx:EARTHQUAKE>
Now you've got everything you need to know to create your own earthquakes. So...uh....Shake it baby!
Typewriter Behavior
Sure, it owes its heritage to movies and computer gaming, but a typewriter effect can be quite eye-catching if used
properly. We'd bet you're reading this as it types. It's not for every Web site though, so use it sparingly.
This behavior can be set to type at whatever speed you need. The above example types at a speed of one
character every 100 milliseconds.
Typewriter is set up via XML, so you'll need to create an appropriate namespace before you can use it.
We're doing it as XML so that older browsers aren't affected adversely. It also lets us define a brand new
tag. The namespace is set up in the <html> tag on your Web page. Here's the one we're using on this
page:
<html xmlns:fx>
The next step is to define the XML tag we'll be using. This is done in the specific media type. In this case the
behavior will apply to the screen, so we'll place its CSS properties there, and we associate it with our
namespace by prefixing the namespace to the declaration. Our declaration looks like this:
<style>
<!--
@media screen {
fx\:TYPEWRITER { behavior:url(typewriter.htc);
height: 4em;
font-family: "ocr a extended", courier;
}
}
-->
</style>
The most important part of that is the behavior property. It's the only part really needed, and it must point to
the behavior file, typewriter.htc. You can download the typewriter.htc file here. Once you have it, just make
sure it's on your server and that the url is specified properly in your CSS.
All that's left then is to place the XML tags around the text you wish to have typed onto the page. That's
simple too:
<fx:TYPEWRITER speed="120">Type this text.</fx:TYPEWRITER>
Notice that we've set the speed to 120. If you don't set a speed, the typing will appear with the default
setting of 100.
That's really about all you need to know to use it. Be aware that this behavior only runs once, and only when the page
is first loaded. So if you use this, make sure it's someplace that your users will be able to see it.
Now start typing!
Footnote Behavior
If you've ever seen a Web document with footnotes, you know what a problem it is to read a relevant
footnote and then scroll back up the document to find where you had stopped reading. This behavior
changes that. It will bring the footnotes to the user(1) without the need for them to scroll away from their
place in the page.
Let's face it, too: footnotes can be ugly things tacked to the bottom of a page. By implementing a footnote
tag via a behavior and XML, we can give a designer complete control over what the footnote is going to look
like when it appears for the user. Everything about the way the footnote looks can be adjusted via CSS.
Since FOOTNOTE is set up via XML, you'll need to create an appropriate namespace before you can
use it. We're doing it as XML so that older browsers aren't affected adversely. It also lets us define a brand
new tag. The namespace is set up in the <html> tag on your webpage. Here's the one we're using on this
page:
<html xmlns:pub>
The next step is to define the XML tag we'll be using. This is done in the specific media type. In this case the
behavior will apply to the screen, so we'll place its CSS properties there, and we associate it with our
namespace by prefixing the namespace to the declaration. Our declaration looks like this:
<style>
<!--
@media screen {
pub\:FOOTNOTE {behavior:url(footnote.htc)}
.footstyle {width: 250;
position: absolute;
left:-1000;
color: black;
background-color: #9999cc;
text-align: justify;
border-color: #404040;
border-width: thin;
border-style: solid;
padding: 1em;
font-family: arial;
font-size: 10pt;
}
.closer {cursor: hand;
color: #ffff00;
text-align: right;
margin-top: 1em;
}
.fhilite {cursor: hand;
color: chocolate;
font-family: "Arial";
text-decoration: none;
}
}
-->
</style>
As you can see, FOOTNOTE only needs the behavior property. It must point to the behavior file,
footnote.htc. You can download the footnote.htc file here. Once you have it, just make sure it's on your
server and that the url is specified properly in your CSS.
There are three CSS classes defined in the namespace as well. These are all used by the footnote
behavior. The first is footstyle. This defines how the footnote will look when the user calls it. It should be
applied to the division holding the footnote, and it's important that it have at least three properties:
width sets the display width, in pixels, of all footnotes.
left is used to hide the footnote until it is called.
position: absolute frees the footnote so that it can be positioned anywhere on the page.
The closer class describes how the word "close" will look in the displayed footnote block. This word is
added to the bottom right corner of footnotes so that there is an option to remove them from the page
display.
Lastly, the class fhilite describes how the footnote link will appear and adds a hand cursor for user
feedback.
You'll need to create individual divisions for each footnote to be displayed. Here's what one from this page
looks like:
<div id=foot1 class=footstyle>
<a name="footnote1"></a>
(1) A user used to be someone who was heavily into drugs.
Here a user simply refers to the person using a Web page.
In this case, you.
</div>
The id of the division is extremely important. It is via this id that the behavior manipulates the footnote. The
name can be anything you like, as long as it is unique. You'll be using it in the FOOTNOTE tag to link the
action to the division. In this case, we used the id of foot1. This would be referenced in the FOOTNOTE tag
as footName="foot1"(2).
Let's take a look now at how that last footnote was called:
<pub:FOOTNOTE footName="foot2">
<a href="#footnote2">(2)</a></pub:FOOTNOTE>
It's that simple. Notice we've placed it around working HTML, which would scroll down to the footnote in
older browsers. The footnote behavior will erase that for IE5 and replace it with appropriate HTML to call
our enhanced footnotes, leaving just the text that is present within the tag.
You should note that footName is a required property. If you forget to include it, you won't get an error
message. The enhanced footnote behavior will simply do nothing.
OK, so consider yourself armed, er, footed. You should now know everything you need to apply footnotes
to your pages.
Magnify .eha%ior
41
It%s become commonplace today to see websites that have lots of te!t crammed into a small area.
1ftentimes some of that te!t is in the tiniest possible font. I can%t spea" for everyone but in the wee hours
of the morning it can be hard to read that te!t. 1ften I%ve wished for a way to magnify it without resi-ing the
fonts in my browser.
An easy way to magnify just a portion of a page seems like a natural fit. Creating a behavior for this and
linking it to the page via XML makes it possible to use a magnify effect nearly anywhere, while the pages
still work seamlessly in older browsers.
See how easy it is to read magnified text by clicking the icon? After you've opened this, you can close it by
clicking the close icon on the bottom right.
If you look to your right you'll see an area of small text. If you are using IE5 beta 2+ you'll also see an icon
of a magnifying glass. Older browsers won't show this icon since it was inserted into the page via the
magnify behavior. If you click the icon, a text window will display a magnified version of the exact text that is
contained in the block, along with an icon that will allow you to close it. Also, if there is any HTML formatting
in that text, such as a link, it will be applied to the magnified version as well.
This behavior was designed so that nearly all the control is in the hands of the designer. The only
exception is the names of the icons used to indicate magnify and close magnify. These must be set in
the .HTC file controlling this behavior. Everything else is done in the Web page itself using CSS and XML.
MAGNIFY is set up via XML, so you'll need to create an appropriate namespace before you can use
it. We're doing it as XML so that older browsers aren't affected adversely. It also lets us define a brand
new tag. The namespace is set up in the <html> tag on your Web page. Here's the one we're using on this
page:
<html xmlns:pub>
The next step is to define the XML tag and the tag properties for MAGNIFY. In doing this we also create a
class called "magstyle" that defines what the magnified text will look like. This is done in the specific media
type. In this case the behavior will apply to the screen, so we place its CSS properties there, and we
associate the tag with our namespace by prefixing the namespace to the declaration. Our declaration looks like
this:
<style>
<!--
@media screen {
pub\:MAGNIFY {behavior:url(magnify.htc)}
.magstyle {color: black;
background-color: goldenrod;
border-color: black;
border-width: thin;
border-style: solid;
padding: 1em;
font-family: arial;
font-size: 9Apt;
position: absolute;
left:-1000;
}
}
-->
</style>
The most important part of that is the behavior property. It must point to the behavior file magnify.htc. You
can download the magnify.htc file here. Once you have it, just make sure it's on your server and that the url
is specified properly in your CSS.
One thing to notice about the magstyle class is that it specifies a left position of -1000. This is so that the
HTML that the behavior creates will be hidden from the user by appearing far off to the left of the display
window. We're doing this in part because of a small display glitch in the version of IE used to create this,
and also because it's always been my preferred way to hide content. It's just as easy to specify a new
position as it is to specify hidden/visible.
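As a rough sketch of that hiding technique, the script below moves an element far off to the left to hide it and gives it a normal position to show it. It is not taken from magnify.htc. The helper names are made up for illustration, the style argument is a plain object standing in for an element's style, and the use of pixel units is an assumption.

```javascript
// Hypothetical helpers; `style` stands in for an element's style object.
var OFFSCREEN_LEFT = -1000; // the left position used by magstyle

// Hide by parking the element far off the left edge of the window.
function hideOffscreen(style) {
  style.position = "absolute";
  style.left = OFFSCREEN_LEFT + "px";
  return style;
}

// Show by giving the element a visible left position (in pixels).
function showAt(style, left) {
  style.position = "absolute";
  style.left = left + "px";
  return style;
}

// True when the element is currently parked off screen.
function isOffscreen(style) {
  return parseInt(style.left, 10) <= OFFSCREEN_LEFT;
}
```

Because only the left value changes, the element stays rendered the whole time, which is the point of positioning it away instead of toggling its visibility.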
You'll also need to download the two icons used by this behavior. Right click on each one and then
select "save picture" to save magnify.gif and unmag.gif. This behavior looks for these icons in a
directory called images. You can change these icons to others by editing the magnify.htc file to point to
other images. You need one icon to represent the magnify option and one to indicate close magnify.
All that's left then is to place the XML tags around the text for which you wish to offer a magnified view. It's this simple:
<pub:MAGNIFY newId="1" width="B<<" align="left">The text that
you wish to be magnifiable.</pub:MAGNIFY>
I'm sure you noticed the properties we are passing to the magnify behavior. The first one, newId, is
required. While we could have added a complex random identification generation routine to the behavior,
we chose to keep it simple and simply ask the designer to assign a unique name to each magnifiable
section. Always be sure to assign a value to newId. This is needed to link the icon to the newly generated
HTML of the magnified text.
The other two properties are optional; they don't need to be specified. width specifies how wide the
magnified area should be on the display. It defaults to 8F< pixels if no width is specified. By making this a
specifiable property, the designer is given control of how the text will fit the screen with each magnified
area.
The other property, align, specifies the alignment of the magnify icon. Only "left" and "right" are valid
values here. Any other value, or no specification at all, will cause left alignment to be used.
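The rules for the three properties can be summed up in a short script sketch. This is an illustration only, not the code inside magnify.htc. The function name resolveMagnifyOptions is hypothetical, the default width is passed in as defaultWidth instead of being hard-coded, and throwing an error for a missing newId is an assumption made here to surface the mistake; the actual behavior may react differently.

```javascript
// Hypothetical helper, not code from magnify.htc.
// `attrs` holds the tag's attributes as strings; `defaultWidth` is the
// width to fall back on when the tag does not supply a usable one.
function resolveMagnifyOptions(attrs, defaultWidth) {
  // newId is required: it links the icon to the generated magnified HTML.
  // Assumption: a missing newId is reported by throwing.
  if (!attrs.newId) {
    throw new Error("MAGNIFY requires a newId");
  }
  // width is optional; fall back to the default when absent or unusable.
  var width = parseInt(attrs.width, 10);
  if (isNaN(width) || width <= 0) {
    width = defaultWidth;
  }
  // Only "left" and "right" are valid; anything else means left.
  var align = attrs.align === "right" ? "right" : "left";
  return { newId: attrs.newId, width: width, align: align };
}
```

Normalizing the attributes in one place keeps the rest of a behavior free of repeated checks for missing or invalid values.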
By now you should be ready to apply magnification to your own Web pages. If you still feel a bit
uncomfortable trying this, then view the source of this page to see how we've done it.
Now go forth and magnify!
Pullquote Behavior
If you've ever picked up a magazine, then the odds are good that you've seen a pullquote. A pullquote is
where a bit of text from the body of an article or story is pulled from the text and highlighted in some way to
catch your eye. It's hoped that the quote will tease you enough to get you to read the story.
Up until now it's been a pain to do pullquotes in a Web page. It always required working them into the
HTML code and hand copying the text to be quoted. For those reasons pullquotes have been a bit scarce
on the Web.
By using a pullquote behavior it's now possible for anyone to put a pullquote into a Web page without
having to do complex layout tricks. It's as simple as putting a tag and some basic CSS into a Web page.
You set up PULLQUOTE via XML, so you'll need to create an appropriate namespace before you can use it. We're
doing it as XML so that older browsers aren't affected adversely. It also lets us define a brand new tag. The
namespace is set up in the <html> tag on your Web page. Here's the one we're using on this page:
<html xmlns:pub>
The next step is to define the XML tag we'll be using. This is done in the specific media type. In this case the
behavior will apply to the screen, so we place its CSS properties there, and we associate the tag with our
namespace by prefixing the namespace to the declaration. Our declaration looks like this:
<style>
<!--
@media screen {
pub\:PULLQUOTE {behavior:url(pullquote.htc)}
.pullstyle {width: 200;
color:black;
text-align: left;
border-color:#::AAcc;
border-width:thin;
border-style:solid;
border-right: none;
border-left: none;
padding: 1em;
margin: Apt;
font-family: arial;
font-style: italic;
font-size: 9Bpt;
}
}
-->
</style>
As you can see, PULLQUOTE only needs the behavior property. It must point to the behavior file
pullquote.htc. You can download the pullquote.htc file here. Once you have it, just make sure it's on your
server and that the url is specified properly in your CSS.
We also define a class called pullstyle. This is the CSS description of how a pullquote will look when
rendered on a Web page. The behavior will apply this style to the pullquote that it creates. You have
complete control over the appearance by changing the properties and values in pullstyle.
All that's left then is to place the XML tags around the text you wish to use as a pullquote.
Here's how we marked the pullquote near the top of this page:
<pub:PULLQUOTE align="right" lips="pre">it's always
been a pain to do pullquotes in a Web page.</pub:PULLQUOTE>
The align property specifies whether the pullquote will align on the left or the right of the page. Its valid
values are, surprisingly enough, left or right. If you don't specify an alignment then the pullquote will align
on the left.
The second property is lips. This is our abbreviation for ellipsis. An ellipsis is a series of three dots
that can be used at the beginning of a quote, at the end, or on both ends.
You can see an ellipsis in the pullquote to your left. The acceptable values for lips are pre, post, or both. All
other values will be ignored. This is an optional property, but it is very useful if you are only quoting part of a
sentence.
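The two pullquote properties can be sketched in script as follows. This is a minimal illustration of the rules just described, not the contents of pullquote.htc; the function names resolveAlign and applyLips are hypothetical.

```javascript
// Hypothetical helpers, not code from pullquote.htc.

// Only "left" and "right" are valid; anything else falls back to left.
function resolveAlign(align) {
  return align === "right" ? "right" : "left";
}

// Add an ellipsis before the quote, after it, or on both ends.
// Any value other than "pre", "post", or "both" is ignored.
function applyLips(text, lips) {
  var pre = lips === "pre" || lips === "both";
  var post = lips === "post" || lips === "both";
  return (pre ? "..." : "") + text + (post ? "..." : "");
}
```

For example, applyLips("it's been a pain to do pullquotes in a Web page.", "pre") puts the three dots in front of the quoted text and leaves the end untouched.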
Finally, just a few thoughts on proper use.
A pullquote should contain text that will draw the reader in. It should only contain a small amount of
relevant text and not several sentences. You'll probably want to consider using it near the top of a page so
that it will be seen immediately by a prospective viewer. You also shouldn't make the style too different
from the rest of the page. It should fit in yet be immediately visible.
With those thoughts in mind, as well as your newfound knowledge of how to apply this behavior, it's time
for you to go out there and pull one over on someone. A quote, that is.