
XML

Course Tutor: Dr. Raheel Siddiqi

Introduction
XML is a markup language developed for the web that differs from the scripting and programming languages that came before it. Instead of being concerned with the processing and display of data, XML's primary purpose is to tell the computer what the data actually means. There are two main reasons for the development of XML:
1. Computers do not understand the information placed in them. For example, there is no way for a search engine, or any other program, to know that an HTML page contains the introduction to an XML tutorial. All it knows is that the page is a collection of letters and numbers with HTML formatting around it. The program cannot reliably tell which parts of such a page are headings, which are body text and which are adverts. This is the main problem XML was designed to overcome. If a page or document is written in XML, a program can identify what each piece of data represents. This has major implications for search engine technology: a search engine that knew what each part of a page described could return far more precise results, with fewer inaccurate matches and half-relevant pages.

2. Web pages are not compatible across different devices. One of the major difficulties web designers face today is that people access pages from a variety of devices: PCs, Macs, mobile phones, palmtop computers and even televisions. Because of this, web designers must now produce their pages in several different formats to cope. Because XML defines what data means and not how it is displayed, it makes it easy to use the same data on several different platforms.

The thing about XML that people find most difficult to understand is that XML does not actually do anything by itself. XML is not a way to design your home page. This has led many people to believe that XML is useless, as they cannot see how it will benefit them. XML has a wide variety of benefits, though, two of which were outlined above. Its real use is to describe data: HTML describes how data is formatted, while XML describes what data actually means.

XML looks like HTML and is structured very similarly. Both use tags to enclose the data they refer to, both allow nested tags, and both allow attributes on their tags.

The most revolutionary thing about XML, though, is that you are not restricted to the normal, pre-defined tags like font and br. Instead, you are responsible for making up the tags yourself. You can name them anything you like (within XML's naming rules) and use them to represent anything you like. This is a feature which cannot be found in any other scripting language on the web.

XML Structure
An XML document system usually has three components:

1. The set of XML documents itself. The documents hold the meaning of the information: the content (semantics).

2. The document type definition. The structure of an XML document can be expressed much like the grammar of a computer language, which tells you how elements can stand in relation to each other. The grammar for a set of documents with the same structure is called a "document type definition" or DTD. DTDs can be very large and complex. A real document is called an "instantiation" of its DTD.

3. The style sheet. To present the document to a human reader, the system needs a set of rules specifying how to render each element: the "style sheet".
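The idea of a style sheet as "a set of rules specifying how to render each element" can be sketched in a few lines of Python. This is only an illustration: the `STYLE_RULES` table and the `render` function are hypothetical names made up for this sketch, and real XML style sheets (for example CSS or XSL) are written quite differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical "style sheet": one rendering rule per element name.
STYLE_RULES = {
    "title": lambda text: text.upper(),      # show titles in capitals
    "composer": lambda text: f"by {text}",   # prefix the composer's name
}

def render(xml_text):
    """Render each child of the root element using the rule for its tag."""
    root = ET.fromstring(xml_text)
    lines = []
    for child in root:
        # Elements without a rule are shown unchanged.
        rule = STYLE_RULES.get(child.tag, lambda text: text)
        lines.append(rule(child.text or ""))
    return "\n".join(lines)

song = "<song><title>Requiem</title><composer>Mozart</composer></song>"
print(render(song))
```

The document itself says nothing about capitals or prefixes. Changing `STYLE_RULES` changes the presentation without touching the data.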

What is Markup?
Markup gives meaning to a document. It is something we do every day, for example when we highlight text. If we want others to understand what our markup means, we need a set of rules declaring what constitutes markup and exactly what it means. A markup language is just such a set of rules. In SGML, for example, anything in angle brackets (<...>) is considered markup.

The makers of HTML used SGML to make a set of rules declaring what markup in HTML means. This set of rules is contained in a separate document called the HTML DTD (Document Type Definition). The HTML DTD says, for example, that when you come across a <P> in a document you start a new paragraph. There are three different types of markup:

Structural Markup
Tells how the document should be structured, i.e. which position within a document an element takes. e.g.: <P>

Stylistic Markup
Tells how the document is to be styled. These are procedural tags which give formatting instructions. e.g.: <I>, <U>, <B>,...

Semantic Markup
Tells us something about the content of the document. e.g.: <TITLE>, <CODE>,...

HTML mainly focuses on structural and stylistic markup, while the markup of content has not been a focus. In XML you can define your own semantic markup tags, for example:
<song>
  <title>Requiem</title>
  <composer>Mozart</composer>
</song>

In HTML you would write:


<P>Requiem is a song composed by Mozart</P>
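To see why the semantic version is easier for a program to work with, here is a minimal sketch in Python using the standard library's `xml.etree.ElementTree` module. The variable names are chosen only for this illustration.

```python
import xml.etree.ElementTree as ET

song_xml = """
<song>
  <title>Requiem</title>
  <composer>Mozart</composer>
</song>
"""

# Parse the document and look the values up by tag name.
song = ET.fromstring(song_xml)
title = song.findtext("title")
composer = song.findtext("composer")

print(f"{title} was composed by {composer}")
```

With the HTML sentence, a program would have to guess which words are the title and which are the composer. With the XML version the tag names say so directly.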

Tags
XML is very similar to HTML. It has tags, which identify elements. Tags can also carry attributes that describe these elements. In XML a tag is what is written between angle brackets, i.e. XML tags open with the < symbol and end with the > symbol. Apart from empty-element tags, they always come in matched pairs. <composer> is an example of an opening tag. In XML every opening tag must have a closing tag; in this case the closing tag would look like this: </composer>.

Element
<composer>Mozart</composer>

Start Tag

The beginning of every non-empty XML element is marked by a start-tag. An example of a start-tag:

<composer>

End Tag

The end of every non-empty XML element is marked by an end-tag. An example of an end-tag:
</composer>

Element Content

The text between the start-tag and end-tag is called the element's content. The element content in this case would be:
Mozart
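The parts of an element named above can also be picked apart in code. The following is a small sketch with Python's standard `xml.etree.ElementTree` module; the start-tag and end-tag strings are rebuilt by hand here purely for illustration.

```python
import xml.etree.ElementTree as ET

element = ET.fromstring("<composer>Mozart</composer>")

name = element.tag         # the element name shared by both tags
content = element.text     # the element content between the tags

start_tag = f"<{name}>"    # rebuilt by hand for illustration
end_tag = f"</{name}>"

print(start_tag, content, end_tag)
```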

Empty Element Tag

If an element is empty, it must be represented either by a start-tag immediately followed by an end-tag or by an empty-element tag. An empty-element tag takes a special form:
<BR/> ... empty-element tag in XML (equivalent to <BR></BR>)
In HTML, by contrast, a line break is declared with:
<BR> ... start tag without an end tag
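A short Python sketch (standard library only) can confirm that the two XML forms are interchangeable while the HTML form is rejected by an XML parser. The helper name `parses` is made up for this example.

```python
import xml.etree.ElementTree as ET

def parses(xml_text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

short_form = ET.fromstring("<BR/>")
long_form = ET.fromstring("<BR></BR>")

# Both forms yield an element with the same name, no text and no children.
same = (
    short_form.tag == long_form.tag
    and short_form.text == long_form.text
    and len(short_form) == len(long_form)
)

html_style_ok = parses("<BR>")  # start tag with no end tag

print(same, html_style_ok)
```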

Attributes

A start-tag or empty-element tag may contain Name-AttValue pairs such as align="center". These pairs are referred to as the attribute specifications of the element, with the Name in each pair referred to as the attribute name and the content of the AttValue as the attribute value. In the following example we have an empty element 'IMG' and two attributes, 'align' and 'src'.
<IMG align="center" src="logo.gif"/>
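As a rough illustration, Python's standard `xml.etree.ElementTree` module exposes the attribute specifications of an element as name/value pairs:

```python
import xml.etree.ElementTree as ET

img = ET.fromstring('<IMG align="center" src="logo.gif"/>')

# attrib maps each attribute name to its attribute value.
attributes = dict(img.attrib)
for name, value in attributes.items():
    print(f"{name} = {value}")

# get() returns None for an attribute the element does not have.
missing = img.get("width")
```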

Character Data
Text consists of character data and markup. XML defines the text between the start and end tags to be "character data" and the text within the tags to be "markup".
Markup takes the form of start tags, end tags, empty-element tags, entity references and comments.
Character data is all text of a document that is not markup.

Well formed XML documents


Well-formed XML documents simply mark up pages with descriptive tags. You do not need to describe or explain what these tags mean. In other words, a well-formed XML document does not need a DTD, but it must conform to the XML syntax rules. If all tags in a document are correctly formed and follow the XML rules, the document is considered well formed. Syntax is the grammar of a language. For an XML document to be well formed, it must obey the following rules, which are the most important ones:

XML documents must contain at least one element. In this example "Tootsie" is not well formed, because it is not marked up as an element within angle brackets.

Well Formed <title>Tootsie</title>

Not Well Formed "Tootsie"

XML documents must contain a unique opening and closing tag pair that contains the whole document, forming what is called a root element, e.g. <videocollection>...</videocollection>. In this example, the second fragment is not well formed because it lacks the root element that the first one has:

Well Formed <videocollection> <title>Tootsie</title> <title>Jurassic Park</title> <title>Mission Impossible</title> </videocollection>

Not Well Formed <title>Tootsie</title> <title>Jurassic Park</title> <title>Mission Impossible</title>

All tags must be nested properly, i.e. every opening tag must have a closing tag and the tags cannot overlap. Empty tags in XML look like this: <BR/>. Each tag must also be complete:
</title ... has no closing angle bracket, therefore the tag is not complete!
</title) ... has a wrong closing bracket, therefore the tag is not complete!

In the following example the tags are not properly nested.

Well Formed <videocollection> <title>Tootsie</title> </videocollection>

Not Well Formed <videocollection> <title>Tootsie </videocollection></title>

Tags in XML are case sensitive, which means that <CREW>, <Crew> and <crew> are not the same. The XML declaration (<?xml ...?>) must be all lowercase, while keywords in DTDs must be all UPPERCASE, such as ELEMENT, ATTLIST, #REQUIRED, #IMPLIED etc. Your own elements and attributes, however, may use any case you choose, as long as you are consistent.

Well Formed <crew>Sydney Pollak</crew>

Not Well Formed <CREW>Sydney Pollak</crew>
Not Well Formed <crew>Sydney Pollak</Crew>

Attribute values must always be quoted (unlike in HTML, where the quotes can sometimes be omitted).

Well Formed <title id="1">Tootsie</title>

Not Well Formed <title id="1>Tootsie</title>
Not Well Formed <title id=1>Tootsie</title>
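The rules above can be checked mechanically, because any XML parser must reject a document that is not well formed. Below is a minimal sketch using Python's standard `xml.etree.ElementTree` module. The helper name `is_well_formed` is invented for this example, and the test strings are the fragments used in this section.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False if it rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

examples = {
    "single element": "<title>Tootsie</title>",
    "no element at all": '"Tootsie"',
    "no root element": "<title>Tootsie</title><title>Jurassic Park</title>",
    "overlapping tags": "<videocollection><title>Tootsie </videocollection></title>",
    "mismatched case": "<CREW>Sydney Pollak</crew>",
    "quoted attribute": '<title id="1">Tootsie</title>',
    "unquoted attribute": "<title id=1>Tootsie</title>",
    "unclosed quote": '<title id="1>Tootsie</title>',
}

# Check every fragment and report the verdict for each one.
results = {label: is_well_formed(text) for label, text in examples.items()}
for label, ok in results.items():
    print(f"{label}: {'well formed' if ok else 'NOT well formed'}")
```

Only the "single element" and "quoted attribute" fragments are accepted. Each of the others breaks one of the rules listed in this section.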

Valid XML documents


Valid XML is a more rigid, or formal, form of XML. All XML documents are well-formed documents (otherwise they would not be XML documents). Some XML documents are additionally valid. Valid documents must conform not only to the syntax, but also to a DTD (Document Type Definition). A DTD is a set of rules that defines which tags may appear in an XML document. DTDs also describe the structure of a document.

It is advisable for an XML document to have a DTD: if several people are authoring the document, the DTD sets out the ground rules that they can all work by and, more importantly, they can use a validating parser (the validity checker) to make sure that they are not violating the rules. The XML DTD can either be in the prolog of the document, or it can be in a separate file that is referred to in the prolog.
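The difference between well formed and valid can be sketched as follows. The DTD below is a made-up example placed in the prolog of the document. Python's standard `xml.etree.ElementTree` parser is non-validating, so it only checks well-formedness. The function `follows_title_rule` is a hand-written stand-in for the single DTD rule shown and is not a real validator. A validating parser would be needed to enforce a DTD in practice.

```python
import xml.etree.ElementTree as ET

# Hypothetical DTD in the prolog: a videocollection holds one or more titles.
document = """<!DOCTYPE videocollection [
  <!ELEMENT videocollection (title+)>
  <!ELEMENT title (#PCDATA)>
]>
<videocollection>
  <crew>Sydney Pollak</crew>
</videocollection>"""

# The document is well formed, so a non-validating parser accepts it.
root = ET.fromstring(document)

def follows_title_rule(collection):
    """Hand-written check standing in for the rule (title+) above."""
    children = list(collection)
    return len(children) >= 1 and all(child.tag == "title" for child in children)

valid = follows_title_rule(root)
print("well formed: True, follows the DTD rule:", valid)
```

The document parses without error, yet it breaks the rule because it contains a crew element where only title elements are allowed. It is well formed but not valid.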
