
Introduction to XML

I. Historical glance:
1. SGML: very powerful, but complicated and expensive to implement.
2. HTML: easy to use, but not as powerful.
3. XML: powerful and easy to use.
XML was produced by the W3C (World Wide Web Consortium) at the end of the 1990s.
The name is an abbreviation of eXtensible Markup Language.

II. Definition:
XML is a meta-markup language: it does not define a fixed set of tags, so it allows creating any custom markup language.
III. W3C Goals for XML:
1. XML must be straightforward to use over the internet.
2. XML must support a wide variety of applications (software).
3. XML must be compatible with SGML (Standard Generalized Markup Language).
4. It must be easy to write programs that process XML documents (parsers).
5. XML must keep the number of optional features to a minimum (example: a closing tag is never optional).
6. XML documents must be readable by humans and reasonably clear.
7. The XML design must be prepared quickly.
8. The XML design must be formal and concise (a small number of rules that are followed exactly).
9. XML documents must be easy to create (for example using Notepad).
10. Terseness is of minimal importance: unlike SGML, XML has no markup shortcuts.
IV. Some rules used in XML documents:
1. Elements:
A pair of tags together with its content is called an element (node).
There are several types of elements:
o Root element: contains all the other elements. Each XML document must contain
exactly one root element.
o Child element: an element that is contained in another element.
o Parent element: an element that contains child elements.
o Sibling elements (nodes): elements that are on the same level, i.e. that share the same parent.
o Text elements: contain only text and no other elements. The text must still be enclosed in tags.
2. Closing tags:
Every element must be closed.
Example: <tagName>text</tagName>.
N.B: if an element is empty, i.e. has no text and no child elements, it can be closed directly in its opening tag.
Example: <tagName attributeName="value" />.

3. Proper nesting:
Child elements must be closed before their parent elements. Examples:
<a>
  <b>
  </b>
</a>    this example is properly nested
<a>
  <b>
</a>
</b>    this is not proper nesting

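The closing-tag and nesting rules can be checked mechanically. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree parser; the helper name is_well_formed is only illustrative and not part of any library.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts `text` as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # Raised for unclosed tags, bad nesting and other syntax errors.
        return False


print(is_well_formed("<a><b></b></a>"))  # child closed before its parent
print(is_well_formed("<a><b></a></b>"))  # parent closed before its child
print(is_well_formed("<a><b></a>"))      # <b> is never closed
```

A parser of this kind reports only that the document is not well formed; it does not try to repair it.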
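4. Naming:
Names can begin with a letter, an underscore "_" or a colon ":". They cannot begin with a digit, a hyphen or a dot.
After the first character, names can contain any number of letters, digits, underscores, hyphens (dashes) "-",
dots "." or colons ":". In practice the colon is best avoided, because it is reserved for namespaces.
N.B: XML is case sensitive,
i.e. the tag <A> is different from the tag <a>.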
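As a rough illustration of the naming rule, the sketch below tests names against a regular expression. It is an ASCII-only approximation (real XML also accepts many Unicode letters), and NAME_RE and is_valid_name are hypothetical names chosen for this example.

```python
import re

# First character: letter, underscore or colon.
# Following characters: letters, digits, underscores, colons, dots, hyphens.
# ASCII-only simplification of the XML name rule.
NAME_RE = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.\-]*")


def is_valid_name(name: str) -> bool:
    """Return True if `name` matches the simplified naming rule."""
    return NAME_RE.fullmatch(name) is not None


for candidate in ["_item", "first-name", "2nd", "-x", "tag name"]:
    print(candidate, is_valid_name(candidate))
```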

5. Values of attributes must be within double or single quotes:
If a value contains a double quote ", we enclose it in single quotes instead (and vice versa).
Example: 'abcd"efgh'.
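When attributes are written by a program, the choice of quote can be left to a library. This is a small sketch using quoteattr from Python's standard xml.sax.saxutils module; the wrapper make_attribute is a hypothetical helper for this example.

```python
from xml.sax.saxutils import quoteattr


def make_attribute(name: str, value: str) -> str:
    """Build name=value text, letting quoteattr choose the quote character."""
    return f"{name}={quoteattr(value)}"


# A value containing a double quote is wrapped in single quotes.
print(make_attribute("title", 'abcd"efgh'))
# A value without quotes is wrapped in double quotes.
print(make_attribute("title", "plain"))
```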

6. Comments:
We write comments using the form <!-- comments here -->.
A comment does not need a closing tag.
N.B: a comment must not contain two consecutive dashes "--", so it cannot end with three dashes: ---> is an error.

7. Processing instruction: <?target data?>:
It is used to pass instructions from the XML document to the application that processes it (for example a browser).
It does not need a closing tag.
It is not part of the document's content.
The XML declaration: <?xml version="1.0" encoding="UTF-8" standalone="no"?>
It looks like a processing instruction and, likewise, is not part of the document's content.
When present, it must be at the very beginning of the file, with nothing before it (not even white space).

8. Attributes:
Elements can have an unlimited number of attributes, but each attribute name may appear only once in an element. The order of attributes is not
important: <el a="ab" b="cd"> is the same as <el b="cd" a="ab">.
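The fact that attribute order carries no meaning can be observed with a parser. A minimal sketch, assuming Python's standard xml.etree.ElementTree, which exposes attributes as a dictionary; the helper name attributes is illustrative only.

```python
import xml.etree.ElementTree as ET


def attributes(text: str) -> dict:
    """Parse a single element and return its attributes as a dict."""
    return dict(ET.fromstring(text).attrib)


first = attributes('<el a="ab" b="cd"/>')
second = attributes('<el b="cd" a="ab"/>')

# Dictionaries compare equal regardless of the order of their keys.
print(first == second)
```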

9. Special symbols:
There are five special symbols that are written using their predefined entities:
<    &lt;
>    &gt;
"    &quot;
'    &apos;
&    &amp;
The symbols < and & must always be replaced inside text and attribute values.

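The five predefined entities can be produced and read back programmatically. The sketch below assumes Python's standard library: escape from xml.sax.saxutils replaces &, < and > by default, and the two quote entities are passed in explicitly. The sample sentence is made up for the illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = 'if a < b & b > c then say "hi"'

# escape() handles &, < and > itself; quotes are added through the mapping.
escaped = escape(raw, {'"': "&quot;", "'": "&apos;"})
print(escaped)

# The parser turns the entities back into the original characters.
element = ET.fromstring("<note>" + escaped + "</note>")
print(element.text == raw)
```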
10. White space:
By white space we mean the space, tab, carriage return and line feed characters. An XML parser passes
the white space in an element's content on to the application, but the application (for example a browser, as with HTML) may collapse or ignore
it by default. To signal that the white space in the content of an element should be preserved, we declare in the DTD:
<!ATTLIST elName xml:space (default|preserve) "preserve">
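How white space is treated depends on the software reading the document. As a quick check of one particular parser, the sketch below uses Python's standard xml.etree.ElementTree, which hands the text of an element to the program unchanged; the element name poem and the constant XML_SPACE are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

element = ET.fromstring('<poem xml:space="preserve">  two  spaces\n\tand a tab </poem>')

# The text is delivered exactly as written, including spaces, tab and newline.
text = element.text
print(repr(text))

# The xml: prefix is bound to a fixed namespace, so the attribute is read like this.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
print(element.get(XML_SPACE))
```

Whether the white space is later collapsed is then a decision of the application, which can consult the xml:space value.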
V. Checking an XML document by the parser:
1. It checks if the document is well formed, i.e. if it follows the 10 syntax rules above.
2. It checks if the document is valid, i.e. if the structure and values in the XML document conform to the
description given in the DTD. Only a validating parser performs this second check.

VI. Example of an XML document:

<?xml version="1.0"?>
<!DOCTYPE parent [
  <!ELEMENT parent (child)>
  <!ELEMENT child (mark, name)>
  <!ELEMENT mark EMPTY>
  <!ELEMENT name (lastName, firstName)>
  <!ELEMENT lastName (#PCDATA)>
  <!ELEMENT firstName (#PCDATA)>
  <!ATTLIST mark number ID #REQUIRED
                 listed CDATA #FIXED "yes"
                 type (natural | adopted) "natural">
]>
<parent>
  <child>
    <mark number="a1" listed="yes" type="natural"></mark>
    <name>
      <lastName>Smith</lastName>
      <firstName>john</firstName>
    </name>
  </child>
</parent>
The DTD in this example is internal, so the XML document does not require an external
DTD. We can therefore add standalone="yes" to the XML declaration:
<?xml version="1.0" standalone="yes"?>
(When a document has no external declarations the standalone attribute has no effect, but it is better to write it.)
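To see how a program reads such a document, here is a minimal sketch using Python's standard xml.etree.ElementTree. It works on a cleaned-up copy of the example (declaration written as <?xml, the ATTLIST closed, and the attribute assumed to be named type throughout). Note that this parser accepts the internal DTD but does not validate against it.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<?xml version="1.0" standalone="yes"?>
<!DOCTYPE parent [
  <!ELEMENT parent (child)>
  <!ELEMENT child (mark, name)>
  <!ELEMENT mark EMPTY>
  <!ELEMENT name (lastName, firstName)>
  <!ELEMENT lastName (#PCDATA)>
  <!ELEMENT firstName (#PCDATA)>
  <!ATTLIST mark number ID #REQUIRED
                 listed CDATA #FIXED "yes"
                 type (natural | adopted) "natural">
]>
<parent>
  <child>
    <mark number="a1" listed="yes" type="natural"></mark>
    <name>
      <lastName>Smith</lastName>
      <firstName>john</firstName>
    </name>
  </child>
</parent>
"""

root = ET.fromstring(DOCUMENT)          # the root element <parent>
mark = root.find("child/mark")          # path from the root to <mark>
last_name = root.findtext("child/name/lastName")
first_name = root.findtext("child/name/firstName")

print(root.tag, mark.get("number"), last_name, first_name)
```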

DTD (Document Type Definition)
Note: an element can contain:
- Other elements (child elements).
- Text.
- Mixed content (other elements and text).
- Nothing (EMPTY, written without parentheses).
- Any content, not validated (ANY, written without parentheses).
I. Symbols used in DTD for elements containing other elements:
a. <!ELEMENT elName (childName)>
When a child appears in elName exactly once.
b. <!ELEMENT elName (childName?)>
When a child may appear once or not at all inside elName (0,1).
Example: <elName>
<childName/>
</elName>
Example: <elName/>
c. <!ELEMENT elName (childName+)>
When a child can appear in elName several times but at least once (1,n).
d. <!ELEMENT elName (childName*)>
When a child can appear several times in elName or not at all (0,n).
e. <!ELEMENT elName (ch1,ch2)>
<!ELEMENT elName (ch1*,ch2?)>
<!ELEMENT elName (ch1*,ch2*)>
<!ELEMENT elName (ch1+,ch2+)>
<!ELEMENT elName (ch1,ch2?,ch3+)>
A comma means that the children must appear in the order listed. Each child can carry its own occurrence symbol.
f. <!ELEMENT elName (ch1?|ch2*)>
The | means that only one of the children listed can appear in elName.
N.B: - If we want the child to appear exactly 3 times we write it 3 times: (ch,ch,ch).
- (2,n) is represented by: <!ELEMENT elName (ch, ch+)>
or <!ELEMENT elName (ch,ch,ch*)>
g. <!ELEMENT elName ANY> When anything can go inside elName.
h. <!ELEMENT elName EMPTY> When nothing can go inside elName.
i. For elements with mixed content we should use (#PCDATA|a|b|c)*
i.e. only the | operator, with * applied to the whole group, and with #PCDATA listed first.
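The occurrence symbols ?, + and * and the | choice behave much like the operators of regular expressions. The sketch below uses that analogy to test a list of child names against an element-only content model. It is an illustration, not how a validating parser is implemented: it ignores #PCDATA, ANY and EMPTY, and the names model_to_regex and children_match are hypothetical.

```python
import re


def model_to_regex(model: str):
    """Translate an element-only content model such as (ch1,ch2?,ch3+)."""
    compact = re.sub(r"\s+", "", model)  # drop white space inside the model
    # Turn every child name into a group that matches "name;".
    body = re.sub(
        r"[A-Za-z_][\w.\-]*",
        lambda m: "(?:" + re.escape(m.group()) + ";)",
        compact,
    )
    # A comma means "followed by", which is plain concatenation in a regex.
    return re.compile(body.replace(",", ""))


def children_match(model: str, children: list) -> bool:
    """Return True if the sequence of child names satisfies the model."""
    sequence = "".join(name + ";" for name in children)
    return model_to_regex(model).fullmatch(sequence) is not None


print(children_match("(ch1,ch2?,ch3+)", ["ch1", "ch3", "ch3"]))  # ch2 omitted
print(children_match("(ch1,ch2?,ch3+)", ["ch1", "ch2"]))         # ch3 missing
print(children_match("(ch,ch+)", ["ch"]))                        # needs two
print(children_match("(ch1?|ch2*)", ["ch2", "ch2"]))             # second choice
```

The ";" terminator keeps a name such as ch from matching the beginning of ch1.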

II. Symbols used in DTD for elements containing attributes:
<!ATTLIST elName attName attDataType default>
where default is one of:
#IMPLIED          the attribute is optional
#REQUIRED         the attribute is obligatory
#FIXED "value"    the attribute value is constant
"value"           a default value only
Notes: - #FIXED must be followed by the fixed value in quotes.
- #IMPLIED and #REQUIRED cannot be combined with a default value.
- Giving only a default value means that, when the attribute is omitted, the parser automatically adds it
to the element with that value.

Attribute data types:
1. CDATA: character data (any text). Example: <!ELEMENT shirt (ch1)>
<!ATTLIST shirt quantity CDATA #REQUIRED>
2. Enumerated: the value must be one of a list. Example: <!ATTLIST shirt color (red|blue|green|yellow) "red">
3. ID: It has a unique value within the document. Each value must be a valid XML name, and an element can have at most one ID attribute.
4. IDREF: It is used to refer to an ID attribute elsewhere in the document.
Example: <!ELEMENT shirt (ch1)>
<!ATTLIST shirt prodCode ID #REQUIRED>
<!ATTLIST shirt quantity CDATA #REQUIRED>
<!ELEMENT image EMPTY>
<!ATTLIST image prodRef IDREF #REQUIRED>
In XML: <shirt prodCode="sh24" quantity="15">
<ch1>abc</ch1>
</shirt>
<image prodRef="sh24"/>
5. IDREFS: It is used to refer to several ID values, separated by white space. Example: "sh24 sh25".
6. NMTOKEN: The attribute value is made of name characters only, but unlike a name it can also begin with a digit.
7. NMTOKENS: a list of several NMTOKEN values separated by white space.
8. ENTITY
9. ENTITIES
10. NOTATION
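The ID and IDREF constraints (unique ID values, and every reference pointing at an existing ID) are normally enforced by a validating parser. As an illustration of what is being checked, the sketch below performs the same two tests by hand with Python's standard xml.etree.ElementTree. The catalog root element, the second shirt and the sh99 reference are invented for this example.

```python
import xml.etree.ElementTree as ET

CATALOG = """<catalog>
  <shirt prodCode="sh24" quantity="15"><ch1>abc</ch1></shirt>
  <shirt prodCode="sh25" quantity="3"><ch1>def</ch1></shirt>
  <image prodRef="sh24"/>
  <image prodRef="sh99"/>
</catalog>"""


def check_references(xml_text: str):
    """Return (duplicate IDs, references that point at no existing ID)."""
    root = ET.fromstring(xml_text)
    ids = [shirt.get("prodCode") for shirt in root.iter("shirt")]
    duplicates = sorted({value for value in ids if ids.count(value) > 1})
    dangling = [
        image.get("prodRef")
        for image in root.iter("image")
        if image.get("prodRef") not in ids
    ]
    return duplicates, dangling


duplicates, dangling = check_references(CATALOG)
print("duplicate IDs:", duplicates)
print("dangling references:", dangling)
```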
III. External DTD:
When an XML document requires an external DTD, the XML declaration in the
document becomes <?xml version="1.0" standalone="no"?>.
Declaration of the external DTD in the XML document:
1. To call a DTD by its location (typically a local DTD) we write:
<!DOCTYPE rootName SYSTEM "here we put the URL (address)">
2. To call a publicly known DTD (typically from the internet) we write:
<!DOCTYPE rootName PUBLIC "here the public identifier" "here the URL">
The public identifier consists of 4 sections separated by "//".
a. The first section is a "-" if the owner is not registered, or a "+" if it is registered and
unique.
b. The second section is the name of the organization that owns the DTD.
c. The third is the document format and the document name, separated by a space.
d. The last section is the language of the DTD, given as a language code.
Example of a public identifier and a URL:
"-//wrox//TEXT booklist//EN" "http://www.wrox.com".

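The four sections of a public identifier can be separated with ordinary string operations. The following is a minimal sketch that handles only the four-section form described above; PublicId and parse_public_id are hypothetical names for this example.

```python
from typing import NamedTuple


class PublicId(NamedTuple):
    registration: str  # "-" (not registered) or "+" (registered)
    owner: str         # organization name
    text_class: str    # document format, e.g. TEXT
    description: str   # document name
    language: str      # language code, e.g. EN


def parse_public_id(identifier: str) -> PublicId:
    """Split a public identifier on "//" into its four sections."""
    parts = identifier.split("//")
    if len(parts) != 4:
        raise ValueError("expected 4 sections separated by //")
    registration, owner, text, language = parts
    # The third section holds the format and the name, separated by a space.
    text_class, _, description = text.partition(" ")
    return PublicId(registration, owner, text_class, description, language)


print(parse_public_id("-//wrox//TEXT booklist//EN"))
```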