
XML Introduction

Author : Exforsys Inc. Published on: 14th May 2006

In this tutorial you will learn about XML, History, Introduction, Uses, XML Technology.

HISTORY
SGML (Standard Generalized Markup Language) grew out of the Generalized Markup Language (GML), which IBM developed in the late 1960s. SGML, standardized by ISO in 1986, is a semantic and structural language for text documents, but it is very complicated. HTML is an application of SGML. In 1996 the XML Working Group was formed under the W3C. The World Wide Web Consortium (W3C) is an international consortium in which member organizations, a full-time staff, and the public work together to develop Web standards. The W3C was founded in 1994 by Tim Berners-Lee, who also invented the World Wide Web in 1989. In 1998 the W3C published XML 1.0.

INTRODUCTION
XML (Extensible Markup Language) is a simplified subset of SGML. XML is not a programming language. Rather, it is a set of rules that allow you to represent data in a structured manner. Since the rules are standard, XML documents can be generated and processed automatically. Its use can be gauged from its name:
Markup: a collection of tags
XML tags: identify the content of data
Extensible: user-defined tags

XML is a markup language much like HTML, but it serves a different purpose and is not a replacement for HTML. An XML document is a text document that contains tagged data. Like HTML, XML uses tags and attributes (markup), but the tags in XML describe data (e.g., XML TUTORIAL 1) rather than presentation formats as in HTML. The interpretation and usage of the data is left to the application that uses the XML document. XML was designed to describe data and is a cross-platform, software- and hardware-independent tool for transmitting or exchanging information. It is an open-standards-based technology which is both human and machine readable. XML is best suited to sets of documents that share a large amount of common structure.

XML evolved from SGML (Standard Generalized Markup Language). The first version of XML (version 1.0) was published by the W3C in 1998, and version 1.1 followed in early 2004. In future Web development it is likely that XML will be used to describe the data, while HTML will be used to format and display the same data. The XML specification covers the syntax and grammar of XML documents as well as DTDs.

USES
XML is widely used for the following purposes:

Storing data in a structured manner (tree structure).

Storing configuration information, typically data in an application which is not stored in a database. Most server software has configuration files in XML format. XML documents can also be used as a mini data store, and the same data can be presented on a variety of targets including browsers and print media.

Transmitting data between applications. XML overcomes problems in client-server applications which are cross-platform in nature, for example a Windows program talking to a mainframe, little-endian and big-endian byte-order problems, and data type size variations across platforms.

When XML data is transferred across different systems, the data contained in an XML document is read by a piece of software called a parser. Most of the popular databases (Oracle, MS SQL Server, Sybase, DB2, etc.) provide their own mechanisms to store and retrieve data as XML. Some of them also provide parsers to work with XML documents programmatically. XML is a key technology for Web Services. .NET uses XML extensively, as a data format for configuration files, metadata, RPC and object serialization.
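
As a rough illustration of what a parser does, the following Python sketch reads a small XML document with the standard library module xml.etree.ElementTree and hands the data to the program as a dictionary. The document text and the helper name read_student are made up for illustration; the element names follow the student examples used later in this tutorial.

```python
import xml.etree.ElementTree as ET

# A made-up document; the element names are illustrative only.
DOCUMENT = """<student>
  <name>Bill Gates</name>
  <roll-number>55</roll-number>
</student>"""


def read_student(xml_text):
    """Parse the text and return the child elements as a dictionary."""
    root = ET.fromstring(xml_text)  # the parser builds a tree of elements
    return {child.tag: child.text for child in root}


student = read_student(DOCUMENT)
print(student)
```

How the values are interpreted (for example, whether the roll number is treated as text or as a number) is left to the application, as described above.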

XML Technology

XML - Elements, Attributes, Entities


Author : Exforsys Inc. Published on: 14th May 2006 | Last Updated on: 9th Jun 2006



In this tutorial you will learn about Elements, Anatomy of tags, Tag naming rules, Invalid tags, Valid tags, Root and child elements, Attributes, When Do I use Attributes? Entities, Character data sections, Comments and Processing instructions.

Elements
Elements are the basic building blocks of XML. An element may contain
Other elements
Character data
Character references
Entity references
Comments
These are collectively known as element content.
Ex: <student>Mason Hill</student>

An element consists of three parts
1. Opening tag <student>
2. Description Mason Hill
3. Closing tag </student>

Anatomy of tags
All elements must have a beginning and an ending tag. The opening tag of an element is written between the less-than (<) and greater-than (>) signs, for example <student>. The ending tag is written between a less-than sign followed by a forward slash (/) and a greater-than sign, for example </student>. Data between the opening and closing tags of an element is its content. For example,

<student>Nick Price</student>
Here Nick Price is the content of the element. Most browsers ignore whitespace between the tags when displaying the content, so
<student> Nick Price </student>
is displayed the same as
<student>Nick Price</student>
An XML parser itself, however, passes the whitespace in the content on to the application.

Note: Unlike HTML, XML does not allow unclosed single tags (like <br> in HTML). An element without content must still be closed, either with an end tag or with the empty-element form <br/>.
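
A quick way to see how whitespace in element content reaches a program is sketched below, using Python's standard xml.etree.ElementTree module. The variable names are illustrative. The parser reports the content exactly as written, and it is the application that decides whether the padding matters.

```python
import xml.etree.ElementTree as ET

padded = ET.fromstring("<student>  Nick Price  </student>")
tight = ET.fromstring("<student>Nick Price</student>")

# The parser hands over the content as written, spaces included.
raw_differs = padded.text != tight.text

# An application that does not care about padding strips it itself.
same_after_strip = padded.text.strip() == tight.text.strip()

print(raw_differs, same_after_strip)
```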

Tag naming rules


XML names must begin with a letter, underscore (_) or colon (:).
The remaining characters may be any valid name characters: the preceding plus digits, hyphens (-) and full stops (.).
The colon character should not be used except as a namespace delimiter.
XML names are not limited to ASCII characters; ideographic characters can also be used.
A name may not begin with the string xml, XML, or any other case variant of these characters.
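
The naming rules can be approximated with a short check. The Python sketch below is a simplification that covers ASCII names only, although real XML names may also use many non-ASCII letters; the function name is_valid_name and the sample names are chosen for illustration.

```python
import re

# ASCII-only approximation of an XML name: a letter, underscore or colon,
# followed by letters, digits, underscores, colons, hyphens or full stops.
NAME_PATTERN = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.\-]*")


def is_valid_name(name):
    """Return True if name follows the simplified naming rules."""
    if name.lower().startswith("xml"):
        return False  # names beginning with "xml" in any case are reserved
    return NAME_PATTERN.fullmatch(name) is not None


for candidate in ["_stock", "product1", "product-stock",
                  ".stock", "product^stock", "xmlData"]:
    print(candidate, is_valid_name(candidate))
```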

Based on the above rules, examples of

Invalid tags
<.stock></.stock>
<product1></ product1>
<product^stock></product^stock>

Valid tags
<_stock></_stock>
<product1></product1>
<product-stock></product-stock>

Root and child elements


The root element is the first element in a document and it contains all other elements. In the following example student is the root element, and the elements contained within it (name, roll-number) are child elements.

<student>
<name>Bill Gates</name>
<roll-number>55</roll-number>
</student>

In XML one cannot overlap tags. The opening and ending tags of child elements must be inside the parent element. Overlap of tags with siblings is not allowed, as shown in the following example.
<student>
<name>Jason
<roll-number>
</name>
</roll-number>
</student>
The proper format is as follows
<student>
<name>Jason</name>
<roll-number></roll-number>
</student>

The root element is also called the document element. There is only one root element, and all other elements lie within it. NOTE: A tag may be empty, i.e. contain no data, like the roll-number tag in the above example. Such tags are called EMPTY ELEMENTS.
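
The nesting and single-root rules can be checked with any XML parser. The Python sketch below uses the standard xml.etree.ElementTree module; the helper name is_well_formed and the sample strings are illustrative.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


nested = "<student><name>Jason</name><roll-number></roll-number></student>"
overlapping = "<student><name>Jason<roll-number></name></roll-number></student>"
two_roots = "<student></student><student></student>"

for sample in (nested, overlapping, two_roots):
    print(is_well_formed(sample))
```

Only the first string is accepted: the second overlaps sibling tags and the third has more than one root element.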

Attributes

Attributes give information about the elements. They can be specified only in the element start tag, and their values must always be enclosed in quotation marks, either double or single. This is unlike HTML, where attribute values may also be written without quotation marks.
Syntax: <tag attribute="value">description</tag>
Example: <problem size="huge" cause="unknown" solution="run away">

If elements are the nouns of XML, then attributes are its adjectives. An element can have zero, one or more attributes. An attribute name can appear only once within an element.
Bad: <Test name="John" name="Doe" />
Good: <Test first="John" last="Doe" />
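
The rule that an attribute name may appear only once can be seen in practice with a parser. This Python sketch uses the standard xml.etree.ElementTree module on the Test element from the example above; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Distinct attribute names: the parser exposes them as a mapping.
good = ET.fromstring('<Test first="John" last="Doe" />')
attributes = dict(good.attrib)

# The same attribute name twice: the parser rejects the document.
try:
    ET.fromstring('<Test name="John" name="Doe" />')
    duplicate_accepted = True
except ET.ParseError:
    duplicate_accepted = False

print(attributes, duplicate_accepted)
```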

The million dollar question


When Do I use Attributes? Unfortunately there is no definite answer to this question, and there are many opposing views on the use of attributes. A widely accepted view is that attributes are metadata, i.e. data about data, and in such scenarios the use of attributes is recommended. An example is the lang attribute, which describes the language of the content of the element.

Entities
Entity references are placeholders for characters that are otherwise reserved in the language or that may be misinterpreted. For example, the less-than (<) and greater-than (>) symbols are reserved for delimiting tags. If the element content itself contains one of these symbols, the data would be misinterpreted. To avoid such a scenario, entities are used. The ampersand (&) symbol is reserved to indicate the start of an entity reference, and a semicolon (;) ends it. The predefined entities are as follows
&lt;     LESS THAN (<)
&gt;     GREATER THAN (>)
&amp;    AMPERSAND (&)
&quot;   QUOTATION MARK (")
&apos;   APOSTROPHE (')

Character data sections

Character data (CDATA) sections contain raw data whose markup characters are not interpreted by XML parsers.
Syntax: <![CDATA[ raw data ]]>
Example:
<book ISIN="INB101235647">
<author>Kacey Price <![CDATA[ kacey has also authored Complete Reference series]]></author>
</book>
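
Entity references and CDATA sections are two ways of carrying reserved characters in content. The Python sketch below, which uses the standard modules xml.sax.saxutils and xml.etree.ElementTree, writes the same made-up text both ways and reads it back; the note element and the sample text are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "if a < b & b > c"

# Option 1: replace the reserved characters with entity references.
escaped_doc = "<note>" + escape(raw) + "</note>"

# Option 2: wrap the raw text in a CDATA section.
cdata_doc = "<note><![CDATA[" + raw + "]]></note>"

from_entities = ET.fromstring(escaped_doc).text
from_cdata = ET.fromstring(cdata_doc).text

print(escaped_doc)
print(from_entities == raw, from_cdata == raw)
```

Both documents give the application the same text; they differ only in how that text is written in the file.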

Comments
Comments are enclosed in <!-- and -->.
Example: <!-- This is start of second child element -->

Processing instructions
Processing instructions are used to pass information to applications, which use this information to execute special tasks.
Syntax: <?target instructions?>
Example: <?xml version="1.0" encoding="ISO-8859-1"?>

NOTE: Here the version attribute specifies the version of XML being used, while encoding gives the character encoding of the document for parsers.

XML - Document Type Definitions (DTD)


Author : Exforsys Inc. Published on: 14th Jun 2006 | Last Updated on: 16th Jan 2009



In this tutorial you will learn about XML Document Type Definitions (DTD): need, DTD, types of DTDs, internal DTD and external DTD.

NEED
XML documents can contain many different types of markup, including elements, attributes and entity references. Whatever the application, it is desirable that the XML document conforms to a certain set of rules governing the data structure it contains. DTDs and schemas are used for this purpose. For example, <name>12233</name> If the rules state that the name element may contain only alphabetic characters and it contains digits, as shown above, a validating parser rejects the document as invalid. A DTD constrains which elements and attributes may appear and how they nest; restrictions on the text itself, such as letters only, require a schema language.

DTD
DTD stands for Document Type Definition. It describes, in its own syntax, which elements may appear in the XML document and what their contents and attributes are.

A valid XML document must include a reference to the DTD against which it is validated. When the DTD is absent, a validating parser cannot verify the document structure but can still attempt to interpret the data.

TYPES OF DTD
Internal DTD: DTD can be embedded into the XML document
External DTD: DTD can be in a separate file

INTERNAL DTD
Internal DTDs are embedded in the XML document itself. They are convenient when the constraints apply to a single document. They are also used for testing a sample document while designing a complex DTD. Modifications are also relatively simple, since the DTD and the markup are in the same document.
Syntax: <!DOCTYPE root_name [assignments]>

It begins with the DOCTYPE keyword (after the less-than sign < and the exclamation mark !) followed by the name of the root element. The root name is followed by a square bracket, which marks the beginning of the declaration assignments. The last entry is a greater-than symbol (>). In the assignments section elements are declared as follows
<!ELEMENT child_name (child_name or data type)>
A detailed explanation of this is covered in the next tutorial.
E.g.
<?xml version='1.0' encoding='utf-8'?>
<!-- DTD for a AddressBook.xml -->
<!DOCTYPE AddressBook [
<!ELEMENT AddressBook (Address+)>
<!ELEMENT Address (Name, Street, City)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Street (#PCDATA)>
<!ELEMENT City (#PCDATA)>
]>
<AddressBook>
<Address>
<Name>Jeniffer</Name>
<Street>Wall Street</Street>
<City>New York</City>
</Address>
</AddressBook>

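
To see that a parser really reads the declarations of an internal DTD, the following Python sketch lists the declared element names using the standard xml.parsers.expat module. Note the assumption behind it: expat is a non-validating parser, so this only reports the declarations and does not check the document against them. The helper name declared_elements is illustrative.

```python
import xml.parsers.expat

DOCUMENT = """<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE AddressBook [
<!ELEMENT AddressBook (Address+)>
<!ELEMENT Address (Name, Street, City)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Street (#PCDATA)>
<!ELEMENT City (#PCDATA)>
]>
<AddressBook>
<Address>
<Name>Jeniffer</Name>
<Street>Wall Street</Street>
<City>New York</City>
</Address>
</AddressBook>"""


def declared_elements(xml_text):
    """Return the element names declared in the internal DTD, in order."""
    names = []
    parser = xml.parsers.expat.ParserCreate()
    # Called once for every <!ELEMENT ...> declaration the parser meets.
    parser.ElementDeclHandler = lambda name, model: names.append(name)
    parser.Parse(xml_text, True)
    return names


print(declared_elements(DOCUMENT))
```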

Here the order of the declarations is not important. Thus,
<?xml version='1.0' encoding='utf-8'?>
<!-- DTD for a AddressBook.xml -->
<!DOCTYPE AddressBook [
<!ELEMENT AddressBook (Address+)>
<!ELEMENT Address (Name, Street, City)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Street (#PCDATA)>
<!ELEMENT City (#PCDATA)>
]>

is the same as

<?xml version='1.0' encoding='utf-8'?>
<!-- DTD for a AddressBook.xml -->
<!DOCTYPE AddressBook [
<!ELEMENT AddressBook (Address+)>
<!ELEMENT City (#PCDATA)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Address (Name, Street, City)>
<!ELEMENT Street (#PCDATA)>
]>

EXTERNAL DTD
The DTD is kept in a separate file and a reference to its location is placed in the document. External DTDs are easy to apply to multiple documents. If a modification is needed in future, it can be made in just one file, avoiding the onerous task of changing every document. External DTDs are of two types:
1) Public: These are standardized DTDs that give a publicly available set of rules for writing XML documents.
2) Non-public: These are created by private organizations.

Adding Public DTD To Documents


We consider an example to understand the process of adding a public DTD to documents.
<!DOCTYPE spec PUBLIC "-//W3C//DTD Specification V2.0//EN" "/XML/2005/01/xml v20.dtd">

Here the keyword PUBLIC specifies that it is a publicly available DTD.
-/+ (minus/plus) sign: implies that the DTD is not / is a recognized standard.
//: used for separating categories of information.
W3C: owner of the DTD.
DTD Specification V2.0: states the label for the DTD.
EN: the ISO 639-1 code for the language of the DTD (English).
/XML/2005/01/xml v20.dtd: URL where the DTD is stored.
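
The parts of a public identifier can be pulled apart with plain string handling. The Python sketch below splits the identifier from the example on the // separators; the function name split_public_id and the dictionary keys are made up for illustration and are not part of any standard API.

```python
def split_public_id(public_id):
    """Split a public identifier into its '//'-separated fields."""
    standard_flag, owner, label, language = public_id.split("//")
    # The label starts with a keyword such as DTD, followed by a description.
    kind, _, description = label.partition(" ")
    return {
        "recognized_standard": standard_flag == "+",
        "owner": owner,
        "kind": kind,
        "description": description,
        "language": language,
    }


fields = split_public_id("-//W3C//DTD Specification V2.0//EN")
print(fields)
```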

ADDING NON- PUBLIC DTD TO DOCUMENTS


The syntax for adding a non-public DTD to documents is as follows
<!DOCTYPE root_name SYSTEM "file path">

In the DOCTYPE declaration, specify SYSTEM after the root_name to indicate that it is a non-public DTD, followed by the location where the DTD is stored, which can be a URL or a file path.

Example
The DTD for AddressBook.xml is contained in a file AddressBook.dtd. AddressBook.xml contains only XML data with a reference to the DTD file.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE AddressBook SYSTEM "file:///c:/XML/AddressBook.dtd">
<AddressBook>
<Address>
<Name>Jeniffer</Name>
<Street>Wall Street</Street>
<City>New York</City>
</Address>
</AddressBook>
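
A program can read the DOCTYPE reference without loading the DTD file. The Python sketch below uses the standard xml.dom.minidom module, which records the root name and the SYSTEM location but does not fetch or validate against the external file; the variable names are illustrative.

```python
from xml.dom import minidom

DOCUMENT = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE AddressBook SYSTEM "file:///c:/XML/AddressBook.dtd">
<AddressBook>
<Address>
<Name>Jeniffer</Name>
<Street>Wall Street</Street>
<City>New York</City>
</Address>
</AddressBook>"""

document = minidom.parseString(DOCUMENT)
doctype = document.doctype

root_name = doctype.name        # name given after DOCTYPE
dtd_location = doctype.systemId  # location given after SYSTEM

print(root_name, dtd_location)
```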

XML - Elements in Document Type Definitions (DTD)


Author : Exforsys Inc. Published on: 14th Jun 2006


In this tutorial you will learn about elements in DTDs: elements, child elements (nested elements), declaring elements with character data only, with mixed content, with any content and with no content, and element order indicators and qualifiers.

ELEMENTS
Every element used in a valid XML document must be declared in the document's DTD.
SYNTAX: <!ELEMENT element_name content_specification>
element_name: specifies the name of the XML tag
content_specification: specifies the contents of the element, which can be one of the following five types
I) Standard content
II) Only character data
III) Mixed content
IV) Any type of content
V) No content

CHILD ELEMENTS (NESTED ELEMENTS)


Most element declarations define one or more child elements. For example,
<!ELEMENT customer (customer_name)>
Here, element customer contains one and only one nested element, i.e. customer_name.
<!ELEMENT Address (Name, Street, City)>
Here, element Address contains three nested elements, Name, Street and City, in that order.
SYNTAX: <!ELEMENT parent (child)>
OR
<!ELEMENT parent (child1, child2, . . . , childN)>

DECLARING ELEMENTS WITH CHARACTER DATA ONLY


Top-level elements generally contain other elements, but low-level elements may contain parsed character data. In XML, #PCDATA is the keyword to declare elements with parsed character data. An element declared as #PCDATA
Can contain character data
Can contain entities such as &lt; and &gt;
Cannot contain other elements

SYNTAX: <!ELEMENT element_name (#PCDATA)>
Example: <!ELEMENT Street (#PCDATA)>
Element Street contains parsed character data.

CDATA is sometimes presented as a second keyword for character data, written <!ELEMENT element_name (#CDATA)>, for example <!ELEMENT City (#CDATA)>. This form is not part of the DTD syntax for element declarations and a conforming parser rejects it. In a DTD, CDATA is an attribute type used in attribute-list declarations, and a document can carry raw text in CDATA sections. Whitespace is not a difference between the two: for an element declared as #PCDATA, the parser passes whitespace in the content to the application unchanged, so for <City> London </City> the application receives London together with its surrounding spaces.


DECLARING ELEMENTS WITH MIXED CONTENT


At times it is required to declare elements with mixed content, i.e. both character data and other elements. In such situations the pipe symbol (|) is used. #PCDATA must come first and the group must end with an asterisk.
SYNTAX: <!ELEMENT parent (#PCDATA | child1 | child2 | . . . | childN)*>
Example:
<bank>
This account <account>423578</account> is Active
This account <account>123456</account> is Closed
</bank>

DECLARING ELEMENTS WITH ANY CONTENT


In real-world scenarios, the developer is often not sure about the exact document structure while creating the DTD. At such times the ANY keyword comes in handy. An element declared as ANY can
Contain child elements
Contain character data
Contain mixed content

SYNTAX: <!ELEMENT element_name ANY>

DECLARING ELEMENTS WITH NO CONTENT


Sometimes it is required that an element has only attributes but no data. In such scenarios the EMPTY keyword is used.
SYNTAX: <!ELEMENT element_name EMPTY>

ELEMENT ORDER INDICATORS AND QUALIFIERS


The various order and qualification governing symbols are listed in the tables below.

ORDER
TYPE   VALUE      CONTEXT
|      Choice     Either one child element or another can occur
()     Group      Groups related elements together
,      Sequence   Element must follow another element

QUALIFIER
TYPE   VALUE                     DESCRIPTION
?      Optional                  Elements appear once or not at all
*      Optional and Repeatable   Elements appear zero or more times
+      Required and Repeatable   Elements appear one or more times

EXAMPLES: The pipe symbol (|) specifies choice, so the occurrence of either child element is considered valid by the parser. The following declaration specifies that name must contain either first_name or last_name
<!ELEMENT name (first_name | last_name)>

Thus,
<name>
<first_name>Nick</first_name>
</name>
is valid, as well as
<name>
<last_name>Price</last_name>
</name>
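
The order indicators and qualifiers behave much like regular-expression operators, which suggests a small way to experiment with them. The Python sketch below translates a content model made of child element names into a regular expression and tests a list of child names against it. It is a teaching aid under stated assumptions, not a validator: it handles only element names with the symbols | ( ) , ? * +, it does not handle #PCDATA, EMPTY or ANY, and the function names are made up for illustration.

```python
import re


def model_to_regex(content_model):
    """Translate a content model of child elements into a compiled regex."""
    def convert(match):
        token = match.group()
        if token == "(":
            return "(?:"          # DTD group -> non-capturing regex group
        if token == ",":
            return ""             # sequence -> plain concatenation
        # An element name consumes "name " from the child list string.
        return "(?:" + re.escape(token) + " )"

    compact = content_model.replace(" ", "")
    pattern = re.sub(r"[A-Za-z_][\w.\-]*|\(|,", convert, compact)
    return re.compile(pattern)


def children_match(content_model, children):
    """Return True if the list of child names satisfies the content model."""
    sequence = "".join(child + " " for child in children)
    return model_to_regex(content_model).fullmatch(sequence) is not None


print(children_match("(Name, Street, City)", ["Name", "Street", "City"]))
print(children_match("(first_name | last_name)", ["last_name"]))
print(children_match("(Address+)", []))
```

The characters |, ?, * and + are left untouched because they mean the same thing in both notations.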

XML Advantages
Author : Exforsys Inc. Published on: 5th Jul 2007

There are many advantages to using XML for information exchange. The Extensible Markup Language is written as plain text in words people can read, not in a binary format meant only for computers. XML is readable even by people who have had no formal introduction to it.
It is as easy to read as HTML. XML works well with languages like Java, and it can be combined with any application capable of processing XML, irrespective of the platform it runs on.

XML is an extremely portable language to the extent that it can be used on large networks with multiple platforms like the internet, and it can be used on handhelds or palmtops or PDAs. XML is an extendable language, meaning that you can create your own tags, or use the tags which have already been created.

There are other advantages of using XML.


It is a platform-independent language. It can be deployed on any network, provided the application in use can work with it; if the application can process XML, then XML can be used on any platform. It is also vendor independent and system independent. When data is exchanged as XML, it can pass without loss even between systems that use totally different internal formats.

From a programmer's point of view, many parsers and parsing APIs are available, for C and many other languages. If your data is very rich, using XML to capture it makes a lot of sense, mainly because XML is plain text that humans can read. XML also gives you the freedom to define your own tags to fit your application's needs. XML can be stored in databases, either in a native XML format or as human-readable text. XML can also be used as an instrument to share data and application models across wide networks like the internet.

The Advantages of XML in Java


XML is often used alongside component and distributed-object technologies such as the Common Component Architecture (CCA) and the Common Object Request Broker Architecture (CORBA). Because XML is a common, standard format, it helps programs interoperate. In Java, it can accompany remote method invocation (RMI), in which one Java object invokes another, and it allows clients to connect to a program using remote procedure calls (RPC).

XML Advantages with Tags


The first advantage of tags in XML is that you can create your own; you are not limited to a standard set of tags predetermined by program vendors. With vendor-declared tags, browsers and other associated programs have to adopt each new tag before it can be used, which is a time-consuming process. In XML you create your own tags, and any XML-aware tool can already process them, which saves time.

You are also free to develop at your own pace and to build tools that suit your programming needs without a large investment of time or money. By defining your own tags you make the markup fit your data, whereas with vendor-declared tags you have to fit your programming needs to the tags, which limits creativity in programming.

Advantages of XML in Format


Escaping the tag limitations of HTML is not even close to being XML's best advantage. There is a lot more to it than being able to name your own tags in simple English. A user marking up data in HTML commonly comes across three major problems.

First, the graphical user interface (GUI), that is, the presentation, is embedded with the data itself. If you decide on a presentation other than the one already encoded, you have to re-encode your entire HTML, which can mean editing volumes of data and many pages.

Second, finding information by navigating the data can be very tedious. Power searching is extremely difficult in HTML, and there is no correlation between fields. Only when the data is structured can you find similar or correlated data.

Third, the data is tied to the logic and language of HTML. If the user needs to present the same data in some other form, for example in a Java applet, he has to re-enter the data for that application, or the applet has to parse the HTML document, extract the information and reformat it all.

XML has advantages in all of these areas. XML data is highly structured, which solves the problem of correlating or identifying similar fields. In XML the presentation is separate from the data, so changes can be made without disturbing the existing data in any way. If you want a table format, you just create another style sheet with the table format, and you do not have to erase the data present in the file. You can retain the list format if you need it and have the table format simultaneously.

Searching the data is also easier in an XML document, because a search tool can parse the data using the tags and locate what is required. The data is structured as a tree, and even complex relationships, such as the parent-child relationships in a directory, are easy to follow because the format is clear. XML is legible even to a first-time reader, because it is written in plain text and in human-readable language.
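
Searching structured data by tag can be sketched in a few lines of Python with the standard xml.etree.ElementTree module. The catalog document, its tag names and the helper titles_by are all invented for this illustration.

```python
import xml.etree.ElementTree as ET

# A made-up catalog; the tag names are illustrative only.
CATALOG = """<catalog>
  <book><title>XML Basics</title><author>Price</author></book>
  <book><title>DTD Guide</title><author>Mason</author></book>
  <book><title>More XML</title><author>Price</author></book>
</catalog>"""


def titles_by(author, xml_text):
    """Return the titles of all books whose author field matches."""
    root = ET.fromstring(xml_text)
    return [book.findtext("title")
            for book in root.findall("book")
            if book.findtext("author") == author]


print(titles_by("Price", CATALOG))
```

Because each field is tagged, the search looks only at author elements instead of scanning free text.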

XML from its Early Stages


XML has come a long way since it emerged. It has been constantly improved and is still changing as internet technology grows. One factor in its favor is that it has stayed flexible through these changes and has been able to fit growing needs and functions. This gives it an edge over HTML and SGML, which have their own shortcomings.

Even though SGML has been in use since the 1980s and is considered very powerful, it could not cope with fast-changing technologies. As time moved forward, SGML also proved expensive because of its hunger for processor time and its need for bigger processors. There was a pressing need for a language that was low in cost and maintenance but did not compromise on efficiency.

HTML was an answer to some of SGML's shortcomings, but not a fully satisfactory one. HTML was simple and widely accepted, but it was not flexible, and changing the presentation of a document without disturbing the data was almost impossible. XML provided an answer to these shortcomings. Over the years XML has evolved into a fully functional tool that has helped the internet in more than one way, and users benefit from it in many respects.

Business Applications and Advantages of XML


XML's efficiency and accuracy in handling data has a proven track record. Airlines such as JetBlue depend on XML for updating flight manuals and even flight plans because of its reliability. Business documents are usually saved not in one location but in various locations, and XML can be used to integrate this data. Companies like General Motors are finding solutions in XML for their problems with data in distributed environments.

Many companies now depend on web services that give them a centralized environment in which their data can be kept safe and secure. Many existing applications carry a legacy and a heavy price tag. XML offers a simple way of addressing these complex issues.

XML Disadvantages
Author : Exforsys Inc. Published on: 8th Jul 2007

The Extensible Markup Language is widely seen as the way to go for developing future web applications. XML nevertheless has some drawbacks which need to be looked at and improved upon, and they explain the resistance it faces from some users. One of the biggest drawbacks is the lack of adequate applications for processing it.

Lack of Applications Processing


XML needs an application to process it. Browsers do not render an XML document as a formatted page the way they render HTML: an HTML page written by anyone can be read in any browser anywhere in the world, whereas XML still depends on HTML for display and is not independent of it. XML documents therefore have to be converted to HTML before they are deployed. The most common method is to write parsing routines in either DHTML or Java and walk through the XML document; a style sheet can then apply the formatting rules to convert the entire document into HTML.

Other disadvantages include the fact that XML is more difficult, more demanding and more precise than HTML. It offers little direct support for end-user applications, and some still regard it as being at an early, experimental stage. XML is very flexible, but its flexibility can become a disadvantage, since different parties may disagree on the tags to use. If an XML document has too many constraints, it can become very difficult to construct the file. It is also verbose, which may be a problem for other applications.

While just describing tags and building a system sounds very easy, it may not be that easy in reality. For example, a business or professional organization may have hundreds of functions related to one set of documents, and XML by itself cannot synthesize all the information related to those documents.

General Weaknesses of XML

Since XML is a verbose language, the size and clarity of a document depend heavily on who is writing it, and verbose documents may pose problems for other users. XML is not specific to any platform, and this neutrality may be a disadvantage in a few circumstances. Not all XML-related standards are fully implemented yet, nor fully accepted for use. Users have reported problems with parsers, and there are problems with using XML over HTTP which are still being resolved.

Disadvantages for XML Documents


XML documents can be difficult and also expensive to set up. A freelancer, for example, can sit at home and, at his own pace, create, write and format a document or a manuscript using any of the free software available. However, once he introduces XML, taking the document further can become a painful process.

XML and Unicode Disadvantages


When several mutually incompatible programs are in use and XML is shared between them, the results can be poor. XML is tied closely to Unicode, and differences in character handling can alter attribute values so that the result differs from the original. Some XML parsers, such as those used along with RSS and the component called next, cannot disable external entities and instead resolve them as their own, which can prove to be a major disadvantage. XML by itself cannot be displayed in Netscape, which makes it dependent on HTML. In instances like this XML does have disadvantages, and it cannot be called a fully platform-independent model that can be deployed on any operating system. The limitation is a basic one, since XML cannot talk to browsers directly.

Many HTML and XHTML code samples contain a doctype that points to a DTD. The common belief is that browsers use it, but what few people realize is that browsers do not actually retrieve these DTDs. When an application does depend on a DTD and the DTD is unavailable, for any of a number of reasons, the entire application breaks down. This is a bad outcome, because the DTD can be unavailable for reasons outside the application's control, and that should not mean that the service itself becomes unavailable. XML can create heavy dependence on single factors, which can let the program down. Even when the DTD is available it is of limited use, and an outside program has to provide a backup system, so users and developers might as well use the outside program from the start.

External entities pose a recurring problem, which is a major disadvantage for XML. The best way to avoid external entity problems with XML DTDs is not to use them at all. If you have to use them, do not rely on them on the producer side, and do not attempt to retrieve them on the client side. If you are writing the specification for an XML vocabulary, do not define a DTD as part of it, and programs should run their XML parsers with external entity resolution disabled. Otherwise external entity problems will crop up and trigger a series of further problems which cannot be solved within the XML environment alone.

When layering specifications on XML, banning the document type declaration may seem to be against the rules, but it is allowed, and SOAP does so. If your job is to implement a Web application based on XML, you may need to configure the parser not to perform DTD-based validation and not to resolve external entities. This can prevent future problems, so the precaution is worthwhile. The same precaution applies to publishing documents on the web: do not include a document type declaration.

Such a document is not valid in the sense XML defines, but some people believe that document validation in XML is overrated. DTDs are not a very powerful means of validation, and it is often forgotten that a vocabulary has its own language and grammar which a DTD cannot fully check. There is also the problem that other programs do not trust the XML DTD. The doctype in HTML is quite different from the doctype in XML, so you may not be able to use the XML doctype as an indicator that tells programs what type of document they are dealing with.
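
Configuring a parser not to resolve external entities can look like the following Python sketch, which uses the standard xml.sax module. The feature constants come from xml.sax.handler; the document, the entity location file:///hypothetical/secret.txt and the class and function names are assumptions made up for this illustration. Recent Python versions already leave external general entities switched off by default, so setting the feature here simply makes the choice explicit.

```python
import io
import xml.sax
from xml.sax.handler import (ContentHandler, feature_external_ges,
                             feature_external_pes)


class TextCollector(ContentHandler):
    """Collects the character data reported by the parser."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


# The entity points at a hypothetical location that is never fetched.
DOCUMENT = b"""<!DOCTYPE note [
  <!ENTITY ext SYSTEM "file:///hypothetical/secret.txt">
]>
<note>a&ext;b</note>"""


def parse_without_external_entities(data):
    """Parse data with external entity resolution switched off."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_external_ges, False)  # general entities
    parser.setFeature(feature_external_pes, False)  # parameter entities
    handler = TextCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return "".join(handler.chunks)


text = parse_without_external_entities(DOCUMENT)
print(text)
```

The reference to the external entity is skipped, so only the text written directly in the document reaches the application.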
If an application handles multiple XML vocabularies, it can dispatch each document to the appropriate handler by checking the namespace of the root element. If the vocabulary does not use a namespace, the application can look at the MIME type instead. A vocabulary that has neither a namespace nor a specific MIME type is a bad example and creates a lot of problems, because the application has to fall back on the root element name.

The XML specification allows three kinds of processing. The first performs no DTD-based validation and does not retrieve external entities. The second performs no DTD-based validation but retrieves external entities, so that the infoset and the entity references can be expanded. The third performs DTD-based validation and retrieves external entities, so that the infoset and the entity references can be expanded. The point of having several profiles is that the application can choose the one that suits it.

Character entities that are defined in a DTD are considered unsafe for web applications, because a parser that does not retrieve the DTD cannot resolve them. This is a disadvantage because it creates a problem with text input and the editor. On the World Wide Web other options may be available: an input method can solve the problem on the editor side. If the XHTML entities had been predefined in XML, there would not have been many of these problems, but that cannot be changed now. As discussed earlier, the flexibility of XML sometimes turns out to be its biggest disadvantage.
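The namespace-based dispatch described above can be sketched as follows. The handler table, the return values and the function names are hypothetical; only the two namespace URIs (Atom and XHTML) are real. ElementTree reports a namespaced tag in the form `{namespace}local`, which the helper splits apart.

```python
import xml.etree.ElementTree as ET


def split_qname(tag):
    """Split ElementTree's '{namespace}local' form into (namespace, local)."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return "", tag


# Hypothetical handlers, keyed by the namespace of the root element.
HANDLERS = {
    "http://www.w3.org/2005/Atom": lambda root: "atom",
    "http://www.w3.org/1999/xhtml": lambda root: "xhtml",
}


def dispatch(xml_text):
    """Pick a handler from the root namespace, else fall back on the root name."""
    root = ET.fromstring(xml_text)
    namespace, local = split_qname(root.tag)
    handler = HANDLERS.get(namespace)
    if handler is None:
        # No usable namespace: the root element name is all that is left.
        return "unknown:" + local
    return handler(root)
```

A document such as `<rss version="2.0"/>` has no namespace, so it ends up in the fallback branch, which is exactly the awkward case the text warns about.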

XML Web Services


Author : Exforsys Inc. Published on: 11th Jul 2007


One of the main reasons the web has succeeded is its simplicity, which allows it to be omnipresent. Web services are the new generation of web applications. They are self-contained applications that run on their own platforms and are easily accessible on the web. The web services architecture is simple to understand and is divided into tiers.

What is a web service platform?


The basic web service platform is XML, the Extensible Markup Language. XML is a meta-language in which you can define languages that serve as the interface between the client and the service. This technology works with the help of middleware: the XML request is converted into a middleware request, and the result is converted back to XML.

What are the various platforms?


The various web service platform elements are XML plus HTTP, XML plus SOAP, XML plus UDDI, XAML, XMS, XLANG and XKMS.

SOAP: SOAP, the Simple Object Access Protocol, is used mainly to exchange XML-based messages between computers over a network. It can be used to perform Remote Procedure Calls (RPC). SOAP is maintained by a W3C working group that looks after the protocols built on the Extensible Markup Language, and SOAP 1.2 has been a W3C Recommendation since 2003.

UDDI: UDDI, or Universal Description, Discovery and Integration, is a directory service used along with the Extensible Markup Language to find other web services on the Internet. Its functions resemble the lookup services of CORBA, and it also acts like a Domain Name Server for business applications. UDDI depends on SOAP in that UDDI requests are sent as SOAP messages. It is developed under OASIS rather than the W3C, and its dependence on SOAP means that changes to SOAP affect it too.

XLANG: XLANG is an extension of WSDL, the Web Services Description Language. WSDL is an XML-based language that describes web services and how to communicate with them. XLANG is also used to undo complex operations. In fact the main use of this protocol is to undo operations, which is very important from the commercial point of view.

XAML: XAML, the Transaction Authority Markup Language, complements XLANG. XAML is also used to undo operations, but it is not restricted to two-phase exchanges such as a buying and paying transaction or a selling and receiving transaction. It leaves other options open for undoing a two-way transaction.

XKMS: The XML Key Management Specification is mainly used to register and distribute the public keys that are used with digital certificates and XML signatures. XKMS is divided into two services, X-KISS and X-KRSS. The XKMS protocol relies heavily on XML, WSDL and SOAP.

XML standards
Since its beginning XML has been growing constantly: different standards are being set and different technologies keep evolving. For XML users it can be extremely difficult to keep up with the ever-changing specifications and new entries. The word standard almost has to be redefined when it comes to XML, because there are already so many standards in use and more are being added. However, there are some core standards in XML that can be regarded as a dictionary of fixed terms. These terms form the basis of what is expressed in the Extensible Markup Language.
Canonical XML or the C14n

Canonical XML defines a standard physical representation of an XML document. Two documents that differ only in details that do not change their meaning, such as the order of attributes or the kind of quotes around attribute values, have the same canonical form. The canonical form is itself written in XML syntax.
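A small sketch of canonicalization, assuming Python 3.8 or later, whose standard library function `xml.etree.ElementTree.canonicalize` implements the C14N 2.0 version of Canonical XML. The two sample documents are hypothetical; they differ only in attribute order, quoting, spacing and the empty-element shorthand.

```python
from xml.etree.ElementTree import canonicalize  # available from Python 3.8

# Two physically different serializations of the same logical document.
first = '<item id="1" lang="en"/>'
second = "<item   lang='en'   id='1'></item>"

canon_first = canonicalize(first)
canon_second = canonicalize(second)

# Both inputs reduce to one physical representation.
same = canon_first == canon_second
```

Because the canonical forms are identical, they can be compared byte for byte or fed to a digital signature algorithm.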
XML Catalogs

XML processors use an XML catalog to find out how to resolve an identifier, such as a URL, to a resource. A catalog can also substitute one resource for another. Catalog processing typically takes place as part of XML parsing.
XML information set or the Infoset

The XML Information Set describes an XML document as a series of information items, each of which has a set of specialized properties. Other specifications use these items to refer to the information in an XML document.
XML Namespaces

XML namespaces enable users to give universal names to the elements and attributes in an XML document. For example, element names such as head and body, which could just as well describe the anatomy of a human body, can be qualified with a namespace so that their meaning is unambiguous.
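The head and body example can be made concrete with a short sketch. The XHTML namespace URI is real; the `http://example.com/anatomy` URI, the `report` element and the variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Two "body" elements that mean different things, told apart by namespace.
DOC = """<report xmlns:h="http://www.w3.org/1999/xhtml"
        xmlns:a="http://example.com/anatomy">
  <h:body>page content</h:body>
  <a:body>torso and limbs</a:body>
</report>"""

root = ET.fromstring(DOC)

# Prefixes here are local to the query; only the URIs have to match.
ns = {"h": "http://www.w3.org/1999/xhtml", "a": "http://example.com/anatomy"}
html_body = root.find("h:body", ns).text
anatomy_body = root.find("a:body", ns).text

# ElementTree stores the universal name as '{namespace}local'.
universal_names = [child.tag for child in root]
```

The local name is the same in both cases, but the universal names differ, so a program never confuses the two elements.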
RELAX NG

RELAX NG is a schema language that can be used to describe, define and constrain an XML vocabulary. It is a grammar-based schema language. A schema is something that defines and limits the terms of a language.
Schematron

Schematron is also a schema language, but it is rule-based rather than grammar-based. Instead of describing a grammar it states rules, and these rules define and limit the XML vocabulary.
DSDL or Document Schema Definition Languages

Document Schema Definition Languages, or DSDL, provides a framework for the validation and core processing of the Extensible Markup Language. It consists of individual, well-defined specifications that are published as separate parts. The DSDL specifications can be used separately or collectively for XML validation.
Uniform Resource Identifiers (URI) and Internationalized Resource Identifiers (IRI)

A Uniform Resource Identifier is used to identify resources of any kind, such as HTTP resources, XML documents and multimedia. Internationalized Resource Identifiers serve the same purpose but allow characters from outside the ASCII range, so that resources can be identified in international scripts.
W3C XML Schema

W3C XML Schema is one of the schema languages used to define and constrain an XML vocabulary. It also forms the foundation for several standards in XML messaging and data binding.
XML Inclusions or XInclude

XML Inclusions, or XInclude, provides a way of including or merging XML documents, and it has some added features as well. One large document can be assembled from smaller ones.
XML Linking Language or XLink

XML Linking Language, or XLink, is a framework for creating links in an XML document. It is used to create the simple links that XML documents commonly need.
XML Base

XML Base is the tool that associates XML elements with a base URI (Uniform Resource Identifier) or IRI (Internationalized Resource Identifier). It provides the means by which the relative URIs and IRIs in an XML document are resolved against that base.
XML ID

XML ID (xml:id) provides a way of declaring the attributes that act as unique identifiers for the elements of an XML document.
XML or Extensible Markup Language

XML, the Extensible Markup Language, is derived from SGML, the Standard Generalized Markup Language. While SGML is large and complicated, XML is a much simpler format to work with.
XML Path Language or the XPath

XPath is considered to be the most successful of all the XML technologies today. It defines a syntax and a data model for addressing the different parts of an XML document.
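As a rough illustration of addressing parts of a document, the sketch below uses the limited XPath subset that Python's standard `xml.etree.ElementTree` supports; it is not a full XPath implementation. The catalog document and its element names are hypothetical.

```python
import xml.etree.ElementTree as ET

CATALOG = """<catalog>
  <book lang="en"><title>XML Basics</title><price>20</price></book>
  <book lang="fr"><title>XML Avance</title><price>25</price></book>
  <book lang="en"><title>Parsing in Depth</title><price>30</price></book>
</catalog>"""

root = ET.fromstring(CATALOG)

# Attribute predicate: titles of all books whose lang attribute is "en".
english_titles = [t.text for t in root.findall("./book[@lang='en']/title")]

# Position predicate: the title of the second book (positions start at 1).
second_title = root.find("./book[2]/title").text

# Descendant axis: every price element anywhere below the root.
prices = [int(p.text) for p in root.findall(".//price")]
```

Each expression names a path through the tree plus an optional filter, which is the core idea that XSLT, XPointer and XQuery build on.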
XPointer Framework

The XPointer Framework refers to fragments of an XML document and their locations. It is the XML counterpart of the URLs that use a hash to point to a particular place in an HTML document.

Apart from the standards for XML documents there are some XML processing standards, such as Cascading Style Sheets, the Document Object Model, Remote Events for XML, Simple API for XML, State Chart XML, SOAP, SQL with XML extensions, XML Binding Language, XForms, the XML Processing Model and Extensible Stylesheet Language Transformations (XSLT).

Some of the key XML vocabularies are Atom Syndication Format, Darwin Information Typing Architecture (DITA), DocBook, Mathematical Markup Language (MathML), Open Document Format for Office Applications (OpenDocument), Resource Description Framework (RDF), Synchronized Multimedia Integration Language (SMIL), Scalable Vector Graphics (SVG), Voice Extensible Markup Language (VoiceXML), XML Bookmark Exchange Language (XBEL), XHTML, XQuery 1.0: An XML Query Language, Extensible Stylesheet Language Formatting Objects (XSL-FO) and XUpdate.

Several organizations are involved in creating standards for XML users. The World Wide Web Consortium, commonly referred to as the W3C, usually issues recommendations rather than standards. Another is the International Organization for Standardization, which probably leads the others and is the most active of all. The Organization for the Advancement of Structured Information Standards (OASIS) publishes its own standards, which are approved and recommended by OASIS itself. The last is the Internet Engineering Task Force, an organization that relies on public review gathered over the Internet. It collects Internet Drafts and RFCs (Requests for Comments); almost anyone with a computer and an Internet connection can submit an Internet Draft and voice an opinion. The XML community has gained tremendous mileage from these activities, and in spite of its varying standards and shortcomings XML has remained a huge success.

XML Parsing
Author : Exforsys Inc. Published on: 14th Jul 2007 | Last Updated on: 15th Jul 2007

Because XML is so widely accepted, it is crucial for web programming that XML data be parsed efficiently, especially when applications have to handle huge volumes of data. Improper parsing increases memory usage and processing time, which directly reduces scalability. Many XML parsers are available, and choosing the right one for your situation can be challenging. This article looks at three XML parsing techniques that are extremely popular in Java and helps you choose the right method based on the application and its requirements. An XML parser takes a raw serialized string as input and performs a series of operations on it. First and foremost, the XML data is checked for syntax errors and for well-formedness: the parser makes sure that every start tag has a matching end tag and that no elements overlap.

Many parsers also validate the document against a Document Type Definition (DTD), or sometimes an XML Schema, to verify that the structure and the content are as specified. Finally, the parser gives the application access to the content of the XML document through its APIs. The three parsing techniques popularly used in Java are the Document Object Model (DOM), a mature standard provided by the W3C; the Simple API for XML (SAX), which was one of the first widely adopted XML APIs in Java and has become a de facto standard; and the Streaming API for XML (StAX), a newer parsing model that is very efficient and has a promising future. Each of these techniques has its advantages and disadvantages.

Parsing with DOM


The Document Object Model, or DOM, is a tree-based parsing technique that builds the entire parse tree in memory. This gives the application complete, dynamic access to the whole XML document. The DOM is a tree-like structure. The document is the root of the DOM tree, and this root has at least one child node, the root element, which in the sample code is the catalog element. Another node that is created is the DocumentType node, which represents the document type declaration. The catalog element has child nodes, and these child nodes are themselves elements.

The DOM program takes the XML filename and creates the DOM tree. It uses the getElementsByTagName() function to find all the DOM element nodes that are title elements. It then prints the text associated with each title element. It does this by going through the list of title elements and examining the first child of each one, which is the text located between the start and end tags of the element; the getFirstChild() method is used to obtain it.

The DOM is a direct and straightforward model. The XML document can be accessed randomly at any time because the entire tree is held in memory. The DOM APIs can also modify the tree, for example by appending a child, or by updating or removing a node. The DOM offers a lot of support for navigating the in-memory tree, but there are parsing issues to consider. The entire document has to be parsed in one go; it cannot be parsed partially or in intervals. If the XML document is huge, building the entire tree in memory becomes an expensive process, and the DOM tree can consume a lot of memory. Interoperability is the biggest advantage that the DOM offers, but it is not well suited to object binding, which is its drawback.

Many applications are well suited to DOM parsing. If the application needs immediate random access to the XML document, DOM parsing is appropriate. For example, an Extensible Stylesheet Language (XSL) processor needs to navigate through an entire file repeatedly while it processes templates. Because the DOM supports dynamic updates, it is also extremely convenient for applications, such as XML editors, that need to modify data frequently.
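The title-listing program described above is written in Java, and its sample code is not reproduced here. The sketch below shows the same steps with Python's standard `xml.dom.minidom` module instead, where `getElementsByTagName()` is a method but the first child is exposed as the `firstChild` attribute rather than a `getFirstChild()` call. The catalog document and the function names are hypothetical.

```python
from xml.dom import minidom

CATALOG = (
    "<catalog>"
    "<journal><title>XML Today</title></journal>"
    "<journal><title>Parser Weekly</title></journal>"
    "</catalog>"
)


def list_titles(xml_text):
    """Build the whole DOM tree, then read the text of every title element."""
    document = minidom.parseString(xml_text)
    titles = []
    for element in document.getElementsByTagName("title"):
        text_node = element.firstChild  # the text between <title> and </title>
        titles.append(text_node.data)
    return titles


def append_journal(xml_text, title):
    """Modify the in-memory tree by appending a new journal element."""
    document = minidom.parseString(xml_text)
    journal = document.createElement("journal")
    title_element = document.createElement("title")
    title_element.appendChild(document.createTextNode(title))
    journal.appendChild(title_element)
    document.documentElement.appendChild(journal)
    return document.documentElement.toxml()
```

Both functions parse the complete document before doing anything else, which is the memory cost the text describes; in exchange, any node can be reached or changed at any time.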

Parsing with SAX


The SAX processing model is an event-driven model for processing XML documents, based entirely on a stream of events. Although it is not a standard declared by the W3C, it is a very well-known API that many parsers implement in a compatible way. Unlike the DOM, which builds an entire tree to represent the data, a SAX parser emits a series of events while it reads the document. These events are forwarded to event handlers, which provide access to the data of the document. There are three basic types of event handler: the DTDHandler, which is used for accessing the content of XML DTDs; the ErrorHandler, which gives low-level access to the errors raised while parsing; and, last but not least, the ContentHandler, which is used for accessing the content of the document.

This difference between DOM and SAX offers a great benefit in terms of performance. SAX provides efficient, low-level access to the content of an XML document. Its major advantage is extremely low memory consumption, because the document never has to be loaded into memory in its entirety; this enables a SAX parser to parse a document that is much larger than the system's memory. In addition, no object has to be created for each and every node, unlike in the DOM. Finally, the SAX "push" model can be used in a broadcast context, in which multiple content handlers are registered and receive events in parallel instead of one after another in a pipeline.

One disadvantage of SAX is that you have to implement the event handlers for all incoming events, and the application code has to maintain the state of these events itself. The SAX parser cannot process events with the context that the DOM's element tree provides, so you also have to keep track of the parser's position in the document hierarchy. The application logic gets harder as the document gets bigger and more complicated. The entire document need not be loaded, but a SAX parser still has to parse the whole document, just like the DOM.

One of the biggest problems of SAX is that it lacks built-in support for document navigation like that provided by XPath. In addition, one-pass parsing limits random access. These limitations also affect namespace handling. These shortcomings make SAX a poor choice for manipulating or modifying an XML document.

Applications that can read the document's content in a single pass can benefit hugely from SAX parsing. Many business-to-business portals and applications use XML to encapsulate data in a format that can be sent and retrieved with a simple process. In this scenario SAX may win hands down over DOM, purely because its efficiency results in high throughput. SAX 2.0 also has a built-in filtering mechanism that makes it very easy to output a subset of a document. SAX parsing is also considered very useful for validating against DTDs and XML schemas.
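The point that a SAX application must track its own position in the document hierarchy can be sketched with Python's standard `xml.sax` module, which follows the same push model as the Java API. The `PathTracker` class and the `element_paths` helper are hypothetical names used only for this illustration.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class PathTracker(ContentHandler):
    """Keeps a stack of open elements, because SAX itself keeps no tree."""

    def __init__(self):
        super().__init__()
        self.stack = []   # names of the elements currently open
        self.paths = []   # full path recorded at each start tag

    def startElement(self, name, attrs):
        self.stack.append(name)
        self.paths.append("/" + "/".join(self.stack))

    def endElement(self, name):
        self.stack.pop()


def element_paths(xml_text):
    """Parse in a single pass and return the path of every element seen."""
    handler = PathTracker()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.paths
```

Nothing but the small stack is held in memory, however large the document is, but all of the bookkeeping that the DOM would do for free sits in application code.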

Parsing with StAX


StAX is a newer parsing technique that is very similar to SAX and improves on it. Like SAX, StAX uses an event-driven model. The difference is that SAX uses a push model for event processing whereas StAX uses a pull model. Another notable feature is that, instead of using callbacks, a StAX parser returns events as the application requests them.
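StAX itself is a Java API. As a rough analogue of the pull model, the sketch below uses `XMLPullParser` from Python's standard `xml.etree.ElementTree` module: the application feeds data and then asks for events when it wants them, and it can stop as soon as it has what it needs. The function name and the sample chunks are hypothetical.

```python
import xml.etree.ElementTree as ET


def first_title(chunks):
    """Pull events on demand and stop at the first complete title element."""
    parser = ET.XMLPullParser(events=("start", "end"))
    for chunk in chunks:
        parser.feed(chunk)  # hand the parser some more input
        # The application asks for events; no callbacks are registered.
        for event, element in parser.read_events():
            if event == "end" and element.tag == "title":
                return element.text  # stop early, the rest is never parsed
    return None


# Input may arrive in arbitrary pieces, for example from a network stream.
PIECES = [
    "<catalog><journal><ti",
    "tle>XML Today</title>",
    "</journal></catalog>",
]
```

Because control stays with the application, the loop reads like ordinary sequential code, which is the main practical difference from the handler classes that a push parser requires.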

XML Processing
Author : Exforsys Inc. Published on: 16th Jul 2007

XML document processing is described by a large set of specifications, and the list keeps growing. Many applications depend on these specifications to work with XML, the Extensible Markup Language. The specifications list the requirements for the XML processing model as well as for the XML language itself. They are mostly at the conceptual level and describe language-based interactions.

XML documents are treated as information sets, and the specifications define processes that construct new information sets, inspect information sets, modify them or extract information from pre-existing ones. The processing model has to be described in terms of the infoset, but the concrete object models that applications work with are not themselves the infoset. Applications use DOM object models, SAX event streams or other representations of the infoset.

Requirements of the XML processing model


The language should address concerns related to interoperability. The language should be simple and easy to use for the XML processing model. The language should be able to specify the inputs, the outputs and all the required parameters of a process. For the sake of interoperability, the language should define mandatory processing options for input as well as error reporting options. The language should be capable of specifying the documents and the set of components separately. The language should be easy to implement but sophisticated enough to allow operations to be optimized. The XML processing model should be extensible, so that applications can define new functions and use them in the pipeline. The model should have a plan for error handling and fallback scenarios. The XML processing model should be able to select different components at run time and should allow conditional processing. Information should be exchanged between components in a standardized way. The language should be able to use XML tools to manipulate the data, so the data should essentially be in XML.

Processing XML with Java


An XML document is a tree of objects, and there are standard APIs for representing it that follow the W3C's Document Object Model specification. In SAX the document is represented as a series of events. The standard API for Java XML parsers is called JAXP, and JAXP 1.1 was expanded to include an API for XSLT engines as well. This API is called TrAX, the Transformation API for XML. It is very powerful once you understand its usage and its top-level interfaces.

Uses of TrAX
TrAX extends the original work of JAXP to provide a vendor-neutral, standard Java API for specifying and carrying out XML transformations. TrAX plays a more important role than just being an API for XSLT engines: its main use is as a general-purpose interface for transforming XML documents. TrAX is not in competition with the DOM, JDOM or SAX; it is an API that represents XML transformations and bridges these various models. It accepts SAX events and templates from XSLT. TrAX also relies to a great extent on SAX2 and the DOM and their parsers. TrAX basically provides the same functionality as the XSLT engines, but the parser can be changed by changing properties.

In some code the XSLT stylesheet is reprocessed for every successive transformation. A common scenario is that the same transformation is applied repeatedly to different sources, possibly in different threads. A better approach is to process the stylesheet only once and save the result for the repeated transformations. This saves a lot of time, because the processing need not be repeated over and over again. This can be done with the TrAX Templates interface. A Templates object holds the processed transformation instructions, and the Transformer is the actual run-time instance that is used while a transformation takes place. To improve throughput and performance, Templates instances can be saved and reused, and they are thread-safe. The interface has a plural name because an XSLT stylesheet contains a collection of one or more template elements. Each transformation rule is defined by a template element in the stylesheet, so the simplest name for something that represents the whole collection is Templates.

XML Processing in Python


SAX, the Simple API for XML, and DOM, the Document Object Model, are the two basic and popular ways of working with XML. SAX works by reading the XML a little at a time and calling a method for each element it finds. This is somewhat similar to event-based HTML processing, which also reports elements as it finds them in the document. The DOM reads the entire document first and then creates an internal representation of it using Python classes, linked into a tree structure. The drawback is that if the XML document is huge, it takes a long time to scan the entire document and build the representation, and a lot of memory to store the resulting tree. Python has its own standard modules for parsing XML documents.

Parsing XML using DOM level 2

The DOM represents all the data in an XML document as a tree structure. This tree can easily be manipulated from Java, because the DOM is designed to be simple for programs to use. You can use this to modify data in the tree and to extract data from it when needed. However, the DOM parses the whole document, not just parts of it as SAX can. If you do not need the entire document, parsing all of it wastes time, effort and memory. When you have large XML documents and need only a small portion of them, it makes sense to use SAX. Parsing XML data with the DOM involves two major tasks: converting the XML data into a DOM tree, and looking through the tree for the data that is useful to you. In Java you can specify which parser to use; if no parser is specified, the Apache Xerces parser is used.

Parsing in SAX
Like DOM parsing, SAX parsing involves two major tasks. One is to create a content handler, and the other is to invoke the parser and direct it to the content handler. Some steps have to be followed, such as telling the system which parser to use. You have to create an instance of the parser and then create a content handler that will respond to it. The handler should define what happens at the start and end of the document and at the start and end of each element, and how characters and ignorable white space are treated. Finally, the parser has to be invoked with the designated content handler. If this last step is not done, no SAX processing will happen at all.

The start element event is triggered by a start tag in the document. If a start tag is missing, no start element event is raised and the document cannot be processed correctly. If there are errors while parsing, this is the first place to check. The end element event is triggered by an end tag; the handler subtracts two from the indentation and then prints a message. The characters event is used to print the first word of the tag body, and it does not change the indentation.
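The content handler described above can be sketched with Python's standard `xml.sax` module. This is a hypothetical reconstruction, not the original sample: the class name, the message wording and the choice to collect lines in a list instead of printing them are assumptions, and so is the step of adding two to the indentation at each start tag, which the text implies but does not state.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class OutlineHandler(ContentHandler):
    """Records one indented line per event, in the order the events arrive."""

    def __init__(self):
        super().__init__()
        self.indent = 0
        self.lines = []

    def startDocument(self):
        self.lines.append("start of document")

    def endDocument(self):
        self.lines.append("end of document")

    def startElement(self, name, attrs):
        self.lines.append(" " * self.indent + "start: " + name)
        self.indent += 2  # assumed counterpart of the subtraction below

    def endElement(self, name):
        self.indent -= 2  # subtract two from the indentation, then report
        self.lines.append(" " * self.indent + "end: " + name)

    def characters(self, content):
        words = content.split()
        if words:  # report only the first word; indentation is unchanged
            self.lines.append(" " * self.indent + "text: " + words[0])

    def ignorableWhitespace(self, whitespace):
        pass  # white space that can be ignored produces no output


def outline(xml_text):
    """Create the handler, then invoke the parser and direct it to the handler."""
    handler = OutlineHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.lines
```

The last step in `outline` is the one the text warns about: if the parser is never invoked with the handler, none of the methods above are called.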

XML Remote Procedure Call


Author : Exforsys Inc. Published on: 19th Jul 2007

XML-RPC, or XML Remote Procedure Call, is a specification and a set of implementations that allow programs running on different platforms or operating systems to make remote procedure calls over the Internet. The protocol uses HTTP as the transport and XML for the encoding. XML-RPC allows complex data structures to be transmitted, processed and returned, and yet it is very simple to use on the Internet. XML-RPC implementations are available for various operating systems and for languages such as C, C++, Java, LISP and PHP, to name just a few.

Understanding XML RPC


The XML-RPC protocol is like any other standard protocol that computers use to talk to each other on a network. The basic idea of a remote procedure call is that one computer runs a procedure on another computer that it is talking to over the network. Put simply, there are clients and servers exchanging information on the network. The client sends a request to the server and waits for a response; the server receives the request, works out the answer and sends it back to the client. Then the server waits for the next request.

Let us take a typical example. On a website that gives current temperatures around the world, a user asks for the temperature at place X; the request is accepted, and a few seconds later the user receives the temperature. The transaction is simple for the server to understand: the question "What is the temperature?" and the answer "The temperature is ..." together amount to one remote procedure call. FTP downloads and long-lived server transactions that have a session ID do not count as RPC transactions. RPC transactions are self-contained exchanges of a request and a response, not sequences of commands and instructions.

The Network File System (NFS) protocol uses the ONC RPC protocol, which is also called Sun RPC because Sun Microsystems invented and marketed it. XML-RPC differs from ONC RPC in that it uses XML to encode requests and responses, whereas ONC RPC does not involve XML in any way. The advantage is that XML is friendly to work with, because it is plain readable text rather than a binary code. Many servers use XML-RPC, and there are many programming languages in which XML-RPC servers and clients can be built.
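To show what the XML encoding of such a call looks like, the sketch below uses Python's standard `xmlrpc.client` module to encode and decode the temperature request and its response without any network traffic. The method name `weather.getTemperature`, the place name and the reading are hypothetical.

```python
import xmlrpc.client

# Encode the request "what is the temperature in Paris?" as XML.
request_xml = xmlrpc.client.dumps(
    ("Paris",), methodname="weather.getTemperature"
)

# What a server would do first: decode the parameters and the method name.
params, method = xmlrpc.client.loads(request_xml)

# Encode the answer "the temperature is 21.5" as an XML-RPC response.
response_xml = xmlrpc.client.dumps((21.5,), methodresponse=True)

# What the client would do on receipt: decode the returned value.
(temperature,), _ = xmlrpc.client.loads(response_xml)
```

Printing `request_xml` shows a short, readable document with a `methodCall` root element, which is the plain-text friendliness the paragraph above refers to.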

Function Libraries

There are several function libraries for XML-RPC, divided mainly into C libraries and C++ libraries. You can use these libraries individually or together.

C libraries
libxmlrpc
The functions of libxmlrpc can be discussed here in brief. The libxmlrpc header file declares the library's interface; how to link to the library, and a good deal more, is described alongside it. Generally a library function either succeeds or fails. The distinction is crucial, yet what counts as success and what counts as failure can seem hazy, because a library function that fails changes nothing: the state after the call cannot be distinguished from the state before it. The function does, however, return a description of how it failed.

libxmlrpc client
The libxmlrpc client uses a set of global constants. Because of this, the running program has to call a library function to set them up. These global constants are not thread safe, so you must make sure you do not call this function while other threads are running in the program. The client does, however, provide functions to interrupt or debug the program in case it has been used with threads. The client library is mainly useful when the XML-RPC call is an interim part of a program and not the main program itself.

libxmlrpc server
An XML-RPC server is a machine or system that receives remote procedure calls and sends responses to them. It does this through methods, which are stored in a method registry. The protocol drivers use the method registry to execute an XML-RPC call and send the response. Methods come in two types, called type 1 and type 2. Type 1 is usually the default, but type 2 is used more in newer code because it has more advanced functionality.

libxmlrpc server Abyss
Abyss is a general-purpose HTTP server program, used as a web server. It is very similar to Apache. XML-RPC is implemented over HTTP, so an Abyss server with a suitable handler attached can accept a connection and execute an XML-RPC call. You can write your own Abyss request handler that takes the XML document, converts it to an XML-RPC call, and returns the response as XML.
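The method registry and request handler can be sketched without any network code. The example below is not the C library described above; it uses Python's standard `xmlrpc` package as a stand-in, and the method name `sample.add` is made up for illustration. A registry maps method names to functions, and a dispatcher turns the XML of a call into the XML of a response, which is roughly the job the text assigns to a request handler.

```python
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCDispatcher


def add(a, b):
    """A sample method that remote callers may invoke."""
    return a + b


# The dispatcher plays the role of the method registry.
registry = SimpleXMLRPCDispatcher(allow_none=False, encoding=None)
registry.register_function(add, "sample.add")  # hypothetical method name

# Build the XML document a client would send for sample.add(2, 3).
call_xml = xmlrpc.client.dumps((2, 3), methodname="sample.add")

# Execute the call and obtain the XML response (returned as bytes).
response_xml = registry._marshaled_dispatch(call_xml)

# Decode the response the way a client would.
params, _ = xmlrpc.client.loads(response_xml)
result = params[0]
print(result)
```

In a real deployment the call XML would arrive in the body of an HTTP POST and the response XML would be sent back the same way; only the transport differs.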

libxmlrpc server CGI
CGI, the Common Gateway Interface, is a standard interface used by web servers. The web server uses it to call a user program in order to handle an HTTP request. For example, when a client makes a GET request, the web server can either send back the named file or, if it is configured to do so, run a CGI program and send that program's output as the response. We already know that an XML-RPC server can be implemented over HTTP, so all that an XML-RPC server needs is a web server configured to run a CGI program that knows how to execute an XML-RPC call.

The C++ Libraries


libxmlrpc++
The C++ library follows some general rules covering the C++ namespace, success and failure, memory management, naming conventions, and arguments. These general rules help the C++ libraries execute an RPC in a standard way unless specified otherwise. They can be used as an index of rules.

libxmlrpc server++
Its functions are much the same as those of the C libxmlrpc server. It likewise has two types of methods, type 1 and type 2, and these methods are stored in a method registry. The method registry helps to form a uniform interface that all methods use to interact with the protocol drivers.

libxmlrpc server Abyss++
The server responds to RPC calls addressed to a specific URI (Uniform Resource Identifier) path on the Abyss HTTP server. There are three ways in which you can use an Abyss server. First, you can supply a TCP port number, and the server listens on that port for remote procedure calls. Second, you can supply a TCP socket, and the server listens on that socket for calls from XML-RPC clients. Third, you can supply sockets that are already connected, and the server listens on these sockets for calls from XML-RPC clients. The simplest approach, however, is to configure an Abyss server and let it carry out the rest of the procedure.

libxmlrpc server pstream++
This is the only library that has no counterpart among the C libraries. Its job is to send information in packets and stream them in order. The program handles every server connection individually: after it finishes streaming over one connection, it exits and is then restarted. It relies on a Transmission Control Protocol (TCP) connection from the client. To handle a series of server connections, configure the program to accept TCP connections.

A packet stream is far easier to handle than HTTP. HTTP itself is simple, and the packet stream is simpler still; it is probably the simplest way to communicate XML-RPC messages. A packet stream is a two-way communication method in which information packets travel in both directions. It is a stream of bytes in which each individual packet may be a different size. Each packet belongs to exactly one stream socket, that is, one TCP connection. Each XML-RPC message amounts to one packet in the packet stream, and the packets follow one another in that stream.
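The idea of variable-size packets carried on a single byte stream can be sketched as follows. This is only an illustration: the 4-byte length prefix is an assumed framing chosen for simplicity, not the wire format the library actually uses, and an in-memory buffer stands in for the TCP connection. The sample messages and the method name `ping` are likewise hypothetical.

```python
import io
import struct


def write_packet(stream, payload: bytes) -> None:
    """Write one packet: a 4-byte big-endian length, then the payload."""
    stream.write(struct.pack(">I", len(payload)))
    stream.write(payload)


def read_packets(stream):
    """Yield packets from the byte stream until it is exhausted."""
    while True:
        header = stream.read(4)
        if len(header) < 4:
            return  # end of stream
        (length,) = struct.unpack(">I", header)
        yield stream.read(length)


# Two XML-RPC messages of different sizes, one per packet.
messages = [
    b"<methodCall><methodName>ping</methodName></methodCall>",
    b"<methodResponse><params/></methodResponse>",
]

connection = io.BytesIO()  # stands in for a TCP connection
for message in messages:
    write_packet(connection, message)

connection.seek(0)
received = list(read_packets(connection))
print(len(received))
```

Because each packet carries its own boundary, the receiver never has to guess where one XML-RPC message ends and the next begins.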

XML Security
Author : Exforsys Inc. Published on: 21st Jul 2007

XML documents can now be secured using XML itself. When data is released to the web, it becomes available to everyone, everywhere. How do you secure and safeguard something that is so widely spread? XML Security addresses this problem for XML documents. It secures documents in two ways: XML Signature and XML Encryption.

XML Encryption
On the World Wide Web, security is taken care of by Secure Sockets Layer (SSL) and Transport Layer Security (TLS). These protocols make sure that communication between two endpoints is safe and secure, for example email communication. But they protect data only while it travels between those two endpoints. XML Encryption fills the gaps that SSL and TLS cannot cover. XML security is capable of providing end-to-end security and selective security, in which only chosen parts of a document are protected.

The XML syntax


How are XML digital signatures created, and what do they cater to? XML signatures can be extended to encrypted documents and can be applied to any digital content, including XML documents. The XML schema defines the syntax of the signature. A signature can be enveloped within the document it signs, and it can be applied to content from more than one resource. The most important job of an XML signature is to associate a key with the signed data. It is not its job to specify how keys are associated with the persons taking part in the communication, or to convey what the data means. The specification alone cannot take care of all security concerns; where it does not address them, additional key, algorithm, and rendering requirements become essential. The specification uses capitalized keywords to state its requirements. These keywords are not used to describe grammar; the schema takes care of that.

An overview of Signatures
XML signatures may be applied to arbitrary digital content, or data objects. Each data object is digested, and the resulting value is placed in the document together with a cryptographic signature. The Signature element represents the signed data in a structured format. Validation involves two steps: validation of the signature, and validation of every reference in the document. The algorithms used to calculate the signature value are named in the signature itself. The KeyInfo element usually carries the information required to validate the document.

The processing rules cover three areas: core generation, core validation, and core signature syntax. Core generation is divided into two levels, reference generation and signature generation. In reference generation, for every data object that is signed, transforms are applied to the data object as determined by the application. The digest value of the resulting data object is calculated, and a reference element is constructed that identifies the object and records the digest.

In signature generation, a SignedInfo element is created from the signature method, the canonicalization method, and the references. Using the algorithms named in SignedInfo, the signature value is calculated, and then the Signature element is constructed, which includes the SignedInfo, any objects, the KeyInfo, and the signature value.

Core validation is also divided into two steps, reference validation and signature validation. Sometimes a signature is valid but the application fails to validate it. This may be caused by a failure to implement parts of the specification, or by an unwillingness to accept specific algorithms or even specific URIs.

In reference validation, the SignedInfo element is canonicalized using the canonicalization method named in SignedInfo. Then, for each reference, the data object is obtained and digested using the digest method given in the reference, and the resulting digest value is compared with the digest value stored in the SignedInfo reference. If the values do not match, validation fails.

In signature validation, the keying information is obtained either from KeyInfo or from an external source. The canonical form of SignedInfo is obtained using the canonicalization method, and the result is used to confirm the signature value over the SignedInfo element.

Core signature syntax describes the features of the core signature. These features are mandatory for any implementation.
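The digest comparison at the heart of reference validation can be sketched in a few lines. This is a simplified illustration under stated assumptions: transforms and canonicalization are skipped, SHA-256 is an assumed digest method, the sample data is invented, and the separate step of checking the signature value over the signed information is not shown.

```python
import base64
import hashlib


def digest_value(data: bytes, algorithm: str = "sha256") -> str:
    """Return the base64-encoded digest of a data object."""
    raw = hashlib.new(algorithm, data).digest()
    return base64.b64encode(raw).decode("ascii")


def reference_is_valid(data: bytes, stored_digest: str,
                       algorithm: str = "sha256") -> bool:
    """Recompute the digest and compare it with the stored value."""
    return digest_value(data, algorithm) == stored_digest


data_object = b"<order><item>book</item></order>"

# What the signer would record for this data object.
stored = digest_value(data_object)

# Unchanged data validates; altered data does not.
ok = reference_is_valid(data_object, stored)
tampered = reference_is_valid(b"<order><item>pen</item></order>", stored)
print(ok, tampered)
```

A mismatch in any single reference is enough for validation of the whole signature to fail, which is why every reference is checked.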

XML SQL Server


Author : Exforsys Inc. Published on: 23rd Jul 2007



XML support was introduced in SQL Server through a clause added to the SELECT statement, the FOR XML clause. XML is now well integrated into Microsoft SQL Server, a relational database management system (RDBMS), which helps in building modern web programs and databases.

Why introduce XML into a SQL server?


Business-to-business (B2B) portals, business-to-consumer (B2C) portals, and intra-business networks need to exchange a lot of information, and XML makes this exchange easy. You could integrate ERP (Enterprise Resource Planning) and CRM (Customer Relationship Management) functionality by using XML with the RDBMS. Some of the XML features in SQL Server are OPENXML, HTTP access, OLE DB and ADO access, XML modes, XML views, and SELECT statement options.

Data can be accessed over HTTP in three ways: through SQL statements entered in the URL (Uniform Resource Locator), through templates, and through HTML POST integration.

Data is retrieved using the XML modes. There are three XML modes: RAW, AUTO, and EXPLICIT. RAW mode takes each row of the result and converts it into an XML element. AUTO mode returns the query results as a nested XML tree. EXPLICIT mode lets the query writer define the shape of the XML tree explicitly, and it specifies the way such queries must be written.

An XML data schema provides an XML view of the database by means of annotations that refer to the SQL Server tables. These annotations appear within the XML schema and define a two-way mapping between the XML data and the tables and columns. The names used in the annotations stay the same from the XML data schema to the database table and column names. The annotations are also used to define hierarchical relationships between data.
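The behavior of RAW mode, one generic element per result row, can be imitated in a short sketch. This is not SQL Server code; it only mimics the output shape in Python, and the column names and sample rows are invented for illustration.

```python
import xml.etree.ElementTree as ET


def rows_to_raw_xml(columns, rows):
    """Turn each result row into one <row> element with column attributes."""
    elements = []
    for row in rows:
        element = ET.Element("row")
        for name, value in zip(columns, row):
            element.set(name, str(value))
        elements.append(element)
    return elements


# A hypothetical two-column query result.
columns = ["CustomerID", "Name"]
rows = [(1, "Ann"), (2, "Bob")]

raw = rows_to_raw_xml(columns, rows)
for element in raw:
    print(ET.tostring(element, encoding="unicode"))
```

AUTO and EXPLICIT differ from this mainly in that the elements are named after tables and nested, rather than emitted as a flat sequence of generic rows.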

XPath queries
XPath works with the XML view technology so that the data that is being retrieved can be in the form of an XML document.
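As a rough illustration of what an XPath query selects, the sketch below runs a path expression over a small in-memory document. It uses the limited XPath support in Python's standard library rather than SQL Server, and the element and attribute names are invented.

```python
import xml.etree.ElementTree as ET

# A small document standing in for an XML view of customer data.
document = ET.fromstring(
    "<Customers>"
    "<Customer CustomerID='1'><Name>Ann</Name></Customer>"
    "<Customer CustomerID='2'><Name>Bob</Name></Customer>"
    "</Customers>"
)

# Select the Name of the customer whose CustomerID attribute is 2.
matches = document.findall("./Customer[@CustomerID='2']/Name")
names = [node.text for node in matches]
print(names)
```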

OPENXML
OPENXML provides a way to access XML data and present it in relational form. It lets the database work with XML data from within SQL by exposing the data as rows and columns, as in a table. Statements such as SELECT, UPDATE, DELETE, and INSERT can be used along with OPENXML.
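The general idea, turning XML into rows that ordinary SQL statements can work on, can be sketched outside SQL Server. The example below is an analogy only: it uses Python's standard library with an in-memory SQLite database, and the table, element, and attribute names are assumptions made for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

xml_text = (
    "<Customers>"
    "<Customer CustomerID='1' Name='Ann'/>"
    "<Customer CustomerID='2' Name='Bob'/>"
    "</Customers>"
)

# Shred the XML into (id, name) rows.
rows = [
    (int(node.get("CustomerID")), node.get("Name"))
    for node in ET.fromstring(xml_text).findall("Customer")
]

# Load the rows into a relational table and query them with plain SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER, Name TEXT)")
conn.executemany("INSERT INTO Customers VALUES (?, ?)", rows)
selected = conn.execute(
    "SELECT Name FROM Customers ORDER BY CustomerID"
).fetchall()
conn.close()
print(selected)
```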

OLE DB and ADO Access


The SQL Server OLE DB provider has been extended to support XPath and XML. A new interface called ICommandStream has been set up to pass commands to OLE DB for processing. OLE DB has also been extended so that it can process queries and return the results as XML by using the XML views defined in XML schemas. XML features mainly serve users doing web and database development. Users developing XML documents on the SQL Server platform do not have to be well versed in XML. One important factor to consider is backward compatibility for XML in SQL Server. Since the integration of the XML data type, data can be generated directly in XML format. Microsoft SQL Server and SQLXML together provide efficient XML data management techniques. The XML-centric approach can process large volumes of XML data using the annotations defined with XML views and XSD. The discussion can be divided into two parts, data modeling and data usage.

Data Modeling:
Whether a user should choose XML view technology or XML storage depends on many factors. For highly structured data with a known schema, the relational data model works best for storage. When the data is unstructured, semi-structured, or flexible, it can be modeled to fit the need, and SQL Server provides many tools for modeling data. When there is a need for a model that is platform independent and easily transferable, XML is the best choice. When data is stored as XML, the engine checks that the data is valid and well formed, and it supports fine-grained queries and updates.

Why should data be stored in XML?


Data should be stored in XML in SQL Server if you want to use the administrative functions of the database server, such as backup, recovery, or replication, to manage your XML data. It should be stored in XML if your documents contain a lot of fine-grained data that you need to extract into an XML document while creating a new document. It should be stored in XML when you have relational data on one hand and XML data on the other and need the two to interoperate, when you want your data to be highly organized, and when you need SOAP, OLE DB, or ADO.NET access to the XML data. If none of these applies and your data is of a non-XML type, there is no need to store it in XML format.

XML Storage Options in SQL Server


There are three storage options. With native XML storage, the data is stored in hierarchical form; what is actually stored is the information set (InfoSet) of the XML data. With mapped storage, the mapping is done through an annotated schema (AXSD): the XML data is broken down into tables in a way that preserves its hierarchy, while the schema of the remaining document is left unchanged. With large object storage, an exact copy of the data is stored, for example for backup or recovery.

Two technologies can be used: the XML data type and XML views. Several factors decide which one to use for data storage. The first is the storage option, which depends on whether the data is large or small. The second is query capability; the nature of the queries and the type of data decide the storage option. Indexing also plays a pivotal role, because processing is faster when XML data is indexed. Data modification capabilities are essential for certain types of data you are dealing with, and the language support for data modification should be adequate. Last but not least is schema support. The schema may be able to map the XML document, but whether the XML document is itself a schema document also matters. These choices are independent of one another, and the data decides which suits best.

XML view technology can be used together with XPath: XPath queries against the XML view are translated into SQL queries on the tables, and updates are applied to the tables themselves. XML view technology is useful when you want an XML-centric programming model over XML views, when you have a schema for the XML document, when the order of the data does not matter, when you need to query the data using XPath, and when bulk XML data has to be distributed into tables through the XML view.

How is XML data used for data modeling?


SQL Server uses the same locking mechanism for XML and non-XML data. To achieve a good design, the data model should correspond to the locking characteristics. Another point to consider is the data type itself: XML data can be typed, untyped, or constrained. Whichever type it is, one thing is ensured: the data is well formed. Data that is not well formed is rejected and not stored.

Different circumstances call for different types of data. When you do not have a schema, or when the server does not need to validate the data, untyped data can be used. If you do have a schema and the server needs to validate the data, it is a good idea to use typed data. These data types have limitations, because certain things, business rules for example, cannot be expressed in either of them. In such cases constraints are used in combination with typed or untyped data. If all these factors are weighed together, the right decision can be made and XML in SQL Server can perform at its best.
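The rule that only well-formed data is accepted can be illustrated with a small check. This sketch uses Python's standard XML parser rather than the SQL Server engine, covers only well-formedness (not validation of typed data against a schema), and the sample documents are invented.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Properly nested tags are accepted.
accepted = is_well_formed("<note><to>Ann</to></note>")

# A missing closing tag makes the document ill formed, so it is rejected.
rejected = is_well_formed("<note><to>Ann</note>")
print(accepted, rejected)
```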
