
WEB DATABASE INTEGRATION

DESIGNING AND IMPLEMENTING WEB SITES TO INTERFACE WITH HETEROGENEOUS DATABASE ENVIRONMENTS

Submitted by: Neelabh Singh (B.Tech Final Year CSE)

Madan Mohan Malviya Engineering College Gorakhpur-273010

CONTENTS
1. Abstract
2. Introduction
   Needs
   Web Technology
   World Wide Web
   Relational Database
3. Web Database Development
   Maintaining State
   Web Browser
   Web Server Software
   Operating System
4. Database Interface
   Embedded SQL
   ODBC
   OLE DB
   JDBC
5. Integration of RDBMS and WWW
   Double Client/Server Architecture
   Connected/Disconnected Environment
6. Future
   XML
7. Conclusion

ABSTRACT
A growing number of databases are accessible on the Web. To give people unified access to these Web databases and to retrieve information from them automatically, this paper proposes a comprehensive approach to Web database integration. Web technology has become the common user interface of choice for many information dissemination systems, whereas relational database management systems (RDBMS) have been the cornerstone of information warehousing for years. The integration of the two technologies has advanced rapidly over the last few years, and this rapid growth has created new challenges for information technology managers and developers. Several competing technologies are available, but they often do not address the issues of heterogeneous environments and web-based application development. This document addresses the challenges of designing and implementing database-integrated Web sites, with a focus on the difficulties of database-web integration in heterogeneous database environments.

Web technology has evolved to the point where other media such as graphics, audio, and video files can be disseminated via the web. Since there is a wealth of valuable information in databases, integrating web sites with database technology is a natural progression of web technology. The web provides a common user interface, whereas the database provides the logical structure for storing and manipulating data. Besides the limitations of the web itself, there are many issues regarding database access via the web. First, the developer must choose one or more database interfacing techniques. There are many proprietary solutions, such as Cold Fusion and Microsoft's ADO via Active Server Pages. In addition, each major database vendor has its own web database interface solution: Oracle has its Web Developer Suite, whereas Sybase has its web.sql product. This document serves as a guideline and reference for information managers and developers who must address these issues in their respective environments.

INTRODUCTION
World Wide Web (WWW or Web) technology has grown at a phenomenal pace since its inception in 1991. The Web provides a platform-independent, common user interface to information all over the world at an economical rate. Every major software vendor in the world has included some sort of Internet or Web based solution for its products, ranging from support to direct interfaces to web technology. Over the last 5 years, the Web has evolved from a file-based retrieval system into an application-oriented medium where users can make purchases, query databases, or even customize their interface to various sites. This evolution has challenged web developers and web masters to keep the content on web sites up to date, collect meaningful statistics on the use of the site, and empower the content owners to maintain the web content.

What had to be done? We needed a repository for submitted articles that could be accessed over both the Internet and an intranet, because its users would come from all over the world via the Internet while we wanted to update the database from our local area network. The database had to be accessible from a web browser over the Internet, and from desktop database applications (like Microsoft Access) on the intranet for easy updates. It had to give authors anywhere on the Internet the ability, after authorization, to change the data about their articles, and it had to contain all the information we needed for the conference. We had a limited budget, which prevented us from evaluating commercial solutions. Our project had to remain usable for at least five years, so we could not choose a proprietary solution that might cease to exist in that time frame. With all those points in mind, we decided to present a paper based on a relational database with a World Wide Web front end built on open-source technologies.

WEB TECHNOLOGY
The Web has become an accepted, cost-effective information dissemination and collection tool for many businesses and organizations. Many of these entities use databases to provide web content and to collect information from their users and customers. A database-driven web site allows the web developer to give the end user a means of accessing data in a logical manner rather than a file-based manner. Data can be stored in a central location: when the data in the database is updated, the web site is updated automatically. This enhanced functionality does not come without a price. The web site is much more complex from the administration, development, and design points of view. To fully understand why a database-driven web site is such a powerful information tool and such a complex development environment, we must understand where web technology comes from.

The web is one of many Internet services. Others include email, the file transfer protocol (FTP), and many more. Other than email, the web is the most frequently used Internet service. The files used on the web are formatted in a standard manner called hypertext markup language (HTML). Since the web is an Internet service, it must run on the TCP/IP protocol stack. The web uses HTTP (hypertext transfer protocol) to run on top of TCP/IP. Put simply, HTTP is the protocol that transports HTML files from one computer to the next. The end user must have a software application called a web browser to view the HTML files. The web browser interprets the HTML code and presents the information in a viewable format for the end user. Each software vendor's web browser interprets HTML code differently, which makes it hard to give data a consistent look across different systems. The web database developer must therefore confer with the web developer on the presentation of database content.

[Figure: Web dataflow diagram (end user, web server)]

There is a direct correlation between the growth of the Internet and the explosion of the PC market. In 1981, the IBM personal computer (PC) was introduced. The PC gave the end user control of his or her computing environment: the end user was empowered to process data or information locally on his or her own machine. In 1990, Windows 3.0 provided a graphical user interface (GUI) for the PC. The Windows GUI made the PC even more user-friendly. The web is a natural extension of this empowerment of the end user, because it lets the end user reach information from his or her own PC.

Gaining access to files was not enough. Web developers needed access to applications and programs to make their web sites and applications more powerful; thus came the common gateway interface (CGI). CGI allows the web server to connect to another program. The primary purpose of early web servers was to receive commands from the web browser (the client) and serve HTML files to it. Web servers were not created to process or manipulate data. CGI gave web developers access to programming languages and applications. Web developers could write an application in a language such as C, C++, Perl, Tcl, Python, or many others to perform functions that the web server could not.

[Figure: Web model with CGI/API interface]

Once again, access to programs was not enough for the web. Many organizations have a wealth of information in databases, and the web is a perfect medium for providing access to it. So, once again, necessity, the mother of invention, brought web access to databases. Up to this point in the web revolution, web technology had been centrally standardized through the World Wide Web Consortium. Web-database interfaces, by contrast, were never standardized in this way. This is good in that you can select a database interface that suits your needs for a single system or application. However, it may be difficult to find a single solution that suits your needs if you have many different databases and web server operating systems. There are several reasons for the lack of standardization among web-database interfaces:

1. CGI allows any developer to write his or her own applications to interface with the web server.
2. Open-source web server software provides the source code, so developers can optimize database connectivity to the web server software.
3. To gain competitive advantage and to provide more capabilities to their customers, each major database vendor has developed its own web interface to its database product:
   i) Oracle - Web Server
   ii) Sybase - web.SQL
4. The demand for web-based applications.

As a result of these factors, many different web-database interface methods and products are available to the web developer, and choosing the appropriate one is a challenge.

What is the World Wide Web?
The World Wide Web (known as "WWW", "Web" or "W3") is the universe of network-accessible information, the embodiment of human knowledge. It is built on two main standards: the Hypertext Transfer Protocol (HTTP) and the Hypertext Markup Language (HTML). HTML is a language that describes the appearance of text on screen (the text is in fact displayed and positioned by your web browser) and links or references in the form of hypertext. HTTP is a protocol based on TCP/IP that is used to transfer HTML pages over the network from an HTTP server to the client's web browser, which in this architecture is the client that accesses the HTTP server.

Why use relational databases?
A database is a data structure, usually rather large and stored in secondary memory, that is specialized for the easy processing of a large number of different queries and other operations over a wide range of data. There are many different database management systems (DBMS). A DBMS acts as the interface between the database user and the computer, so the user can look at the database from a logical point of view without needing to know the physical method the DBMS uses for data storage. From that point of view, a DBMS can be seen as a back-end CASE tool for the static part of an information system. The advantages that have made the relational model almost the only one used for database management today are its formal foundations, the complete independence of the logical and physical levels of the database, the ease of connecting database objects at the logical level, and so on.

The relational model consists of two classes of objects: relations and attributes. The attribute is the atom of the relational model. An attribute consists of an attribute name and a domain. Domains are usually the standard data types known from procedural programming. A relational scheme is a finite set of attributes and serves as a pattern for building relations. Intuitively, records are n-tuples of values, where the i-th value is chosen from the domain of the i-th attribute. When the attribute domains are known from the semantic context, we usually define attributes by their names alone. A relation is then a finite set of records. Relations can be presented as two-dimensional tables. To be able to reach every record in the database, we designate one or more attributes of each relation as its primary key. The values of the primary-key attributes must be unique for each record in the relation. A database is a set of relations.

We build databases as models of the real world. In reality, however, objects are not independent: there are connections between them. The database, as a model of the real world, has to represent those connections too. For this purpose we introduce the concept of relationships. A relationship is a connection between two or more records of relations, which may also be records of the same relation. Relationships are not stored explicitly in the relational model; they are established at the time they are used, by the queries the RDBMS receives. This is what makes the relational model flexible for a very wide range of queries that connect more than one relation of the database. To express relationships between records, we use the concept of foreign keys. A foreign key is a set of attributes in one relation whose values match the primary key of a record in some relation of the database.
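The primary-key and foreign-key concepts described above can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the tables `author` and `article`, their columns, and the helper names are hypothetical and do not come from the original project.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with two related relations (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE author (
            author_id INTEGER PRIMARY KEY,   -- primary key: unique per record
            name      TEXT NOT NULL
        );
        CREATE TABLE article (
            article_id INTEGER PRIMARY KEY,
            title      TEXT NOT NULL,
            -- foreign key: must match the primary key of some author record
            author_id  INTEGER NOT NULL REFERENCES author(author_id)
        );
        """
    )
    conn.execute("INSERT INTO author VALUES (1, 'A. Author')")
    conn.executemany(
        "INSERT INTO article VALUES (?, ?, ?)",
        [(10, "First paper", 1), (11, "Second paper", 1)],
    )
    conn.commit()
    return conn


def titles_by_author(conn: sqlite3.Connection, name: str) -> list:
    """The relationship is not stored; the join establishes it at query time."""
    rows = conn.execute(
        "SELECT article.title FROM article "
        "JOIN author ON article.author_id = author.author_id "
        "WHERE author.name = ? ORDER BY article.article_id",
        (name,),
    )
    return [row[0] for row in rows]
```

Calling `titles_by_author(build_demo_db(), "A. Author")` returns the two demo titles, while inserting an article whose `author_id` matches no author is rejected by the database with an integrity error.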

WEB-DATABASE DEVELOPMENT
Web development is very simple, and yet complex at the same time. The core technology of the web is simple because it was designed to run on any platform, so long as that platform could communicate HTML over HTTP connections. As a result, web-based client/server applications are stateless. The complexity lies in the fact that the developer must contend with many different environments and must use crude methods to give a web-based application state. The following sections cover the basics of web-database development.

MAINTAINING STATE
One of the biggest problems in building a web-based application is maintaining state. The web is a stateless client-server application. Once the web server has satisfied the request of the web client (the browser), it has no idea of the status of the client. The web server does not know when the client has moved on to another web server. There are two techniques the developer can use in basic HTML to build state into the connection between the browser and the web server. First, the server-based application can set a cookie on the client's workstation. A cookie is a small text file on the client's machine that the web server can read. Many users disable this feature in their browsers, feeling that web administrators and web masters use this information to track or monitor the user's actions through the site. The second method is to pass variables either through the URL string or through the server environment variables. The scripting program used must be able to access these variables in order to perform pre-defined logical operations.

DEVELOPMENT ENVIRONMENT
The web is a unique development environment because so many environments are in contention. This is especially true for:
1. Web browsers
2. Web server software
3. Server operating system
4. Scripting or application language
5. Database interface
6. Database application

WEB BROWSER
As previously discussed, the web browser is responsible for displaying HTML code in a viewable format for the end user. The primary issue with web browsers is that browsers from different manufacturers display HTML content differently. For example, one browser may display an empty cell within a table in the table background color, whereas another browser may display that same empty cell in the default page background color. From a presentation point of view, the web database developer should be concerned with the type of browser in which the user views the data. An HTML specialist should be consulted on the design of tables used to display content in web browsers. The type and version of a browser can be collected and logged by most web servers. The developer can check these logs to ensure that the system will satisfy most users.
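The two state-keeping techniques described under MAINTAINING STATE (a cookie set by the server, and variables carried in the URL string) can be sketched with Python's standard library. The cookie name `session_id`, the example URL, and the helper names are assumptions made for illustration only.

```python
from http.cookies import SimpleCookie
from typing import Dict, Optional
from urllib.parse import parse_qs, urlencode, urlparse


def set_cookie_header(session_id: str) -> str:
    """Build the response header a server would send to store state on the client."""
    cookie = SimpleCookie()
    cookie["session_id"] = session_id
    cookie["session_id"]["path"] = "/"
    return cookie.output(header="Set-Cookie:")


def read_cookie(cookie_header: str) -> Optional[str]:
    """Recover the session id from the Cookie header the browser sends back."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("session_id")
    return morsel.value if morsel is not None else None


def add_state_to_url(url: str, **state: str) -> str:
    """Alternative to cookies: append the state variables to the URL string."""
    separator = "&" if urlparse(url).query else "?"
    return url + separator + urlencode(state)


def read_state_from_url(url: str) -> Dict[str, str]:
    """Parse the state variables back out of the query string."""
    return {key: values[0] for key, values in parse_qs(urlparse(url).query).items()}
```

Either way, the server itself remembers nothing between requests; the client returns the identifier with every request, and the application uses it to look up whatever state it keeps.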

The developer must also be mindful of the video resolution at which the end user will view the data. Today, the developer does not have to contend with as many different browsers as in the past.

WEB SERVER SOFTWARE
Web servers were initially developed with the sole purpose of delivering HTML content to the web browser. However, the role of the web server has grown more complex as the Web has matured. Today, web servers process data, interface with other applications, provide data security, and perform many other functions. There are well over 30 web server software packages available, running on many different operating systems. Two of them are described here.

Microsoft Internet Information Server (IIS)
IIS is a commercial web server software application developed by Microsoft Corporation. It is distributed at no extra cost with Windows NT. Unlike Apache, its source code is not freely available. There is debate about the scalability of IIS on the Windows NT operating system. The primary advantages of IIS are that it is distributed with Windows NT and that it has a very simple administration interface, available via the console or the web. Microsoft has also gone to great lengths to develop tools such as InterDev and FrontPage that allow developers to focus on content. Many wizards and pre-defined templates are available when these tool sets are used with IIS.

Netscape Enterprise Server
Netscape's Enterprise Server is a commercial software application that runs on both UNIX and NT platforms. It has its own API, called NSAPI (Netscape Server Application Programming Interface). Netscape recently added support for Java servlets as an extension.

SERVER OPERATING SYSTEM
The foundation of web database development is the server operating system, since the web server software must run on it. The server operating system must be able to handle the volume of requests coming from web browsers and going to the database application. Its file system must be fast, efficient, and robust enough to handle web-based applications.

SCRIPTING OR WEB APPLICATION LANGUAGE


For most web applications, displaying HTML files and accessing environment variables is not enough. Thus, an interface was created that lets web servers connect to other applications. This interface is commonly known as the Common Gateway Interface (CGI). CGI is the mechanism for communication between a gateway and the web server. A gateway can be a script or a program, written for example in Perl. CGI is still commonplace in many web applications. However, there is a growing trend for the web server itself to process web pages via a scripting or programming language. The advantage is that no additional process needs to be opened and closed to handle the task. For example, you may access a database through a CGI program written in C. Every time a web user requests the web page that invokes that C program via CGI, a separate process is opened, executed, and closed. Imagine several hundred users accessing that page simultaneously. Will your system handle this load? If the server handles the processing itself, no separate process needs to be created in addition to the web server, which can save system resources.

Microsoft's IIS, running Active Server Pages (ASP) and ASP.NET, has adopted this approach. ASP files are text-based files that combine HTML tags and Active Server scripts. In other words, ASP allows the developer to embed Active Server scripts in HTML documents. The web server reads the delimiters that separate the HTML code from the Active Server script. ASP uses <% and %> as delimiters, and the web server knows to process the content inside the delimiters as an Active Server script. ASP supports several scripting languages, such as VBScript, JScript, and even PerlScript. VBScript is the default language interpreted within the delimiters.

Example of server-side scripting using VBScript in Active Server Pages.

DATABASE INTERFACE
The primary function of the web server is to send the appropriate HTML code to the web browser. Today's trend is to serve content to the web from a database. To make this happen, the web server must communicate with the database: it must make requests to the database, interpret the database's response, and pass the appropriate data on to the web browser. For the web server to communicate with a database, it must go through an API (Application Programming Interface). Many different database access APIs are available to the developer, ranging from proprietary to open-standard APIs. A web database developer therefore has many options when selecting the API that best meets the requirements of the project. However, the developer must be very careful in selecting the API if he or she must support a heterogeneous environment. One API might not support every database or web server in the developer's environment.

Embedded SQL
Earlier, there was no common function API and no standard 4GL. Embedded SQL uses a language-specific pre-compiler. SQL commands are embedded in a host programming language, such as C or COBOL. The pre-compiler translates the embedded commands into host language statements that use the native API of the database. The problem with Embedded SQL is that there must be a compiled version of the database interface for each database and operating system supported.

ODBC
When building a web site that must connect to many different databases, the first database connectivity standard normally considered is ODBC. ODBC is a logical choice because it is a standardized API. It is a set of function calls, based on the SQL Access Group (SAG) function set, for utilizing a SQL database system (the back-end system). The SAG set implements the basic functionality of Dynamic SQL. Embedded SQL commands can be translated into ODBC calls. Finally, there are ODBC drivers for every major database application.

OLE DB
OLE DB could be viewed as an object layer placed on top of ODBC, but Microsoft has provided direct OLE DB drivers for its database products and appears set to de-emphasize, and perhaps discontinue, the ODBC drivers for those products. OLE DB is not open or portable except between Microsoft operating systems, which are expected to converge on a single OS, NT, in the next few years. Because of Microsoft's total control of the specification and the arbitrary complexities of the facility, OLE DB is not supported on other operating systems such as OS/2, Mac OS, and the various flavors of UNIX. ODBC, and to a lesser degree Embedded SQL, remain the only open and portable interfaces for SQL-accessible databases.

Java and JDBC
JDBC is a SQL-level API that allows you to embed SQL statements as arguments to methods in JDBC interfaces. It lets you do this in a database-independent fashion: with JDBC, you can run the same code no matter what database is present. There is an architectural conflict between Java and relational databases, because Java is object-oriented whereas relational databases are not. The use of Java and JDBC has two distinct advantages for heterogeneous web application development: it is database independent, and it facilitates distributed computing. A Java database application does not care which database engine is used. The administrator does not have to install the software on each user's workstation, which is very beneficial when it comes time to update the application.
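The idea of writing database code once against a common API, as ODBC and JDBC aim to allow, can be illustrated by analogy with Python's DB-API (PEP 249). This sketch is not JDBC or ODBC itself; it uses the standard sqlite3 driver as a stand-in for any compliant driver, and the `article` table and function names are hypothetical.

```python
import sqlite3
from typing import Any, List


def fetch_titles(conn: Any) -> List[str]:
    """Use only DB-API calls (cursor, execute, fetchall), so any compliant
    driver's connection object could be passed in."""
    cursor = conn.cursor()
    try:
        cursor.execute("SELECT title FROM article ORDER BY title")
        return [row[0] for row in cursor.fetchall()]
    finally:
        cursor.close()


def open_demo_connection() -> sqlite3.Connection:
    """Driver-specific part: only this function knows which database is used."""
    conn = sqlite3.connect(":memory:")
    # conn.execute / conn.executemany are sqlite3 conveniences, not part of the DB-API.
    conn.execute("CREATE TABLE article (title TEXT)")
    conn.executemany(
        "INSERT INTO article VALUES (?)", [("ODBC notes",), ("JDBC notes",)]
    )
    conn.commit()
    return conn
```

Only `open_demo_connection` would change if the database were swapped. In practice drivers still differ in details such as parameter placeholder style, which echoes the caution that not every driver behaves identically in a heterogeneous environment.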

Integration of RDBMS and WWW


Now that we have introduced RDBMS and the World Wide Web, we should answer the main question: why should we integrate databases and the WWW? There are many different interfaces for human interaction with databases, so why should we use the WWW? There are many answers, but let us outline the few that matter most to us:

1. Databases are best suited to storing all kinds of information.
2. HTML enables various forms of data representation that are suitable for displaying data from relational databases and that are platform and location independent.
3. Modern trends in database design include the client/server architecture, which can be implemented quite easily with the proposed model.

If we compare those points with our initial goals, we can see that they are an almost perfect fit, so choosing an RDBMS and the WWW was the logical choice. There are two ways of connecting the World Wide Web and databases. One is to use CGI, which starts external programs (a computing-intensive process), and the other is to use some sort of extension of the HTML language.

CGI (Common Gateway Interface) is an API that defines how applications talk to web servers and exchange data with them. HTTP defines the way in which web clients send data to web servers, which in turn pass that data to the CGI programs that process it. In our case, CGI programs are the programs used to access databases. They create output, in most cases HTML pages, which HTTP servers then transfer to the client machines over HTTP.
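A CGI program of the kind described above can be sketched in Python as follows. It is a minimal sketch under stated assumptions: the `article` table, the `author` query parameter, and the function names are hypothetical, and an in-memory SQLite database stands in for the real RDBMS. The web server passes the request to the program through environment variables such as `QUERY_STRING`, and the program answers with a header block, a blank line, and an HTML page.

```python
import html
import sqlite3
from typing import Mapping
from urllib.parse import parse_qs


def make_demo_db() -> sqlite3.Connection:
    """Hypothetical stand-in for the database the CGI program would query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE article (title TEXT, author TEXT)")
    conn.execute("INSERT INTO article VALUES ('Web Database Integration', 'Singh')")
    conn.commit()
    return conn


def handle_request(environ: Mapping[str, str], conn: sqlite3.Connection) -> str:
    """Turn one CGI request into a complete CGI response (headers plus HTML)."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    author = params.get("author", [""])[0]
    # A parameterized query keeps user input out of the SQL text.
    rows = conn.execute(
        "SELECT title FROM article WHERE author = ? ORDER BY title", (author,)
    ).fetchall()
    # Escape everything that ends up inside the HTML page.
    items = "".join(f"<li>{html.escape(title)}</li>" for (title,) in rows)
    body = (
        f"<html><body><h1>Articles by {html.escape(author)}</h1>"
        f"<ul>{items}</ul></body></html>"
    )
    return "Content-Type: text/html\r\n\r\n" + body

# A real CGI script would finish with something like:
#   sys.stdout.write(handle_request(os.environ, conn))
```

Because the web server starts a new process for every such request, each call pays the cost of process start-up and of opening the database connection.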

The other way to access databases from the WWW is to use some kind of HTML language extension. These extensions define special tags that are not transferred to the client web browser but are executed and replaced by the result of that execution (which is, again, in most cases HTML code). Compared with CGI, this second approach has the advantages of easier maintenance (code and HTML are not kept apart) and faster execution (no external programs are forked; all processing is done inside the web server).

Double client/server architecture
To add to the general confusion, the client/server architecture in this case is twofold. First, we have the client/server architecture of the WWW. Here, the clients are programs (browsers) that display data received from servers (HTTP servers) in HTML format on the user's screen. From the database point of view, on the other hand, the clients are the CGI scripts that access the database directly, and also the JavaScript programs that run "inside" the client's browser and whose main purpose is to check the validity of input data. The server, from the database point of view, is of course the relational database management system.

All of the components mentioned above (client browser, HTTP server, CGI program, and RDBMS) can be distributed throughout the network to provide load balancing or simply to accommodate a possible flood of requests. If we look at the client/server architecture described above as a distributed system, there is one more important thing to note: the system architecture must also be open. Here are the reasons why:

1. The benefits of interoperability and portability extend to all components in the architecture.
2. The architecture can be specialized, or can evolve, by changing the implementation of individual components.
3. The architecture can be extended by introducing new components at a later date.

Connected and Disconnected architecture
There are two techniques for connecting to various database servers and receiving data from them:

1. Connected environment
2. Disconnected environment

In a connected environment, the web server establishes an individual connection to the database server for each client (web browser) request and stays connected throughout the client's session. This is the environment that was used earlier for connecting from a web server. However, it limits the number of clients that can use the resources of a database server, because a database server cannot serve more than a limited number of connections at once.

For this reason, Microsoft introduced a new environment in ADO.NET, called the disconnected environment. In this environment, the web server connects to the database server at the request of a client, imports the tables that the client will manipulate, and then closes the connection. The user works on the imported tables, and when the user has finished, a connection to the server is established again and the necessary updates are made in the original database. The frequency of updates depends on the importance of the data. The disconnected architecture is implemented as follows:

    ' Requires: Imports System.Data
    Dim con As New SqlClient.SqlConnection()
    con.ConnectionString = "Data Source=<ip address of server>;" & _
        "Initial Catalog=<name of database>;" & _
        "User ID=<name of user>;Password=<password>;" & _
        "Connection Timeout=15"
    ' The data adapter opens the connection, fills the DataSet, and closes it again.
    Dim da As New SqlClient.SqlDataAdapter("<command string>", con)
    Dim ds As New DataSet()
    da.Fill(ds, "<name of table>")
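The disconnected pattern (connect, copy the rows, disconnect, edit locally, reconnect, write back) can also be sketched outside ADO.NET. The Python version below is an illustration only, not the ADO.NET API: it uses a SQLite file as a stand-in for the database server, and the `article` table and all function names are hypothetical.

```python
import sqlite3
from typing import Dict, List


def create_demo_db(db_path: str) -> None:
    """Set up a small database file to play the role of the database server."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success
            conn.execute(
                "CREATE TABLE article (article_id INTEGER PRIMARY KEY, title TEXT)"
            )
            conn.execute("INSERT INTO article VALUES (1, 'Draft title')")
    finally:
        conn.close()


def load_table(db_path: str) -> List[Dict[str, object]]:
    """Connect, copy the rows into local memory, and disconnect immediately."""
    conn = sqlite3.connect(db_path)
    try:
        cursor = conn.execute(
            "SELECT article_id, title FROM article ORDER BY article_id"
        )
        return [{"article_id": r[0], "title": r[1]} for r in cursor.fetchall()]
    finally:
        conn.close()


def save_changes(db_path: str, rows: List[Dict[str, object]]) -> None:
    """Reconnect only for as long as it takes to write the local edits back."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:
            conn.executemany(
                "UPDATE article SET title = ? WHERE article_id = ?",
                [(row["title"], row["article_id"]) for row in rows],
            )
    finally:
        conn.close()
```

Between `load_table` and `save_changes` no connection is held, so the database server can serve other clients. The price is that the local copy may become stale, and this sketch simply overwrites whatever is in the database at write-back time.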

FUTURE
The web is evolving into the largest information repository in the world. There will be continued strong demand for tools, utilities, and applications that let the user access this information with greater speed and efficiency. Web application development will continue to mature to satisfy users' demands. Development time on the web is much shorter than in other development environments. The web developer will continue to look for tools that provide more functionality and yet are flexible enough to use in many different environments.

XML
One of the biggest limitations of HTML has been the presentation and organization of its content. Extensible Markup Language (XML) allows developers to easily describe and deliver rich, structured data from any application in a standard, consistent manner. XML does not replace HTML; rather, it is a complementary format. XML is becoming the vehicle for structured data on the Web, fully complementing HTML, which is used to present the data. By separating structured data from presentation, Web developers can begin to build the next generation of Web applications. Learning to author XML and manipulate XML data sources will enable you as an HTML author to supply your Web pages with content that is more intelligent and more dynamic. Marking up data using XML will also enable you to create data sources that can be accessed in a number of different ways for a number of different purposes, making interoperability between applications and your Web site possible. XML also holds the promise of becoming a standardized mechanism for the exchange of data as well as documents. For example, XML may become a way for databases from different vendors to exchange information across the Internet.

CONCLUSION
The Internet will continue to move into the mainstream around the world. As a result, the amount of content on the Web will continue to grow. Database technology is the enabling technology that allows logic to be applied to the input and retrieval of information. More web sites will connect to databases to take advantage of the logical operations a database offers. Large organizations with heterogeneous environments will implement web-database solutions that can be applied throughout their environment. By choosing HTTP as the main transport protocol, HTML as the language for presenting data to the user, and SQL as the query language for the database, we have fulfilled the requirement for open systems.

There is a myriad of database interface solutions available to the developer today. However, not many of them can be applied effectively to a heterogeneous environment. The foremost is ODBC. The developer must be careful with ODBC, because not all ODBC drivers and resources are built the same, and the various ODBC products on the market today are inconsistent with one another in some respects. JDBC is another option, but you must either use Java on the server side or have your scripting language connect to JDBC resources. The future seems very bright for database access in heterogeneous environments using Java on the server side. Java and JDBC on the server side free the developer from worrying about which operating system and which database are used, leaving the developer free to focus on the application itself.
