
UNIT V – CLIENT SERVER AND INTERNET

INTRODUCTION

 A Web form is an HTML page with one or more data entry fields and a mandatory
“submit” button. You click on the submit button to send the form’s data contents
to a web server. This causes the browser to collect all the inputs from the form,
stuff them inside an HTTP message, and then invoke either an HTTP GET or
POST method on the server side.
 Hidden fields are basically invisible; they contain values that are not displayed
within a form. They are used to store information a user enters and resubmit that
information in subsequent forms without having the user reenter it or even be
aware that the information is being passed around.
 A cookie (also tracking cookie, browser cookie, and HTTP cookie) is a small
piece of text stored on a user's computer by a web browser.

 SQL (Structured Query Language) is a database computer language designed


for managing data in relational database management systems (RDBMS). Its
scope includes data query and update, schema creation and modification, and data
access control.

 Process-per-client architecture provides maximum bulletproofing by giving
each database client its own process address space.

 Multithreaded architectures provide the best performance by running all the
user connections, applications, and the database in the same address space. The
server provides its own internal scheduler and does not rely on the local OS
tasking and address protection schemes.

 A stored procedure is a set of SQL commands that has been compiled and stored
on the database server.
 Triggers are special user-defined actions-usually in the form of stored
procedures-that are automatically invoked by the server based on data-related
events.

 A rule is a special type of trigger that is used to perform simple checks on data.

 A Decision Support System (DSS) is an interactive computer-based system or


subsystem intended to help decision makers use communications technologies,
data, documents, knowledge and/or models to identify and solve problems,
complete decision process tasks, and make decisions.

 An Executive Information System (EIS) is a set of management tools


supporting the information and decision-making needs of management by
combining information available within the organisation with external
information in an analytical framework.

 A data warehouse is a subject-oriented, integrated, time-variant and non-volatile


collection of data in support of management's decision making process.

 A data mart is a repository of data gathered from operational data and other
sources that is designed to serve a particular community of knowledge workers.

 Data mining, the extraction of hidden predictive information from large


databases, is a powerful new technology with great potential to help companies
focus on the most important information in their data warehouses.

 Groupware is software that supports the creation, flow and tracking of non-
structured information in direct support of collaborative group activity. The
Components are Multimedia document management, Workflow, Email, Group
Conferencing, Group Scheduling.

CONTENTS

Web Client/Server - 3-Tier Client/Server, Web Style


 A Web form is an HTML page with one or more data entry fields and a mandatory
“submit” button. You click on the submit button to send the form’s data contents
to a web server. This causes the browser to collect all the inputs from the form,
stuff them inside an HTTP message, and then invoke either an HTTP GET or
POST method on the server side.
 On the receiving end, the typical web server does not know what to do with a
form. So the server simply turns around and invokes the program or resource named
in the URL and tells it to take care of the request. The server passes the method
request and its parameters to the back-end program using a protocol called CGI
(Common Gateway Interface).

 The back-end program executes the request and returns the results in HTML
format to the web server using the CGI protocol. The web server treats the results
like a normal document that it returns to the client. So the web server acts as a
conduit between the web client and a back end program that does the actual
work.

 The figure shows the elements of this new 3-tier client/server architecture, web
style. The first tier is a web browser that supports interactive forms; the second
tier is a vanilla HTTP server augmented with CGI programs; and the third tier
consists of traditional back-end servers.

 CGI technology makes it possible for Internet clients to update databases on back
end servers. In fact, updates and inserts are at the heart of online electronic
commerce.
 Software vendors are using the technology of SSL, HTTP, IPSec and Internet
Firewalls to connect web browsers to virtually every form of client/server
system-including SQL databases, TP Monitors, Groupware Servers, ERP
systems, MOM queues, e-mail backbones and ORBs.
 All these systems provide gateways that accept CGI requests, and they provide
transformers to dynamically map their data into HTML so that it can be
displayed within web browsers.

CGI: The Server side of the Web

The form’s data gets passed using an end-to-end client/server protocol that includes both
HTTP and CGI. The best way to explain the dynamics of the protocol is to walk you
through a POST method invocation.

A CGI Scenario

The figure shows how the client and server programs play together to process a form's
request. The steps are:
1. The user clicks on the form's "submit" button

This causes the web browser to collect the data within the form, and then assemble it into
one long string of name/value pairs each separated by an &. The browser translates
spaces within the data into (+) symbols.

2. The Web Browser invokes a POST HTTP method

This is an ordinary HTTP request that specifies a POST method, the URL of the target
program, and the typical HTTP headers. The message body, which HTTP calls
the entity, contains the form's data.

3. The HTTP server receives the method invocation via a socket connection

The server parses the message and discovers that it’s a POST for the program. So it starts
a CGI interaction.

4. The HTTP server sets up the environment variables
The CGI protocol uses environment variables as a shared bulletin board for
communicating information between the HTTP server and the CGI program. The server
typically provides the following environmental information: SERVER_NAME,
REQUEST_METHOD, PATH_INFO, SCRIPT_NAME, CONTENT_TYPE and CONTENT_LENGTH.

5. The HTTP server starts a CGI program

The HTTP server executes an instance of the CGI program specified in the URL; the
program is typically kept in a special directory, such as cgi-bin.

6. The CGI program reads the environment variables

In this case, the program discovers by reading the environment variables that it is
responding to a POST.

7. The CGI program reads the message body via the standard input
pipe (stdin)

The message body contains the string of name=value items separated by &. The
CONTENT_LENGTH environment variable tells the program how much data is in the string.
The CGI program parses the string contents to retrieve the form data. It uses the
CONTENT_LENGTH environment variable to determine how many characters to read in from
the standard input pipe.

8. The CGI program does some work

Typically, a CGI program interacts with some back-end resource, like a DBMS or
transaction program, to service the form's request. The CGI program must then format the
results in HTML or some other acceptable MIME type. This information goes into the
HTTP response entity, which really is the body of the message. The program can also
choose to provide all the information that goes into the HTTP response headers; the
HTTP server will then send the reply "as is" to the client. This removes the extra
overhead of having the HTTP server parse the output to create the response headers.
Programs whose names begin with "nph-" indicate that they do not require HTTP server
assistance; CGI calls them nonparsed header (nph) programs.

9. The CGI program returns the results via the standard output pipe
(stdout)

The program pipes back the results to the HTTP server via its standard output. The HTTP
server receives the results on its standard input. This concludes the CGI interaction.

10. The HTTP server returns the results to the Web Browser

The HTTP server can either append some response headers to the information it receives
from the CGI program, or send it "as is" if it comes from an nph program.
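
The scenario above can be condensed into a short program. Below is a minimal sketch of
such a CGI program in Java; the form field names and the reply page are invented for
illustration, and a production program would also validate its input:

import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URLDecoder;
import java.util.HashMap;
import java.util.Map;

// Sketch of a CGI program. The HTTP server launches it, passes request
// metadata in environment variables (step 4), pipes the form data in on
// stdin (step 7), and forwards whatever it writes to stdout (steps 9-10).
public class FormHandler {
    public static void main(String[] args) throws IOException {
        String method = System.getenv("REQUEST_METHOD");      // assumed "POST" here
        int length = Integer.parseInt(System.getenv("CONTENT_LENGTH"));

        // Step 7: read exactly CONTENT_LENGTH characters of the entity.
        Reader in = new InputStreamReader(System.in);
        char[] buf = new char[length];
        for (int off = 0; off < length; ) {
            int n = in.read(buf, off, length - off);
            if (n < 0) break;                                  // premature EOF
            off += n;
        }
        String body = new String(buf);                         // e.g. "name=Jo+Smith&city=NY"

        // Parse the name=value pairs separated by &; URLDecoder undoes the
        // + and %xx escapes the browser applied.
        Map<String, String> fields = new HashMap<>();
        for (String pair : body.split("&")) {
            String[] nv = pair.split("=", 2);
            fields.put(URLDecoder.decode(nv[0], "UTF-8"),
                       nv.length > 1 ? URLDecoder.decode(nv[1], "UTF-8") : "");
        }

        // Steps 8-9: do some work, then pipe an HTML reply back on stdout.
        // The blank line separates the response headers from the entity.
        System.out.print("Content-Type: text/html\r\n\r\n");
        System.out.println("<html><body><h1>Hello, "
                + fields.getOrDefault("name", "stranger")
                + " (" + method + ")</h1></body></html>");
    }
}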

CGI and state

The server forgets everything after it hands over a reply to the client; HTTP is a
stateless protocol. On the Internet, though, there is always some "kludge" that you can
use to work around such problems.

(i) Hidden Fields

 A hidden field is an ordinary input field inside a form that is marked with the
attribute HIDDEN.
 Hidden fields are basically invisible; they contain values that are not displayed
within a form. They are used to store information a user enters and resubmit that
information in subsequent forms without having the user reenter it or even be
aware that the information is being passed around.
 The hidden fields act as variables that maintain state between form submissions;
they get passed back to the CGI program with each submission.
 The figure shows an electronic transaction that requires multiple form submissions,
leading to an electronic payment. The first set of forms picks the merchandise
that you place in your electronic shopping cart. The next form requests a delivery
address and time. The last form presents you with the bill and requests some form
of payment.
 The CGI program processes a form and then presents you with the next form; this
continues until you make your final payment or abort the transaction.
 The CGI program uses invisible fields to store information from the previous
form in the next form.
 When you submit the form, all these hidden fields are passed right back to the
CGI program along with any new data.
 The CGI program stores the state of the transaction in the forms it sends back to
the client instead of storing it in its own memory.
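
As a minimal sketch, here is a helper (for a CGI program class like the one shown
earlier) that builds such a form; the cart_id and total field names are invented for
illustration, and real code would HTML-escape the values:

// Emitting the next form in the conversation with the transaction state
// (a hypothetical cart ID and running total) carried in HIDDEN fields
// rather than in server memory.
static String nextForm(String cartId, String total) {
    return "<form method=\"POST\" action=\"/cgi-bin/checkout\">"
         + "<input type=\"hidden\" name=\"cart_id\" value=\"" + cartId + "\">"
         + "<input type=\"hidden\" name=\"total\" value=\"" + total + "\">"
         + "Delivery address: <input type=\"text\" name=\"address\">"
         + "<input type=\"submit\" value=\"Continue\">"
         + "</form>";
}

When the user submits this form, cart_id and total come back to the CGI program along
with the newly entered address, so no per-user state lives on the server.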

(ii) Cookies

A cookie (also tracking cookie, browser cookie, and HTTP cookie) is a small piece of
text stored on a user's computer by a web browser. A cookie consists of one or more
name-value pairs containing bits of information such as user preferences, shopping cart
contents, the identifier for a server-based session, or other data used by websites.

It is sent as an HTTP header by a web server to a web browser and then sent back
unchanged by the browser each time it accesses that server. A cookie can be used for
authenticating, session tracking (state maintenance), and remembering specific
information about users, such as site preferences or the contents of their electronic
shopping carts. The term "cookie" is derived from "magic cookie", a well-known concept
in UNIX computing which inspired both the idea and the name of browser cookies. Some
alternatives to cookies exist; each has its own uses, advantages, and drawbacks.

Most modern browsers allow users to decide whether to accept cookies, and the time
frame to keep them, but rejecting cookies makes some websites unusable. For example,
shopping carts or login systems implemented using cookies do not work if cookies are
disabled.
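
In the CGI world, the browser's cookies show up in the HTTP_COOKIE environment
variable, and the program sets one by emitting a Set-Cookie response header. A minimal
sketch follows; the "visitor" cookie name is invented for illustration:

// Cookie handling from a CGI program: the browser's cookies arrive in the
// HTTP_COOKIE environment variable as "name=value; name2=value2", and a
// Set-Cookie response header asks the browser to store one.
public class CookieDemo {
    public static void main(String[] args) {
        String raw = System.getenv("HTTP_COOKIE");            // null on first visit
        String visitor = null;
        if (raw != null) {
            for (String c : raw.split(";\\s*")) {
                if (c.startsWith("visitor=")) {
                    visitor = c.substring("visitor=".length());
                }
            }
        }
        if (visitor == null) {                                // first visit: mint an ID
            visitor = java.util.UUID.randomUUID().toString();
        }
        // Headers first, then a blank line, then the entity.
        System.out.print("Content-Type: text/html\r\n");
        System.out.print("Set-Cookie: visitor=" + visitor + "; Max-Age=86400; Path=/\r\n\r\n");
        System.out.println("<html><body>Your visitor id is " + visitor + "</body></html>");
    }
}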

Objective Type Questions

1. The server passes the method request and its parameters to the back end program
using a protocol called
a. TCP b. IP c. CGI d. HTTP

2. Which one of the following acts as a conduit between the web client and a
back-end program that does the actual work?

a. Web Server b. Web Browser

c. Server d. Data Server

3. The back-end program is placed in ------------

a. Web Server b. Web Browser

c. Server d. Data Server

4. Which one of the following is used to store information a user enters and resubmit
that information in subsequent forms without having the user reenter it?

a. CGI b. Web Client c. Hidden Fields d. Cookies

5. Which one of the following acts as variables that maintain state between form
submissions?

a. Environment Variables b. Web Client c. Hidden Fields d. Cookies


6. A small piece of text stored on a user's computer by a web browser

a. Environment Variables b. Web Client c. Hidden Fields d. Cookies

Review Questions
Two Mark Questions

1. Define CGI.
2. What is meant by Hidden fields and Cookies?

Big Questions

1. How do the client and server programs work together to process a form's request?

SQL Database Server


(i) The fundamentals of SQL and Relational databases

SQL (Structured Query Language) is a database computer language designed for
managing data in relational database management systems (RDBMS). Its scope includes
data query and update, schema creation and modification, and data access control.

SQL’s Relational Origins

SQL consists of a set of commands that can be used to manipulate information collected
in tables. Through SQL, you manipulate and control sets of records at a time. The
relational model calls for a clear separation of the physical aspects of data from their
logical representation. Data is made to appear as simple tables that mask the complexity
of the storage access mechanisms.

What does SQL do?

The SQL language is used to perform complex data operations with a few simple
commands, in situations that would otherwise have required hundreds of lines of
conventional code. Its roles are:

 SQL is an interactive query language for ad hoc database queries


 SQL is a database programming language

 SQL is a data definition and data administration language

 SQL is the language of networked database servers

 SQL helps protect the data in a multi-user networked environment

The ISO standards: SQL-89, SQL-92 and SQL3

1. SQL-89

The Progress Structured Query Language (Progress/SQL-89) is a relational database
language based on the 1989 SQL standard of the American National Standards Institute
(ANSI). It meets the Level 2 requirement of the standard. In addition, Progress/SQL-89
supports many Progress features.

2. SQL-92

SQL-92 was the third revision of the SQL database query language. Unlike SQL-89, it
was a major revision of the standard. For all but a few minor incompatibilities, the SQL-
89 standard is forwards-compatible with SQL-92. The new features are:

 SQL Agent
 New data types defined: DATE, TIME, TIMESTAMP, INTERVAL, BIT string,
VARCHAR strings, and NATIONAL CHARACTER strings.
 Support for additional character sets beyond the base requirement for representing
SQL statements.
 New scalar operations such as string concatenation, date and time mathematics,
and conditional statements.
 New set operations such as UNION JOIN, NATURAL JOIN, set differences, and
set intersections.
 Support for alterations of schema definitions via ALTER and DROP.
 Bindings for C, Ada, and MUMPS.
 New features for user privileges.
 New integrity-checking functionality such as within a CHECK constraint.
 New schema definition tables (the "Information Schema").
 Dynamic execution of queries (as opposed to prepared).
 Better support for remote database access.
 Temporary tables.
 Transaction isolation levels.
 New operations for changing data types on the fly via CAST.
 Scrolling cursors.
 Compatibility flagging for backwards and forwards compatibility with other SQL
standards.
 Call Level Interface

3. SQL3

SQL3 is an extension of the SQL standard that supports object-relational extensions.


SQL3 is being processed as both an ANSI Domestic ("D") project and as an ISO project.
The parts of SQL3 that provide the primary basis for supporting object-oriented
structures are:

 user-defined types (ADTs, named row types, and distinct types)


 type constructors for row types and reference types
 type constructors for collection types (sets, lists, and multisets)
 user-defined functions and procedures
 support for large objects (BLOBs and CLOBs)

One of the basic ideas behind the object facilities is that, in addition to the normal
built-in types defined by SQL, user-defined types may also be defined. These types may
be used in the same way as built-in types. For example, columns in relational tables may
be defined as taking values of user-defined types, as well as built-in types. A
user-defined abstract data type (ADT) definition encapsulates attributes and operations
in a single entity. In SQL3, an abstract data type (ADT) is defined by specifying a set
of declarations of the stored attributes that represent the value of the ADT, the
operations that define the equality and ordering relationships of the ADT, and the
operations that define the behavior (and any virtual attributes) of the ADT. Operations
are implemented by procedures called routines. ADTs can also be defined as subtypes of
other ADTs. A subtype inherits the structure and behavior of its supertypes (multiple
inheritance is supported). Instances of ADTs can be persistently stored in the database
only by storing them in columns of tables. As of late 1998, SQL3 consisted of 9 parts:

1. SQL/Framework
2. SQL/Foundation

3. SQL/CLI(Callable Level Interface)

4. SQL/PSM(Persistent Storage Modules)

5. SQL/Bindings

6. SQL/Transactions

7. SQL/Temporal

8. SQL/MED

9. SQL/OLB

(ii) What does a database server do?

 A client application usually requests data and data-related services from a
database server. The database server, also known as the SQL engine, responds to
the client's request and provides secured access to shared data.
 A client application can, with a single SQL statement, retrieve and modify a set
of server database records. The SQL database engine can filter the query result sets,
resulting in considerable data communication savings.
 SQL server manages the control and execution of SQL commands. It
provides the logical and physical views of the data and generates optimized access
plans for executing the SQL commands.
 A database server also maintains dynamic catalog tables that contain
information about the SQL objects.
 SQL server allows multiple applications to access the same database at the
same time; it must provide an environment that protects the database against a
variety of possible internal and external threats.
 The server manages the recovery, concurrency, security and consistency
aspects of a database.

SQL Database Server Architectures


1. Process-per-client architectures
2. Multithreaded architectures
3. Hybrid architectures

1. Process-per-client architectures

It provides maximum bulletproofing by giving each database client its own process
address space. The database runs in one or more separate background processes. The
advantage of this architecture is that it protects the users from each other, and it protects
the database manager from the users. In addition, the processes can easily be assigned to
different processors on a multiprocessor SMP machine. Because the architecture relies on
the local OS for its multitasking services, an OS that supports SMP can transparently
assign processes to a pool of available processors. The disadvantage of this architecture
is that it consumes more memory and CPU resources than the alternative schemes. It can
be slower because of process context switches and interprocess communication
overhead. These problems can be overcome with the use of a TP monitor that manages a
pool of reusable processes. Example architectures are: DB2, Informix and Oracle6.

2. Multithreaded architectures
It provides the best performance by running all the user connections, applications, and
the database in the same address space. It provides its own internal scheduler and does not
rely on the local OS tasking and address protection schemes. The advantage is that it
conserves memory and CPU cycles by not requiring frequent context switches and it does
not require as many local OS services. The disadvantage is that a misbehaved user
application can bring down the entire database server and all its tasks. Example
architectures are: Sybase and MS SQL Server

3. Hybrid architectures
It consists of 3 components: 1) multithreaded network listeners that participate in the
initial connection task by assigning the client to a dispatcher; 2) dispatcher tasks that
place messages on an internal message queue, and then de-queue the response and send it
back to the client; and 3) reusable, shared server worker processes that pick the work off
the queue, execute it, and place the response on an out queue. The advantage is that it
provides a protected environment for running the user tasks without assigning a
permanent process to each user. The disadvantages are queue latencies. Example
architecture is: Oracle7
(iii) Stored Procedures, Triggers and Rules

Stored Procedure

A stored procedure is a set of SQL commands that has been compiled and stored on the
database server. Once the stored procedure has been "stored", client applications can
execute the stored procedure over and over again without sending it to the database server
again and without compiling it again. Stored procedures improve performance by
reducing network traffic and CPU load. The Benefits of Stored Procedures are:
 Precompiled execution. SQL Server compiles each stored procedure once and
then reutilizes the execution plan. This results in tremendous performance boosts
when stored procedures are called repeatedly.
 Reduced client/server traffic. If network bandwidth is a concern in your
environment, you'll be happy to learn that stored procedures can reduce long SQL
queries to a single line that is transmitted over the wire.
 Efficient reuse of code and programming abstraction. Stored procedures can
be used by multiple users and client programs. If you utilize them in a planned
manner, you'll find the development cycle takes less time.
 Enhanced security controls. You can grant users permission to execute a stored
procedure independently of underlying table permissions.
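
A minimal sketch of a client calling a stored procedure through JDBC follows. The
procedure name, its parameters, and the connection URL are invented for the sketch;
the {call ...} escape syntax is the standard JDBC way to invoke one:

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;

// Calling a (hypothetical) stored procedure sp_credit_check through JDBC.
public class StoredProcClient {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                 "jdbc:somedriver://dbhost/sales", "user", "secret");
             CallableStatement cs = con.prepareCall("{call sp_credit_check(?, ?)}")) {
            cs.setInt(1, 4711);                                 // IN: customer number
            cs.registerOutParameter(2, java.sql.Types.VARCHAR); // OUT: status
            cs.execute();   // one request/reply, no matter how many SQL
                            // commands the procedure runs on the server
            System.out.println("Credit status: " + cs.getString(2));
        }
    }
}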
Static SQL and Dynamic SQL

A static SQL statement involves the preparation and storing of a section at preprocessing
time and the execution of that stored section at run time. A dynamic SQL statement
involves the preparation and execution of a section at runtime. Some statements do not
require a section, and they are also classified as dynamic.

Each type of statement has advantages and disadvantages as listed below:

 A static statement performs more efficiently than the equivalent dynamic


statement.
 In order to execute a static statement, a program module (containing a stored
section for the statement) must be installed in each DB Environment in which the
statement is to run. A dynamically preprocessed statement is portable and can be
run in any DB Environment without installing a program module.
 Dynamic statements may be more complex to code than are static statements.
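
JDBC, like any CLI, works in the dynamic style: the statement is prepared and executed
at run time, so no program module has to be installed in the target DB environment
beforehand. A minimal sketch, with invented table, column, and URL names:

import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Dynamic SQL: the statement text is assembled and prepared at run time.
public class DynamicQuery {
    public static void main(String[] args) throws Exception {
        String table = args.length > 0 ? args[0] : "orders";   // not known until run time
        try (Connection con = DriverManager.getConnection(
                 "jdbc:somedriver://dbhost/sales", "user", "secret");
             PreparedStatement ps = con.prepareStatement(      // prepared once...
                 "SELECT id, amount FROM " + table + " WHERE amount > ?")) {
            ps.setBigDecimal(1, new BigDecimal("100.00"));
            try (ResultSet rs = ps.executeQuery()) {           // ...then executed
                while (rs.next()) {
                    System.out.println(rs.getInt("id") + " " + rs.getBigDecimal("amount"));
                }
            }
        }
    }
}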

Feature | Stored Procedure | Embedded static SQL | Remote dynamic SQL
Named functions | Yes | No | No
Persistently stored on server | Yes | Yes | No
Tracked in catalog | Yes | Yes | No
Procedural logic | Within the object | External | External
Flexibility | Low | Low | High
Abstraction level | High | Low | Low
Standard | Coming with SQL3 | Yes | Yes
Performance | Fast | Fast | Slow
Tool-friendly | Medium | No | Yes
Client/Server shrink-wrap friendly | Yes | No | Yes
Network messages | One request/reply for many SQL commands | One request/reply per SQL command | One request/reply per SQL command

Drawbacks of Stored Procedures

 They provide less ad hoc flexibility than remote dynamic SQL.


 There is no transactional synchronization

 They are totally non-standard

Examples of Stored Procedure

 Sybase and MS SQL Server


 Oracle
 IBM’s DB2/UDB

 Informix

Triggers

 Triggers are special user-defined actions-usually in the form of stored procedures-


that are automatically invoked by the server based on data-related events.
 Triggers can perform complex actions and can use the full power of a procedure
language. A rule is a special type of trigger that is used to perform simple checks
on data.

 Both triggers and rules are attached to specific operations on specific tables;
typical uses include auditing, looking for value thresholds, and setting column defaults.

 A separate trigger or rule can be defined for each of these commands (INSERT,
UPDATE, or DELETE), or a single trigger may be defined for any updates to a table.
Triggers can call other triggers or stored procedures.

 Some products that implement triggers are listed below; a sketch of installing a
trigger follows the list:

 Sybase

 MS SQL Server

 Ingres

 Oracle
 Informix

 DB2/UDB
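
A minimal sketch of a client installing such a trigger follows. The CREATE TRIGGER
dialect below is Sybase/MS SQL Server flavored, and the table, column, and URL names
are invented:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Installing a simple audit trigger from a client program.
public class InstallTrigger {
    public static void main(String[] args) throws Exception {
        String ddl =
            "CREATE TRIGGER audit_salary ON employees FOR UPDATE AS " +
            "INSERT INTO salary_audit (emp_id, changed_on) " +
            "SELECT emp_id, GETDATE() FROM inserted";
        try (Connection con = DriverManager.getConnection(
                 "jdbc:somedriver://dbhost/hr", "dba", "secret");
             Statement st = con.createStatement()) {
            st.executeUpdate(ddl);  // from now on the server fires the
                                    // trigger automatically on every UPDATE
        }
    }
}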

SQL Middleware and Federated Databases


Middleware is needed to make SQL clients and servers work across multivendor,
heterogeneous database networks, or, more simply put, federated databases. This
middleware cannot yet be used to create production-strength federated databases, but it
provides an adequate foundation for decision support systems and data warehousing.

(i) SQL Middleware: The options

Middleware starts with the API on the client side that is used to invoke a service, and it
covers the transmission of the request over the network and the resulting response. It does
not include the software that provides the actual service.

The Single-vendor Options

A typical single vendor middleware solution currently provides:

A vendor-proprietary SQL API that works on a multiplicity of client platforms

A vendor-proprietary SQL driver

FAP support for multiple protocol stacks

Gateways to other vendor databases

Client/Server database administration tools

Front-end graphical application development and query tools


The Multi-vendor Options

A typical multivendor middleware solution currently provides:

Different SQL APIs


Multiple database drivers

Multiple FAPs and no interoperability

Multiple administration tools

Middleware Solutions
Step 1: Standardize on a common SQL interface. The idea is to create a common SQL
API that is used by all the applications, and let the server differences be handled by
the different database drivers.

Step 2: Standardize on one open industry FAP, supply a common client driver for the
FAP, and develop a gateway catcher for each server. The gateway catcher will catch the
incoming FAP messages and translate them to the local server's native SQL interface.
Step 3: Remove the gateway catchers, which improves performance, reduces cost,
and simplifies maintenance, and create a database administration interface. To eliminate
the gateway catcher, the common FAP must either support a superset of all the SQL or it
must tolerate native SQL. The vendors must also agree to replace their own private FAPs
with the common FAP.

(ii) SQL API

Two approaches for supporting SQL from within programming languages: embedded
SQL and SQL Call-Level Interface (CLI).
SQL-92 Embedded SQL (ESQL)
Embedded SQL is an ISO SQL-92 defined standard for embedding SQL statements "as is"
within ordinary programming languages. It specifies the syntax for embedding SQL
within C, COBOL and so on. Each SQL statement is flagged with language-specific
identifiers that mark the beginning and end of the SQL statement. This approach requires
running the SQL source through a precompiler to generate a source code file that the
language compiler understands.
SQLJ- Java’s Embedded SQL
It is used to insert SQL statements inside the Java programs. It will be used to write Java-
based stored procedures.
SQL Call-Level Interface
It is a callable SQL API for database access. It does not require a precompiler to convert
SQL statements into code, which is then compiled and bound to a database. It allows
creating and executing SQL statements at run time.
X/Open SAG CLI
The SQL Access Group (SAG) was formed to provide a unified standard for remote database
access. Its CLI allows any SQL client to talk to any SQL server. It requires the use of
intelligent database drivers that accept a CLI call and translate it into the native
database server's access language. With the proper driver, any data source application
can function as a CLI server and can be accessed by the front-end tools and client
programs that use the CLI. The CLI requires a driver for each database to which it
connects. Each driver must be written for a specific server using the server's access
methods and network transport stack. The CLI provides a driver manager that talks to a
driver through a service provider interface (SPI).
Microsoft ODBC CLI

CLI/ODBC is an SQL application programming interface that can be called by your
database applications. It passes dynamic SQL statements as database function calls.
Unlike embedded SQL it does not require host variables or a precompiler. When an
application program calls CLI/ODBC, the first thing that it must do is to make SQL calls
to some of the system catalog tables on the target database in order to obtain information
about other database contents. CLI/ODBC applications always access the system catalog
tables in this way. There are ten API calls that may be made in order to gather
information about the database that is being connected to.

ODBC 3.5
Microsoft introduced ODBC 3.5, a Unicode-enabled version of the ODBC standard. It has
several drawbacks:
 It is constantly evolving
 Uncertain
 It introduces a different programming paradigm
 It is object-based rather than procedure-based.
 ODBC drivers are difficult to build and maintain.
 Not well documented.
 High overhead.
CLI vs. Embedded SQL

Feature | SQL/CLI | Embedded SQL
Requires target database to be known ahead of time | No | Yes
Supports static SQL | No | Yes
Supports dynamic SQL | Yes | Yes
Compile-time type checking | No | Yes
Uses the SQL declarative model | No | Yes
Applications must be precompiled and bound to database server | No | Yes
Easy to program | No | Yes
Tool-friendly | Yes | No
Easy to debug | Yes | No
Easy to package | Yes | No
Supports database-independent catalog tables | Yes | No
Supports database-independent metadata | Yes | No

Object CLI: JDBC and OLE DB

The Java Database Connectivity (JDBC) API is the industry standard for database-
independent connectivity between the Java programming language and a wide range of
databases – SQL databases and other tabular data sources, such as spreadsheets or flat
files. The JDBC API provides a call-level API for SQL-based database access.

JDBC technology allows you to use the Java programming language to exploit "Write
Once, Run Anywhere" capabilities for applications that require access to enterprise data.
With a JDBC technology-enabled driver, you can connect all corporate data even in a
heterogeneous environment.
JDBC Drivers

 Connect your Java applications to databases


 Compile, deploy, and access data using your custom-built JDBC driver
 Enhance the custom JWDriver with advanced logging, connection pooling, and
Predefined Data Sets

JDBC URL Naming

A JDBC URL provides a way of identifying a database so that the appropriate driver will
recognize it and establish a connection with it. Driver writers are the ones who actually
determine what the JDBC URL that identifies their particular driver will be. Users do not
need to worry about how to form a JDBC URL; they simply use the URL supplied with
the drivers they are using. JDBC's role is to recommend some conventions for driver
writers to follow in structuring their JDBC URLs. Since JDBC URLs are used with
various kinds of drivers, the conventions are of necessity very flexible.
-> First, they allow different drivers to use different schemes for naming databases.
The odbc subprotocol, for example, lets the URL contain attribute values.
-> Second, JDBC URLs allow driver writers to encode all necessary connection information
within them. This makes it possible, for example, for an applet that wants to talk to a
given database to open the database connection without requiring the user to do any
system administration chores.
-> Third, JDBC URLs allow a level of indirection. This means that the JDBC URL may
refer to a logical host or database name that is dynamically translated to the actual name
by a network naming system. This allows system administrators to avoid specifying
particular hosts as part of the JDBC name. There are a number of different network name
services (such as DNS, NIS, and DCE), and there is no restriction about which ones can
be used. The standard syntax for JDBC URLs is shown below. It has three parts, which
are separated by colons:
jdbc:<subprotocol>:<subname>
The three parts of a JDBC URL are broken down as follows:
1. jdbc - the protocol. The protocol in a JDBC URL is always jdbc.
2. <subprotocol> - the name of the driver or the name of a database connectivity
mechanism, which may be supported by one or more drivers. A prominent example of a
subprotocol name is "odbc", which has been reserved for URLs that specify ODBC-style
data source names. For example, to access a database through a JDBC-ODBC bridge, one
might use a URL such as the following:
jdbc:odbc:fred
In this example, the subprotocol is "odbc", and the subname "fred" is a local
ODBC data source.
3. <subname> - a way to identify the database. The subname can vary, depending on the
subprotocol, and it can have a subsubname with any internal syntax the driver writer
chooses. The point of a subname is to give enough information to locate the database.
If the database is accessed over a network, the network address may be
included in the JDBC URL as part of the subname and should follow the standard
URL naming conventions.
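
A trivial sketch that pulls the example URL apart at its colons:

// Splitting a JDBC URL into its three colon-separated parts, using the
// "jdbc:odbc:fred" example from above.
public class UrlParts {
    public static void main(String[] args) {
        String url = "jdbc:odbc:fred";
        String[] parts = url.split(":", 3);
        System.out.println("protocol    = " + parts[0]);  // always "jdbc"
        System.out.println("subprotocol = " + parts[1]);  // driver or mechanism ("odbc")
        System.out.println("subname     = " + parts[2]);  // locates the database ("fred")
    }
}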

JDBC 2-Tier and 3-Tier

The JDBC API supports both two-tier and three-tier processing models for database
access.

Figure 1: Two-tier Architecture for Data Access.


In the two-tier model, a Java application talks directly to the data source. This requires a
JDBC driver that can communicate with the particular data source being accessed. A
user's commands are delivered to the database or other data source, and the results of
those statements are sent back to the user. The data source may be located on another
machine to which the user is connected via a network. This is referred to as a client/server
configuration, with the user's machine as the client, and the machine housing the data
source as the server. The network can be an intranet, which, for example, connects
employees within a corporation, or it can be the Internet.
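
A minimal two-tier sketch: the application obtains a connection from the DriverManager
and queries the data source directly. The URL, credentials, and table name below are
invented for the sketch:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Two-tier model: the Java application talks directly to the data source
// through a JDBC driver, and the results come straight back.
public class TwoTierClient {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                 "jdbc:somedriver://dbhost/shop", "user", "secret");
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("SELECT name, price FROM products")) {
            while (rs.next()) {
                System.out.println(rs.getString("name") + "  " + rs.getBigDecimal("price"));
            }
        }
    }
}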

In the three-tier model, commands are sent to a "middle tier" of services, which then
sends the commands to the data source. The data source processes the commands and
sends the results back to the middle tier, which then sends them to the user. MIS directors
find the three-tier model very attractive because the middle tier makes it possible to
maintain control over access and the kinds of updates that can be made to corporate data.
Another advantage is that it simplifies the deployment of applications. Finally, in many
cases, the three-tier architecture can provide performance advantages.

Figure 2: Three-tier Architecture for Data Access.


OLE DB and ADO

OLE DB (Object Linking and Embedding, Database, sometimes written as OLEDB or


OLE-DB) is an API designed by Microsoft for accessing different types of data stored in
a uniform manner. It is a set of interfaces implemented using the Component Object
Model (COM); it is otherwise unrelated to OLE. It was designed as a higher-level
replacement for, and successor to, ODBC, extending its feature set to support a wider
variety of non-relational databases, such as object databases and spreadsheets that do not
necessarily implement SQL.

ADO is a higher-level programming model for OLE DB. It is a replacement for two older
data access interfaces: Data Access Objects (DAO) and Remote Data Objects (RDO). It
supports a variety of front-end tools and programming languages, and it also provides a
Remote Data Service component that supports client-side caching and data-aware
controls. It consists of connection objects that represent a connection to a data
source, command objects that represent a query to be executed on the data source,
and a recordset object that represents the results of the query.

(iii) Open SQL Gateways

Whenever you run Open Server Gateway for SQL Anywhere, you provide a server name
on the command line that identifies the Open Server Gateway to client applications. This
server name must be recognized by client applications in order for them to communicate
with the Open Server Gateway.

IBI EDA/SQL

EDA/SQL provides enterprise-wide data access. Access to up-to-date, complete
information is critical to making informed business decisions, but accessing that
information can be difficult, particularly if your data is scattered among different
systems, stored in different file formats, and seemingly impossible to merge.

Gateways such as Oracle Transparent Gateways give you complete access to your
information, enabling data distributed across a variety of storage systems to appear as
if it were within a single, local database. The EDA/SQL components are: API/SQL,
EDA/Extenders, EDA/Link, EDA/Server and EDA/Data Drivers.

IBM’s DRDA

Distributed Relational Database Architecture (DRDA) is a set of protocols that permits
multiple database systems, both IBM and non-IBM, as well as application
programs, to work together. Any combination of relational database management
products that use DRDA can be connected to form a distributed relational
database management system. DRDA coordinates communication between systems
by defining what must be exchanged and how it must be exchanged.

The four levels of transactions defined by DRDA are:


 Remote Request
 Remote unit of work

 Distributed unit of work

 Distributed Request

DRDA Features

1. SQL Message Content and Exchange Protocol


2. Transport stack Independence

3. Multiplatform Program Preparation

4. Static or Dynamic SQL support

5. Common Diagnostics

6. Common SQL Syntax

Objective Type Questions

1. Which one of the following is a relational database language based on the 1989
SQL standard of the American National Standards Institute (ANSI)?
a. SQL-89 b. SQL-92 c. SQL3 d. SQL5
2. Which server also maintains dynamic catalog tables that contain information
about the SQL objects?
a. Database Server b. Data Server c. SQL Server d. Web Server
3. Which server manages the control and execution of SQL commands, provides the
logical and physical views of the data, and generates optimized access plans for
executing the SQL commands?
a. Database Server b. Data Server c. SQL Server d. Web Server
4. Which type of architecture protects the users from each other and protects
the database manager from the users?
a. Process-per-client architectures
b. Multithreaded architectures
c. Hybrid architectures
d. 3-tier Client/Server architectures
5. Which type of architecture consumes more memory and CPU resources than
the alternative schemes?
a. Process-per-client architectures
b. Multithreaded architectures
c. Hybrid architectures
d. 3-tier Client/Server architectures
6. Informix is an example of
a. Process-per-client architectures
b. Multithreaded architectures
c. Hybrid architectures
d. 3-tier Client/Server architectures
7. Which type of architecture conserves memory and CPU cycles by not
requiring frequent context switches and does not require as many local OS
services?
a. Process-per-client architectures
b. Multithreaded architectures
c. Hybrid architectures
d. 3-tier Client/Server architectures
8. In which type of architecture can a misbehaved user application bring down the
entire database server and all its tasks?
a. Process-per-client architectures
b. Multithreaded architectures
c. Hybrid architectures
d. 3-tier Client/Server architectures
9. Sybase is an example of
a. Process-per-client architectures
b. Multithreaded architectures
c. Hybrid architectures
d. 3-tier Client/Server architectures
10. MS SQL Server is an example of
a. Process-per-client architectures
b. Multithreaded architectures
c. Hybrid architectures
d. 3-tier Client/Server architectures
11. DB2 is an example of
a. Process-per-client architectures
b. Multithreaded architectures
c. Hybrid architectures
d. 3-tier Client/Server architectures
12. Which type of architecture provides a protected environment for running
the user tasks without assigning a permanent process to each user?
a. Process-per-client architectures
b. Multithreaded architectures
c. Hybrid architectures
d. 3-tier Client/Server architectures
13. In which type of architecture are there queue latencies?
a. Process-per-client architectures
b. Multithreaded architectures
c. Hybrid architectures
d. 3-tier Client/Server architectures
14. Oracle7 is an example of
a. Process-per-client architectures
b. Multithreaded architectures
c. Hybrid architectures
d. 3-tier Client/Server architectures
15. A set of SQL commands that has been compiled and stored on the database
server
a. Stored Procedures b. Triggers c. Rules d. Transactions
16. The client/server traffic will be reduced in
a. Stored Procedures b. Triggers c. Rules d. None of the above
17. The transactional synchronization is not available in
a. Stored Procedures b. Triggers c. Rules d. None of the above

18. Oracle is one of the

a. Stored Procedures b. Triggers
c. Rules d. Stored Procedures & Triggers

Review Questions

Two Mark Questions

1. Define SQL.
2. What are the advantages and disadvantages of the Process-per-client
architecture?

3. What are the advantages and disadvantages of the Multithreaded architecture?

4. What are the advantages and disadvantages of Hybrid architecture?

5. Write short notes on SQL-92, SQL-89 and SQL3.

6. Define stored procedures, Triggers and rules.

7. What are the advantages and disadvantages of stored procedures?

8. Define static SQL and dynamic SQL.

9. Compare stored procedures with static and dynamic SQL.

10. Define API.

11. What is meant by FAP?


12. Define Embedded SQL.

13. Compare Embedded SQL and Call-Level Interface.

14. Define JDBC.

15. What is the use of DRDA and write the features of DRDA?

Big Questions

1. Explain in detail the Middleware and Federated databases.


2. Explain the SQL database architecture with neat diagram.

Data Warehouses

(i) OLTP

Databases tend to get split up into a variety of different categories based on their
application and requirements. All of these different categories naturally get nifty
buzzwords to help classify them and make distinctions in features more apparent. The
most popular buzzword (well, acronym anyway) is OLTP, or Online Transaction
Processing. Other classifications include Decision Support Systems (DSS), Data
Warehouses, Data Marts, etc.

OLTP databases, as the name implies, handle real-time transactions, which inherently
have some special requirements. If you are running a store, for instance, you need to
ensure that as people order products they are properly and efficiently updating the
inventory tables while they are updating the purchases tables, while they are updating
the customer tables, and so on. OLTP databases must be atomic in nature (an entire
transaction either succeeds or fails; there is no middle ground), be consistent (each
transaction leaves the affected data in a consistent and correct state), be isolated (no
transaction affects the states of other transactions), and be durable (changes resulting
from committed transactions are persistent). All of this can be a fairly tall order but
is essential to running a successful OLTP database.
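
Expressed in JDBC (with invented table, column, and URL names), the atomicity
requirement looks like this: both updates commit together, or neither survives:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

// A store's order transaction: decrement inventory and record the purchase
// as one atomic unit of work.
public class OrderTxn {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                 "jdbc:somedriver://dbhost/store", "app", "secret")) {
            con.setAutoCommit(false);                 // start a transaction
            try (PreparedStatement dec = con.prepareStatement(
                     "UPDATE inventory SET on_hand = on_hand - 1 WHERE sku = ?");
                 PreparedStatement ins = con.prepareStatement(
                     "INSERT INTO purchases (sku, customer_id) VALUES (?, ?)")) {
                dec.setString(1, "A-42"); dec.executeUpdate();
                ins.setString(1, "A-42"); ins.setInt(2, 7); ins.executeUpdate();
                con.commit();                         // both changes become durable
            } catch (Exception e) {
                con.rollback();                       // neither change survives
                throw e;
            }
        }
    }
}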
(ii) Decision Support Systems

A Decision Support System (DSS) is an interactive computer-based system or subsystem


intended to help decision makers use communications technologies, data, documents,
knowledge and/or models to identify and solve problems, complete decision process
tasks, and make decisions. Decision Support System is a general term for any computer
application that enhances a person or group’s ability to make decisions. Also, Decision
Support Systems refers to an academic field of research that involves designing and
studying Decision Support Systems in their context of use. In general, Decision Support
Systems are a class of computerized information system that support decision-making
activities. Five more specific Decision Support System types include:

 Communications-driven DSS
 Data-driven DSS
 Document-driven DSS
 Knowledge-driven DSS
 Model-driven DSS

(iii) Executive Information System

An Executive Information System (EIS) is a set of management tools supporting the


information and decision-making needs of management by combining information
available within the organisation with external information in an analytical framework.

The emphasis of the system as a whole is an easy-to-use interface and integration
with a variety of data sources. It offers strong reporting and data mining capabilities
that can provide all the data the executive is likely to need. Traditionally the interface
was menu driven with either reports, or text presentation. Newer systems, and especially
the newer Business Intelligence systems, which are replacing EIS, have a dashboard or
scorecard type display.

Executive Information Systems come in two distinct types: ones that are data driven, and
ones that are model driven. Data-driven systems interface with databases and data
warehouses. They collate information from different sources and present it to the
user in an integrated, dashboard-style screen. Model-driven systems use forecasting,
simulations and decision-tree-like processes to present the data.

Advantages of EIS
 Easy for upper-level executives to use, extensive computer experience is not
required in operations
 Provides timely delivery of company summary information
 Information that is provided is better understood
 Filters data for management
 Improves tracking of information
 Offers efficiency to decision makers

Disadvantages of EIS
 Limited functionality, by design
 Information overload for some managers
 Benefits hard to quantify
 High implementation costs
 System may become slow, large, and hard to manage
 Need good internal processes for data management
 May lead to less reliable and less secure data
 System dependent

Comparing Decision Support and OLTP Systems

Feature | OLTP | DSS
Who uses it? | Production workers | Information hounds
Timeliness of data | Needs current value of data. Reports cannot be reconstructed | Needs stable snapshots of time-stamped data. Reports can be reconstructed using stable data
Frequency of data access | Continuous throughout workday | Sporadic
Data format | Raw captured data. No derived data. Detailed and summarized transaction data | Multiple levels of conversions, filtering, summarization, condensation and extraction
Data collection | From single application | From multiple sources - internal and external
Data source known? | Yes | No
Timed snapshots or multiple versions? | No | Yes
Data access pattern | Multiple users updating production database | Mostly single-user access
Can data be updated? | Current value is continuously updated | Read only, unless you own the replica
Flexibility of access | Inflexible | Very flexible
Performance | Fast response time is a requirement. Highly automated, repetitive tasks | Relatively slow
Data requirements | Well understood | Fuzzy and unstable. A lot of detective work and discovery. Subject-oriented data
Information scope | Finite | Data can come from anywhere
Average number of records accessed | Less than 10 individual records | 100s to 1000s of records in sets

(iv)Data Warehouses

"A warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of


data in support of management's decision making process". Data warehouse is a
repository of an organization's electronically stored data. Data warehouses are designed
to facilitate reporting and analysis. A data warehouse houses a standardized, consistent,
clean and integrated form of data sourced from various operational systems in use in the
organization, structured in a way to specifically address the reporting and analytic
requirements.

The Elements of Data Warehousing


1. The Data replication manager
Manages the copying and distribution of data across databases as defined by the
information users. The user defines the data that needs to be copied, the source and
destination platforms, and the update and data transforms. A refresh involves copying
over the entire data source; an update only propagates the changes.
2. The informational database
It is a database that organizes and stores copies of data from multiple data sources.
You can think of it as a decision support server that transforms, aggregates, and adds
value to data from various sources. It also stores metadata, both system-level and
semantic-level.
3. The information directory
It is an amalgam of the functions of a technical directory, a business directory, and an
information navigator. Its main function is to help information users find out what data
is available on the different databases, what format it is in, and how to access it. It
also helps the DBAs manage the data warehouse. The information directory gets its
metadata by discovering which databases are on the network and then querying their
metadata repositories. DBAs use the information directory to access system-level
metadata, keep track of data sources, data targets, cleanup rules, transformation rules,
and details about predefined rules and reports.
4. DSS tool support
Is provided via SQL; most vendors support ODBC and some other protocols. In summary,
the DBA must be able to assemble data from different sources, replicate it, clean it,
store it, catalog it, and then make it available to DSS tools. Data mining is one such
tool; it refers, loosely, to finding relevant information in a large volume of data.
Data mining attempts to discover predefined or user-defined rules and patterns
automatically from the data.

Warehouse Hierarchies: The Data marts

A data mart is a repository of data gathered from operational data and other sources that
is designed to serve a particular community of knowledge workers. In scope, the data
may derive from an enterprise-wide database or data warehouse or be more specialized.
The emphasis of a data mart is on meeting the specific demands of a particular group of
knowledge users in terms of analysis, content, presentation, …

Data marts are the "corner stores" of the enterprise, and each unique knowledge worker
community has its own mart maintained by the divisional or departmental IS group.
Some divisions may need only a single data mart if all knowledge workers in the division
have similar information requirements. In other cases, a departmental IS organization will
discover several distinct knowledge worker communities within a single department of a
division.

Each data mart serves only its local community, and is modeled on the information needs
of that community. For example, managers of consumer products will require different
information than managers of industrial products (raw material). Consumer products have
a complex competitive dimension for which syndicated market information (from
companies such as Information Resources Inc. and Nielsen Marketing Research) exists,
while industrial products have a simpler competitive dimension. Consumer products are
sold over the counter with no advance notice of purchasing, while industrial products are
sold in large lots over a longer period on the basis of existing relationships and contracts.
Also, consumer products are sold through channels not controlled by the manufacturer,
while industrial products are supplied directly by their manufacturers. These two
communities, both composed of product managers, have different information
requirements.

Replication Versus Direct Access

A data warehouse has clear advantages in reporting: Data can be accessed more quickly
because it has been stored in the way best suited to analysis. Not only are the system
parameters for displaying data optimized, but suitable indexes, aggregates and joins are
also stored in the database. Historical data can also be accessed and consistent analyses
using several operative systems and even external data (for example via the Web) are
possible. This advantage of the data warehouse is won at great cost, in that additional
systems (with additional administrative expense) and additional memory are needed, and
that the data is, for performance reasons, usually not as up-to-date as in the operative
system.

Direct data access is required by applications, mostly production OLTP, that cannot
tolerate any volatility in their data. These applications require "live data" that
reflects the current state of the business. It can be obtained using any one of the
following four approaches:

1. Using federated databases that support synchronous replication of data


2. Using a centralized database server

3. Using a single vendor’s distributed database multi server offering

4. Using a TP monitor to front-end multivendor database servers


This has the following advantages: Firstly it ensures that user interfaces are identical (or
nearly identical). Secondly, a uniform data model can be accessed. Thirdly, operational
business is not so burdened by reporting. Fourthly, as a rule, reporting programs can be
replaced more frequently by new, improved versions than operative systems can. In this
case, direct access is also improved.

The Mechanics of Replication

 Extract data using a query


 Copy the results to a diskette file and move it to the machine with the spreadsheet program

 Import the file into the local database

(i) Refresh and Updates

Refresh -> Replaces the entire target with data from the source

Update -> It only sends the changed data to the target. Updates can be either
synchronous, which means that the target copy is updated in the same commit scope
as the source table, or they can be asynchronous, which means that the target table is
updated in a separate transaction from the one that updates the source.
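
Sketched as the SQL a copy tool might issue against the warehouse (all names are
invented, and it is assumed the extracted source rows have already been landed in a
staging table), the two styles look like this:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Refresh versus update, as a replication job might perform them.
public class CopyJob {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                 "jdbc:somedriver://warehouse/dss", "copytool", "secret");
             Statement st = con.createStatement()) {

            // Refresh: throw away the target and recopy the entire source.
            st.executeUpdate("DELETE FROM sales_copy");
            st.executeUpdate("INSERT INTO sales_copy SELECT * FROM sales_staging");

            // Update: ship only the rows that changed since the last run
            // (asynchronous style: a separate transaction from the source's).
            st.executeUpdate("INSERT INTO sales_copy SELECT * FROM sales_staging " +
                             "WHERE changed_on > {ts '2024-01-01 00:00:00'}");
        }
    }
}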

(ii) Staging the updates


It provides users with a consistent view of the data across the copies. It reduces
contention on the production database: the copy tool does not interfere with
production applications.

(iii) Cleansing and transforming the raw data

 Subsets allow you to transmit only the rows and columns that are of interest to
your informational applications.

 Aggregates allow you to transmit only aggregations of data such as averages,
sums, maximums and so on.

 Derived functions allow you to specify data that does not exist but is the result
of some calculation on data that does exist.

Benefits of data warehousing

Some of the benefits that a data warehouse provides are as follows:

A data warehouse provides a common data model for all data of interest regardless of the
data's source. This makes it easier to report and analyze information than it would be if
multiple data models were used to retrieve information such as sales invoices, order
receipts, general ledger charges, etc.

 Prior to loading data into the data warehouse, inconsistencies are identified and
resolved. This greatly simplifies reporting and analysis.
 Information in the data warehouse is under the control of data warehouse users so
that, even if the source system data is purged over time, the information in the
warehouse can be stored safely for extended periods of time.
 Because they are separate from operational systems, data warehouses provide
retrieval of data without slowing down operational systems.
 Data warehouses can work in conjunction with and, hence, enhance the value of
operational business applications, notably customer relationship management
(CRM) systems.
 Data warehouses facilitate decision support system applications such as trend
reports (e.g., the items with the most sales in a particular area within the last two
years), exception reports, and reports that show actual performance versus goals.

Disadvantages of data warehouses

There are also disadvantages to using a data warehouse. Some of them are:

 Data warehouses are not the optimal environment for unstructured data.
 Because data must be extracted, transformed and loaded into the warehouse, there
is an element of latency in data warehouse data.
 Over their life, data warehouses can have high costs. The data warehouse is
usually not static. Maintenance costs are high.
 Data warehouses can get outdated relatively quickly. There is a cost of delivering
suboptimal information to the organization.
 There is often a fine line between data warehouses and operational systems.
Duplicate, expensive functionality may be developed. Or, functionality may be
developed in the data warehouse that, in retrospect, should have been developed
in the operational systems and vice versa.

EIS/DSS: From Queries, To OLAP, To Data Mining


(i) Query/Reporting Tools
Query/reporting tools let you formulate a query without writing a program or learning
SQL. You point and click to generate the SELECT statements and the search criteria. The
tool then displays the results in some understandable form, usually a table. The tools are
powerful enough to maintain multiple connections to different databases or data sources
at the same time. This can effectively reduce the time required for developers,
analysts, and DBAs to provide information to end users who are still a bit shy about
querying data sources and producing their own reports.
(ii) OLAP and Multidimensional Data

OLAP allows business users to slice and dice data at will. Normally, data in an
organization is distributed across multiple data sources that are incompatible with each
other. A retail example: point-of-sale data and sales made via the call center or the Web
are stored in different locations and formats. Part of the OLAP implementation process
involves extracting data from the various data repositories and making them compatible.
Making data compatible involves ensuring that the meaning of the data in one repository
matches all other repositories.

It is not always necessary to create a data warehouse for OLAP analysis. Data stored by
operational systems, such as point-of-sale, resides in a type of database called an OLTP.
OLTP (Online Transaction Processing) databases are not structurally different from any
other databases; the main, and only, difference is the way in which the data is stored.

Examples of OLTPs include ERP, CRM, SCM, point-of-sale, and call-center
applications. OLTPs are designed for optimal transaction speed. When a consumer makes
a purchase online, they expect the transaction to occur instantaneously. With a database
design (called a data model) optimized for transactions, the record 'Consumer Name,
Address, Telephone, Order Number, Order Name, Price, Payment Method' is created
quickly on the database, and the results can be recalled by managers equally quickly if
needed.
Figure 1. Data Model for OLTP

Data is not typically stored for an extended period on OLTPs, for storage cost and
transaction speed reasons. OLAPs have a different mandate from OLTPs: they are
designed to give an overview analysis of what happened. Hence the data storage (i.e., the
data modeling) has to be set up differently. The most common method is called the star
design. The central table in an OLAP star data model is called the fact table. The
surrounding tables are called the dimensions. Using such a data model, it is possible to
build reports that answer questions such as the following (a query sketch appears after
Figure 2):

 Which supervisor gave the most discounts?
 What quantity was shipped on a particular date, month, quarter, or year?
 In which zip code did product A sell the most?
Figure 2. Star Data Model for OLAP
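As a hedged sketch, the first question above could be answered with a query against a
star schema such as the one below. The sales_fact and employee_dim tables and their
columns are assumptions made for the example, not tables from the original text.

    -- Hypothetical star schema:
    --   sales_fact(employee_key, date_key, product_key, discount, qty_shipped)
    --   employee_dim(employee_key, supervisor_name)
    SELECT e.supervisor_name,
           SUM(f.discount) AS total_discount
    FROM   sales_fact   f
    JOIN   employee_dim e ON e.employee_key = f.employee_key
    GROUP BY e.supervisor_name
    ORDER BY total_discount DESC;   -- the top row is the supervisor who gave the most discounts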

To obtain answers such as the ones above from a data model, OLAP cubes are created.
OLAP cubes are not strictly cuboids; the term names the process of linking data from the
different dimensions. The cubes can be developed along business units such as sales or
marketing, or a giant cube can be formed with all the dimensions. OLAP can be a
valuable and rewarding business tool. Aside from producing reports, OLAP analysis can
help an organization evaluate balanced scorecard targets.

Figure 3. OLAP Cube with Time, Customer and Product Dimensions

Types of OLAP
1) Desktop OLAP

Desktop OLAP, or "DOLAP", is based on the idea that a user can download a section of
the data from the database or source and work with that dataset locally, on their desktop.
DOLAP is easier to deploy and costs less, but it comes with very limited functionality in
comparison with other OLAP applications.

2) Multidimensional OLAP

Multidimensional OLAP, popularly abbreviated MOLAP, is widely regarded as the
classic form of OLAP. One of the major distinctions of MOLAP from a ROLAP tool is
that data is pre-summarized and stored in an optimized format in a multidimensional
cube, instead of in a relational database. In this type of model, data is structured into
proprietary formats in accordance with a client's reporting requirements, with the
calculations pre-generated on the cubes.

This is probably, by far, the best OLAP tool to use for analysis reports, since it enables
users to easily reorganize or rotate the cube structure to view different aspects of the data.
This is done by way of slicing and dicing. MOLAP analytic tools are also capable of
performing complex calculations. Since calculations are predefined at cube creation,
computed data is returned faster. MOLAP systems also give users the ability to quickly
write data back into a data set. Moreover, in comparison to ROLAP, MOLAP is
considerably less heavy on hardware due to compression techniques. In a nutshell,
MOLAP is more optimized for fast query performance and retrieval of summarized
information.

There are certain limitations to implementing a MOLAP system. One primary weakness
is that a MOLAP tool is less scalable than a ROLAP tool, since it can handle only a
limited amount of data. The MOLAP approach also introduces data redundancy. Certain
MOLAP products also have difficulty updating models whose dimensions have very high
cardinality.

3) Relational OLAP
ROLAP, or "Relational" OLAP, systems work primarily from the data that resides in a
relational database, where the base data and dimension tables are stored as relational
tables. This model permits multidimensional analysis of data, enabling users to perform
the equivalent of the traditional OLAP slicing and dicing feature. This is achieved through
the use of any SQL reporting tool to extract or 'query' data directly from the data
warehouse: specifying a WHERE clause is the equivalent of performing a slice-and-dice
action (a sketch follows below).
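A minimal sketch of that idea, reusing the hypothetical star schema from the earlier
example and additionally assuming product_dim(product_key, product_name) and
date_dim(date_key, year, quarter) dimension tables: restricting the time dimension in the
WHERE clause slices the cube down to a single quarter.

    SELECT p.product_name,
           SUM(f.qty_shipped) AS total_shipped
    FROM   sales_fact  f
    JOIN   product_dim p ON p.product_key = f.product_key
    JOIN   date_dim    d ON d.date_key    = f.date_key
    WHERE  d.year = 1998 AND d.quarter = 4   -- the WHERE clause performs the slice
    GROUP BY p.product_name;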

One advantage of ROLAP over the other styles of OLAP analytic tools is that it is
deemed to be more scalable in handling huge amounts of data. ROLAP sits on top of
relational databases, enabling it to leverage the functionality that a relational database
provides. Another advantage of a ROLAP tool is that it is efficient in managing both
numeric and textual data. It also permits users to "drill down" to the leaf details, the
lowest level of a hierarchy structure.

However, ROLAP applications display slower performance than other styles of OLAP
tools, since calculations are often performed inside the server. Another demerit of a
ROLAP tool is that, because it depends on SQL for data manipulation, it may not be ideal
for calculations that are not easily translatable into an SQL query.

4) Hybrid OLAP

HOLAP is the product of an attempt to incorporate the best features of MOLAP and
ROLAP into a single architecture. It tries to bridge the technology gap between the two
by enabling access to both multidimensional database (MDDB) and relational database
management system (RDBMS) data stores. HOLAP systems store larger quantities of
detailed data in relational tables, while the aggregations are stored in pre-calculated
cubes. HOLAP also has the capacity to "drill through" from the cube down to the
relational tables for detailed data. Some of the advantages of this system are better
scalability, quick data processing, and flexibility in accessing data sources.

OLAP Client/Server Interaction


The steps are:
 The client invokes the OLAP application and submits a command
 The server executes the command
 The server returns the results to the client
 The client optionally caches the results

(iii)Data Mining

Data mining, the extraction of hidden predictive information from large databases, is a
powerful new technology with great potential to help companies focus on the most
important information in their data warehouses. Data mining tools predict future trends
and behaviors, allowing businesses to make proactive, knowledge-driven decisions. The
automated, prospective analyses offered by data mining move beyond the analyses of past
events provided by retrospective tools typical of decision support systems. Data mining
tools can answer business questions that traditionally were too time consuming to
resolve. They scour databases for hidden patterns, finding predictive information that
experts may miss because it lies outside their expectations.

Most companies already collect and refine massive quantities of data. Data mining
techniques can be implemented rapidly on existing software and hardware platforms to
enhance the value of existing information resources, and can be integrated with new
products and systems as they are brought on-line.

How does data mining work?

While large-scale information technology has been evolving separate transaction and
analytical systems, data mining provides the link between the two. Data mining software
analyzes relationships and patterns in stored transaction data based on open-ended user
queries. Several types of analytical software are available: statistical, machine learning,
and neural networks. Generally, any of four types of relationships are sought:

 Classes: Stored data is used to locate data in predetermined groups. For example,
a restaurant chain could mine customer purchase data to determine when
customers visit and what they typically order. This information could be used to
increase traffic by having daily specials.

 Clusters: Data items are grouped according to logical relationships or consumer
preferences. For example, data can be mined to identify market segments or
consumer affinities.

 Associations: Data can be mined to identify associations. The beer-diaper example
is an example of associative mining (a query sketch follows this list).

 Sequential patterns: Data is mined to anticipate behavior patterns and trends. For
example, an outdoor equipment retailer could predict the likelihood of a backpack
being purchased based on a consumer's purchase of sleeping bags and hiking
shoes.
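
As a hedged illustration of the association idea, the beer-and-diapers pattern can be
surfaced with a simple co-occurrence count over a hypothetical basket_items table. Real
mining tools use dedicated algorithms such as Apriori; this query only conveys the
concept.

    -- Hypothetical table: basket_items(basket_id, item)
    SELECT a.item AS item_1,
           b.item AS item_2,
           COUNT(*) AS baskets_together
    FROM   basket_items a
    JOIN   basket_items b
           ON  a.basket_id = b.basket_id
           AND a.item < b.item          -- avoid self-pairs and duplicate pairs
    GROUP BY a.item, b.item
    ORDER BY baskets_together DESC;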

Data mining consists of five major elements:

 Extract, transform, and load transaction data onto the data warehouse system (a
minimal sketch of this step follows the list).

 Store and manage the data in a multidimensional database system.

 Provide data access to business analysts and information technology professionals.

 Analyze the data by application software.

 Present the data in a useful format, such as a graph or table.
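
The extract-transform-load (ETL) step can be sketched in plain SQL. All table and
column names below are hypothetical, and real warehouses typically use dedicated ETL
tools rather than a single statement.

    -- Extract rows from the operational system, transform them, and load the warehouse
    INSERT INTO warehouse.sales_fact (order_id, region, revenue, load_date)
    SELECT o.order_id,
           UPPER(TRIM(o.region)),      -- transform: normalize the region code
           o.qty * o.unit_price,       -- transform: derive revenue
           CURRENT_DATE                -- record the load date
    FROM   operational.orders o
    WHERE  o.order_date >= DATE '1999-01-01';   -- extract only the rows of interest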

Client/Server Groupware
Client/Server groupware is a collection of technologies that allow us to represent
complex processes that center around collaborative human activities. It is built on five
foundation technologies: multimedia document management, workflow, e-mail,
conferencing, and scheduling. Groupware is software that supports the creation, flow, and
tracking of non-structured information in direct support of collaborative group activity.
Components of Groupware
 Multimedia document management
 Workflow
 Email
 Group Conferencing
 Group Scheduling.
(i) Multimedia document management
The fundamental issues faced by multimedia database management researchers and
designers are as follows:
• Development of models for capturing media synchronization requirements.
• Development of conceptual models for multimedia information, especially for video,
audio, and image data. These models should be rich in their semantic capabilities for
abstracting multimedia information, and be able to provide canonical representations of
complex images, scenes, and events in terms of objects and their spatiotemporal behavior.
• Design of powerful indexing, searching, accessing, and organizing methods for
multimedia data. Search in multimedia databases can be quite computationally intensive,
especially if content-based retrieval is needed for image and video data stored in
compressed or uncompressed form. Occasionally, searches may be fuzzy or based on
incomplete information. Some form of classification/grouping of information may be
needed to help the search process.
• Design of efficient multimedia query languages. These languages should be capable of
expressing complex spatiotemporal concepts, allow imprecise match retrieval, and be
able to handle various manipulation functions for multimedia objects.
• Development of efficient data clustering and storage layout schemes to manage real-
time multimedia data, for both single and parallel disk systems.
• Design and development of a suitable architecture and operating system support for a
general-purpose database management system.
• Management of distributed multimedia data and coordination for composition of
multimedia data over a network.
(ii) Workflow

Workflow is the automation of a business process, in whole or in part, during which
documents, information, or tasks are passed from one participant to another for action,
according to a set of procedural rules. The evolution of workflow management consists
of the automation of business procedures, or "workflows", during which documents,
information, or tasks are passed from one participant to another in a way that is governed
by rules or procedures.

Workflow software products, like other software technologies, have evolved from diverse
origins. While some offerings have been developed as pure workflow software,
many have evolved from image management systems, document management
systems, relational or object database systems, and electronic mail systems.

Vendors who have developed pure workflow offerings have invented terms and
interfaces, while vendors who have evolved products from other technologies
have often adapted terminology and interfaces. Each approach offers a variety of
strengths from which a user can choose. Adding a standards based approach
allows a user to combine these strengths in one infrastructure.
The New Workflow System
The new workflow packages go beyond their imaging counterparts in the following areas:
(i) Support for ad hoc user needs
(ii) Low cost
(iii) Integration with other applications
(iv) Programming with visual metaphors
(v) Integration with e-mail, MOM, ORB, and publish-and-subscribe
(vi) Provide facilities for tracking work-in-progress
(vii) Provide users with tools to complete an action
(viii) Provide APIs that let developers customize workflow services
(ix) Provide off-the-shelf component suites for assembling workflows
(x) Integration with LDAP directories
Workflow models
 Routes
 Rules
 Roles
Workflow Routes
Sequential Routing

Parallel Routing

Conditional Routing
Iterative Routing

AND split

An example of parallel routing where several tasks are performed in parallel or in no
particular order. It is modeled by a transition with one input place and two or more
output places. When fired, the transition will create tokens in all output places.

AND join

A transition with two or more input places and one output place. This will only be
enabled once there is a token in all of the input places, which would be after each
parallel thread of execution has finished.

Explicit OR split

An example of conditional routing where the decision is made as early as possible. It is
modeled by attaching conditions or guards to the arcs going out of a transition.

Guard - An expression attached to an arc, shown in brackets, that evaluates to true or
false. Tokens can only travel over arcs when their guard evaluates to true. The expression
will typically involve the case attributes.

Implicit OR split
An example of conditional routing where the decision is made as late
as possible. Implicit or-splits are modeled as two arcs going from the
same place but to different transitions. That way, the transition that
happens to fire first (which depends on the transition trigger) will get
the token. Once the token is gone, the others are no longer enabled
and thus cannot be fired.

One of the transitions must have a timer as its trigger so that it will be
fired if the other transition is not activated before the time limit
expires. Expired transitions can either be triggered automatically via a
background process which is running on a timer (e.g. cron), or
manually via an online screen.

OR join (explicit and implicit)

Simply a place that serves as the output place of two different transitions. That way, the
next transition after the or-join place will be enabled when either of the two conditional
threads is done.

Workflow Reference Model

The architecture identifies the major components and interfaces. These are considered in
turn in the following sections. As far as possible, the detail of the individual interfaces
(APIs and interchange formats) will be developed as a common core set, using additional
parameters as necessary to cope with the individual requirements of particular interfaces.
The interface around the workflow enactment service is designated WAPI (Workflow
APIs and Interchange formats), which may be considered as a set of constructs by which
the services of the workflow system may be accessed, and which regulate the interactions
between the workflow control software and other system components. Many of the
functions within the five interface areas are common to two or more interface services;
hence it is more appropriate to consider WAPI as a unified service interface that supports
workflow management functions across the five functional areas, rather than as five
individual interfaces.
(iii) E-mail
E-mail is an electronic message sent from one device to another. While most messages go
from computer to computer, e-mail can also be sent and received by mobile phones,
PDAs and other portable devices. With e-mail, you can send and receive personal and
business-related messages with attachments, such as photos and documents. You can also
send music, podcasts, video clips and software programs.
E-mail passes from one computer, known as a mail server, to another as it travels over
the Internet. Once it arrives at the destination mail server, it's stored in an electronic
mailbox until the recipient retrieves it. This whole process can take seconds, allowing you
to quickly communicate with people around the world at any time of the day or night.
To receive e-mail, you need an account on a mail server. This is similar to having a postal
box where you receive letters. One advantage over regular mail is that you can retrieve
your e-mail from any location on earth, provided that you have Internet access. Once you
connect to your mail server, you download your messages to your computer or wireless
device, or read them online.

(iv)Group Conferencing
Group conferencing is a program based on restorative justice principles. It is a problem-
solving approach to offending that aims to balance the needs of young people, victims
and the community by encouraging dialogue between individuals who have offended and
their victims.
The conference process provides all participants, in particular the young person and the
victim, with the opportunity to tell their story of the offence and how it has affected them.
At the end of the information sharing process, participants provide suggestions about how
the young person might repair the harm caused to the victim. This sets the expectations
for the outcome plan. The convenor takes an active role in negotiating an agreement with
all participants regarding the content of the conference outcome plan in order to ensure
the proposed plan is fair and reasonable, is realistic, and not more onerous than what the
Court would have imposed upon the young person.
(v)Group Scheduling
Manage your company's schedules online with eStudio's calendar software. With our
online calendar, you and your employees will keep track of tasks, appointments, events,
and meetings with ease and efficiency. Users will receive notifications and reminders via
email. Managers can create supervisor reports to easily check staff availability. No matter
how big your team or how daunting your schedule, eStudio 6 is the group calendar
software that can manage everyone's schedule.

OBJECTIVE TYPE QUESTIONS

1. Mining or extracting the knowledge from the databases are called

a. data warehouse

b. data mining

c. classification

d. clustering

2. Data Warehouse is a

a. subject oriented, time-variant, non-volatile collections of data

b. subject oriented, time-variant, volatile collections of data


c. subject oriented, time-variant, non-volatile, integrated collections of data

d. subject oriented, integrated, non-volatile collections of data

3. Removing the noise from the large databases is called

a. Cleaning b. transformation c. integration d. materialization

4. Translating the data from one form to other form is called

a. Cleaning b. transformation c. integration d. materialization

5. Which one of the following is an element of data warehousing?

a. data mining b. DSS c. EIS d. Replication Manager
6. Mapping from low level concepts to higher level concepts is called
a. data marts b. concept hierarchies c. data cubes d. cleaning
7. Data about the data is called
a. data marts b. meta data c. data cubes d. cleaning
8. A part of the data warehouse is called
a. data marts b. meta data c. data cubes d. cleaning
9. Which one of the following provides a layer of abstraction on top of
databases that hides the physical structure of normalized relational tables?
a. ROLAP b. MOLAP c. HOLAP d. DOLAP
10. Which one of the following provides specialized database engines that store
data in arrays along related dimensions, called a hypercube?
a. ROLAP b. MOLAP c. HOLAP d. DOLAP
11. Which one of the following provides multidimensional data access on top
of relational databases?
a. ROLAP b. MOLAP c. HOLAP d. DOLAP
12. The software that supports the creation, flow and tracking of non-structured
information in direct support of collaborative group activity is called
a. Groupware b. Middleware c. data mining d. association
13. Which one of the following is not a groupware component?
a. e-mail b. multimedia document c. workflow d. DSS
Review Questions
Two Mark Questions
1. Define data warehouse.
2. Define data marts and data cubes.
3. What are the elements of data warehousing?
4. Define OLAP and OLTP.
5. Compare OLAP and OLTP.
6. What is meant by data mining?
7. What are the applications of data mining?
8. Define groupware.
9. What are the components of groupware?
10. Define workflow.
11. What is the difference between groupware and an SQL database server?
Big Questions
1. Define Groupware. What are the components of Groupware?
2. Explain with neat diagram Data Warehousing architecture.
3. Explain in detail about OLAP and multidimensional data.
4. What are the elements of data warehousing?
5. Explain in detail about the applications of data mining.

ASSIGNMENT QUESTIONS

1. Develop a college web page using HTML.

2. Write an HTML program that uses cascading style sheets.

3. Explain in detail the applications of groupware.

4. What are the applications of data mining?

5. Write a program to implement a banking system using SQL.

6. Explain the architecture of data mining.
