INTRODUCTION
A Web form is an HTML page with one or more data entry fields and a mandatory
"submit" button. You click on the submit button to send the form's data contents
to a web server. This causes the browser to collect all the inputs from the form,
stuff them inside an HTTP message, and then invoke either an HTTP GET or
POST method on the server side.
Hidden fields are basically invisible; they contain values that are not displayed
within a form. They are used to store information a user enters and resubmit that
information in subsequent forms without having the user reenter it or even be
aware that the information is being passed around.
A cookie (also tracking cookie, browser cookie, and HTTP cookie) is a small
piece of text stored on a user's computer by a web browser.
A stored procedure is a set of SQL commands that has been compiled and stored
on the database server.
Triggers are special user-defined actions, usually in the form of stored
procedures, that are automatically invoked by the server based on data-related
events.
A rule is a special type of trigger that is used to perform simple checks on data.
A data mart is a repository of data gathered from operational data and other
sources that is designed to serve a particular community of knowledge workers.
Groupware is software that supports the creation, flow, and tracking of
non-structured information in direct support of collaborative group activity. Its
components are multimedia document management, workflow, email, group
conferencing, and group scheduling.
CONTENTS
The back end program executes the request and returns the results in HTML
format to the web server using the CGI protocol. The web server treats the results
like a normal document that it returns to the client. So the web server acts as a
conduit between the web client and a back end program that does the actual
work.
The figure shows the elements of this new 3-tier client/server architecture, Web
style. The first tier is a web browser that supports interactive forms; the second
tier is a vanilla HTTP server augmented with CGI programs; and the third tier
consists of traditional back end servers.
CGI technology makes it possible for Internet clients to update databases on back
end servers. In fact, updates and inserts are at the heart of online electronic
commerce.
Software vendors are using the technology of SSL, HTTP, IPSec and Internet
Firewalls to connect web browsers to virtually every form of client/server
system-including SQL databases, TP Monitors, Groupware Servers, ERP
systems, MOM queues, e-mail backbones and ORBs.
All these systems provide gateways that accept CGI requests, and they provide
transformers to dynamically map their data into HTML so that it can be
displayed within web browsers.
The form’s data gets passed using an end-to-end client/server protocol that includes both
HTTP and CGI. The best way to explain the dynamics of the protocol is to walk you
through a POST method invocation.
A CGI Scenario
The figure shows how the client and server programs play together to process a form's
request. The steps are:
1. User Clicks on the form’s “submit” button
This causes the web browser to collect the data within the form and assemble it into
one long string of name/value pairs, each separated by an &. The browser translates
spaces within the data into plus (+) symbols.
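This encoding step can be sketched with Python's standard library; `urlencode` produces exactly this &-separated, plus-escaped string (the field names below are made up for illustration):

```python
from urllib.parse import urlencode

# Hypothetical form fields, as a browser might collect them
form_data = {"name": "John Doe", "card no": "1234"}

# urlencode joins name=value pairs with '&' and encodes spaces as '+'
encoded = urlencode(form_data)
print(encoded)  # name=John+Doe&card+no=1234
```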
This is an ordinary HTTP request that specifies a POST method, the URL of the target
program, and the typical HTTP headers. The message body, which HTTP calls the
entity, contains the form's data.
The server parses the message and discovers that it’s a POST for the program. So it starts
a CGI interaction.
The HTTP server executes an instance of the CGI program specified in the URL; it
typically lives in a designated CGI directory (such as cgi-bin) on the server.
In this case, the program discovers by reading the environment variables that it is
responding to a POST.
7. The CGI program reads the message body via the standard input
pipe (stdin)
The message body contains the string of name=value items separated by &. The
CONTENT_LENGTH environment variable tells the program how much data is in the string.
The CGI program parses the string contents to retrieve the form data. It uses the
CONTENT_LENGTH environment variable to determine how many characters to read in from
the standard input pipe.
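Steps 7 and 8 can be sketched in a few lines of Python. A real CGI program would be launched by the HTTP server, which sets up stdin and the environment variables; the simulated values below are illustrative only:

```python
import io
import os
import sys
from urllib.parse import parse_qs

def read_form_data(stdin=None, environ=None):
    """Sketch of a CGI program's input handling: read CONTENT_LENGTH
    bytes from the standard input pipe and parse the
    name=value&name=value string into a dictionary."""
    stdin = stdin if stdin is not None else sys.stdin
    environ = environ if environ is not None else os.environ
    length = int(environ.get("CONTENT_LENGTH", 0))
    body = stdin.read(length)    # read exactly the advertised characters
    return parse_qs(body)        # '+' becomes a space, '&' splits pairs

# Simulated invocation (a real HTTP server would set these up):
fields = read_form_data(io.StringIO("name=John+Doe&item=42"),
                        {"CONTENT_LENGTH": "21"})
print(fields)  # {'name': ['John Doe'], 'item': ['42']}
```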
Typically, a CGI program interacts with some back end resource, like a DBMS or
transaction program, to service the form's request. The CGI program must then format the
results in HTML or some other acceptable MIME type. This information goes into the
HTTP response entity, which really is the body of the message. The program can also
choose to provide all the information that goes into the HTTP response headers; this
removes the extra overhead of having the HTTP server parse the output to create the
response headers. The HTTP server will then send the reply "as is" to the client.
Programs whose names begin with "nph-" indicate that they do not require HTTP server
assistance; CGI calls them nonparsed header (nph) programs.
9. The CGI program returns the results via the standard output pipe
(stdout)
The program pipes back the results to the HTTP server via its standard output. The HTTP
server receives the results on its standard input. This concludes the CGI interaction.
10. The HTTP server returns the results to the Web Browser
The HTTP server can either append some response headers to the information it receives
from the CGI program, or send it "as is" if it's an nph program.
The server forgets everything after it hands over a reply to the client; HTTP is a
stateless protocol. On the Internet, there is always some "kludge" you can use to work
around such problems.
A hidden field is an ordinary input field inside a form that is marked with the
attribute HIDDEN.
Hidden fields are basically invisible; they contain values that are not displayed
within a form. They are used to store information a user enters and resubmit that
information in subsequent forms without having the user reenter it or even be
aware that the information is being passed around.
The hidden fields act as variables that maintain state between form submissions. This
state gets passed back to the CGI program with each submission.
The figure shows an electronic transaction that requires multiple form submissions,
leading to an electronic payment. The first set of forms picks the merchandise
that you place in your electronic shopping cart. The next form requests a delivery
address and time. The last form presents you with the bill and requests some form
of payment.
The CGI program processes a form and then presents you with the next form; this
continues until you make your final payment or abort the transaction.
The CGI program uses invisible fields to store information from the previous
form in the next form.
When you submit the form, all these hidden fields are passed right back to the
CGI program along with any new data.
The CGI program stores the state of the transaction in the forms it sends back to
the client instead of storing in its own memory.
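As a rough sketch, a CGI program that carries state in hidden fields might generate each successive form like this (the field names and HTML skeleton are hypothetical):

```python
from html import escape

def next_form(state, new_fields):
    """Sketch: embed the accumulated transaction state as HIDDEN inputs
    so it comes back with the next submission (field names are made up)."""
    state = {**state, **new_fields}
    hidden = "\n".join(
        f'<input type="hidden" name="{escape(k)}" value="{escape(v)}">'
        for k, v in state.items())
    return f'<form method="POST">\n{hidden}\n<input type="submit">\n</form>'

# Form 1 captured the cart; form 2 adds the delivery address on top of it,
# so the cart travels along invisibly.
html = next_form({"cart": "book"}, {"address": "12 Main St"})
print(html)
```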
(ii) Cookies
A cookie (also tracking cookie, browser cookie, and HTTP cookie) is a small piece of
text stored on a user's computer by a web browser. A cookie consists of one or more
name-value pairs containing bits of information such as user preferences, shopping cart
contents, the identifier for a server-based session, or other data used by websites.
It is sent as an HTTP header by a web server to a web browser and then sent back
unchanged by the browser each time it accesses that server. A cookie can be used for
authenticating, session tracking (state maintenance), and remembering specific
information about users, such as site preferences or the contents of their electronic
shopping carts. The term "cookie" is derived from "magic cookie", a well-known concept
in UNIX computing which inspired both the idea and the name of browser cookies. Some
alternatives to cookies exist; each has its own uses, advantages, and drawbacks.
Most modern browsers allow users to decide whether to accept cookies, and the time
frame to keep them, but rejecting cookies makes some websites unusable. For example,
shopping carts or login systems implemented using cookies do not work if cookies are
disabled.
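Python's standard library can illustrate both halves of the exchange: the server emits a Set-Cookie header, and later parses the unchanged text the browser sends back (the cookie name and value here are made up):

```python
from http.cookies import SimpleCookie

# Server side: build a Set-Cookie header carrying a session identifier.
cookie = SimpleCookie()
cookie["session_id"] = "abc123"
cookie["session_id"]["path"] = "/"
header = cookie["session_id"].OutputString()
print("Set-Cookie:", header)  # session_id=abc123; Path=/

# Client side: the browser returns the same text on each request,
# and the server parses it to recover the session.
returned = SimpleCookie("session_id=abc123")
print(returned["session_id"].value)  # abc123
```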
1. The server passes the method request and its parameters to the back end program
using a protocol called
a. TCP   b. IP   c. CGI   d. HTTP
2. Which one of the following acts as a conduit between the web client and a
back end program that does the actual work?
4. Which one of the following is used to store information a user enters and resubmit
that information in subsequent forms without having the user reenter it?
5. Which one of the following acts as variables that maintain state between form
submissions?
Review Questions
Two Mark Questions
1. Define CGI.
2. What is meant by hidden fields and cookies?
Big Questions
1. How do the client and server programs play together to process a form's request?
SQL consists of a set of commands that can be used to manipulate information collected
in tables. Through SQL, you manipulate and control a set of records at a time. The
relational model calls for a clear separation of the physical aspects of data from their
logical representation. Data is made to appear as simple tables that mask the complexity
of the storage access mechanisms.
The SQL language is used to perform complex data operations with a few simple
commands, in situations that would have required hundreds of lines of conventional code.
The major revisions of the standard are:
1. SQL-89
2. SQL-92
SQL-92 was the third revision of the SQL database query language. Unlike SQL-89, it
was a major revision of the standard. For all but a few minor incompatibilities, the SQL-
89 standard is forwards-compatible with SQL-92. The new features are:
SQL Agent
New data types defined: DATE, TIME, TIMESTAMP, INTERVAL, BIT string,
VARCHAR strings, and NATIONAL CHARACTER strings.
Support for additional character sets beyond the base requirement for representing
SQL statements.
New scalar operations such as string concatenation, date and time mathematics,
and conditional statements.
New set operations such as UNION JOIN, NATURAL JOIN, set differences, and
set intersections.
Support for alterations of schema definitions via ALTER and DROP.
Bindings for C, Ada, and MUMPS.
New features for user privileges.
New integrity-checking functionality such as within a CHECK constraint.
New schema definitions for the "Information Schema" (standardized catalog views).
Dynamic execution of queries (as opposed to prepared).
Better support for remote database access.
Temporary tables.
Transaction isolation levels.
New operations for changing data types on the fly via CAST.
Scrolling cursors.
Compatibility flagging for backwards and forwards compatibility with other SQL
standards.
Call Level Interface
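A few of these SQL-92 additions can be tried directly from Python using SQLite, which implements a subset of the standard; the query below exercises string concatenation, CAST, and the CASE conditional:

```python
import sqlite3

# SQL-92 scalar operations and conditionals, as supported by SQLite.
con = sqlite3.connect(":memory:")
row = con.execute("""
    SELECT 'SQL' || '-92',                          -- string concatenation
           CAST('42' AS INTEGER),                   -- change a type on the fly
           CASE WHEN 1 < 2 THEN 'yes' ELSE 'no' END -- conditional expression
""").fetchone()
print(row)  # ('SQL-92', 42, 'yes')
```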
3. SQL3
One of the basic ideas behind the object facilities is that, in addition to the normal
built-in types defined by SQL, user-defined types may also be defined. These types may
be used in the same way as built-in types. For example, columns in relational tables may
be defined as taking values of user-defined types, as well as built-in types. A
user-defined abstract data type (ADT) definition encapsulates attributes and operations
in a single entity. In SQL3, an abstract data type (ADT) is defined by specifying a set
of declarations of the stored attributes that represent the value of the ADT, the
operations that define the equality and ordering relationships of the ADT, and the
operations that define the behavior (and any virtual attributes) of the ADT. Operations
are implemented by procedures called routines. ADTs can also be defined as subtypes of
other ADTs. A subtype inherits the structure and behavior of its supertypes (multiple
inheritance is supported). Instances of ADTs can be persistently stored in the database
only by storing them in columns of tables. As of late 1998, SQL3 consisted of 9 parts:
1. SQL/Framework
2. SQL/Foundation
5. SQL/Bindings
6. SQL/Transactions
7. SQL/Temporal
8. SQL/Med
9. SQL/OLB
A client application usually requests data and data-related services from a
database server. The database server, also known as the SQL engine, responds to
the client's request and provides secured access to shared data.
With a single SQL statement, a client application can retrieve and modify a set
of server database records. The SQL database engine can filter the query result sets,
resulting in considerable data communication savings.
SQL server manages the control and execution of SQL commands. It
provides the logical and physical views of the data and generates optimized access
plans for executing the SQL commands.
A database server also maintains dynamic catalog tables that contain
information about the SQL objects.
Because the SQL server allows multiple applications to access the same database at the
same time, it must provide an environment that protects the database against a
variety of possible internal and external threats.
The server manages the recovery, concurrency, security and consistency
aspects of a database.
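The set-at-a-time style described above can be sketched with SQLite standing in for the database server: a single UPDATE statement modifies every qualifying record, and only the statement text and a row count cross the client/server boundary (the account data is invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE accounts (id INTEGER, balance INTEGER)")
con.executemany("INSERT INTO accounts VALUES (?, ?)",
                [(1, 100), (2, 250), (3, 50)])

# One statement operates on the whole qualifying set of records.
cur = con.execute("UPDATE accounts SET balance = balance * 2 "
                  "WHERE balance >= 100")
print(cur.rowcount)  # 2 rows matched the filter and were updated
```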
1. Process-per-client architectures
It provides maximum bulletproofing by giving each database client its own process
address space. The database runs in one or more separate background processes. The
advantage of this architecture is that it protects the users from each other, and it protects
the database manager from the users. In addition, the processes can easily be assigned to
different processors on a multiprocessor SMP machine. Because the architecture relies on
the local OS for its multitasking services, an OS that supports SMP can transparently
assign processes to a pool of available processors. The disadvantage of this architecture
is that it consumes more memory and CPU resources than the alternative schemes. It can
be slower because of the process context switches and interprocess communications
overhead. These problems can be overcome by using a TP Monitor that manages a
pool of reusable processes. Example architectures are DB2, Informix, and Oracle6.
2. Multithreaded architectures
It provides the best performance by running all the user connections, applications, and
the database in the same address space. It provides its own internal scheduler and does
not rely on the local OS tasking and address protection schemes. The advantage is that it
conserves memory and CPU cycles by not requiring frequent context switches, and it does
not require as many local OS services. The disadvantage is that a misbehaved user
application can bring down the entire database server and all its tasks. Example
architectures are Sybase and MS SQL Server.
3. Hybrid architectures
It consists of three components: 1) multithreaded network listeners that participate in the
initial connection task by assigning the client to a dispatcher; 2) dispatcher tasks that
place messages on an internal message queue, and then de-queue the response and send it
back to the client; and 3) reusable shared server worker processes that pick the work off
the queue, execute it, and place the response on an out queue. The advantage is that it
provides a protected environment for running the user tasks without assigning a
permanent process to each user. The disadvantage is queue latencies. The example
architecture is Oracle7.
(iii) Stored Procedures, Triggers and Rules
Stored Procedure
A stored procedure is a set of SQL commands that has been compiled and stored on the
database server. Once the stored procedure has been "stored", client applications can
execute the stored procedure over and over again without sending it to the database server
again and without compiling it again. Stored procedures improve performance by
reducing network traffic and CPU load. The Benefits of Stored Procedures are:
Precompiled execution. SQL Server compiles each stored procedure once and
then reutilizes the execution plan. This results in tremendous performance boosts
when stored procedures are called repeatedly.
Reduced client/server traffic. If network bandwidth is a concern in your
environment, you'll be happy to learn that stored procedures can reduce long SQL
queries to a single line that is transmitted over the wire.
Efficient reuse of code and programming abstraction. Stored procedures can
be used by multiple users and client programs. If you utilize them in a planned
manner, you'll find the development cycle takes less time.
Enhanced security controls. You can grant users permission to execute a stored
procedure independently of underlying table permissions.
Static SQL and Dynamic SQL
A static SQL statement involves the preparation and storing of a section at preprocessing
time and the execution of that stored section at run time. A dynamic SQL statement
involves the preparation and execution of a section at runtime. Some statements do not
require a section, and they are also classified as dynamic.
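The dynamic case can be illustrated with Python's sqlite3 module, where the statement text is prepared and executed entirely at run time; static SQL, by contrast, requires a precompilation step that Python does not have. The table and values are invented:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE parts (name TEXT, qty INTEGER)")
con.execute("INSERT INTO parts VALUES ('bolt', 10)")

# Dynamic SQL: the statement string is prepared and executed at run time.
# The ? placeholder lets the prepared section be reused across calls.
stmt = "SELECT qty FROM parts WHERE name = ?"
for name in ("bolt", "nut"):
    print(name, con.execute(stmt, (name,)).fetchone())
# bolt (10,)
# nut None
```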
Triggers
Both triggers and rules are attached to specific operations on specific tables such
as auditing, looking for value thresholds, or setting column database defaults.
A separate trigger or rule can be defined for each of these commands, or a single
trigger may be defined for any updates to a table. Triggers can call other triggers
or stored procedures.
Sybase
MS SQL Server
Ingres
Oracle
Informix
DB2/UDB
Middleware starts with the API on the client side that is used to invoke a service, and it
covers the transmission of the request over the network and the resulting response. It does
not include the software that provides the actual service.
Middleware Solutions
Step 1: Standardize on a common SQL interface. The idea is to create a common SQL
API that is used by all the applications, and let the server differences be handled by
the different database drivers.
Step 2: Standardize on one open industry FAP (formats and protocols), supply a common
client driver for the FAP, and develop a gateway catcher for each server. The gateway
catcher will catch the incoming FAP messages and translate them to the local server's
native SQL interface.
Step 3: Remove the gateway catchers, which improves performance, reduces cost,
and simplifies maintenance, and create a database administration interface. To eliminate
the gateway catcher, the common FAP must either support a superset of all the SQL
dialects or it must tolerate native SQL. The vendors must also agree to replace their own
private FAPs with the common FAP.
Two approaches for supporting SQL from within programming languages: embedded
SQL and SQL Call-Level Interface (CLI).
SQL-92 Embedded SQL (ESQL)
Embedded SQL is an ISO SQL-92 defined standard for embedding SQL statements "as is"
within ordinary programming languages. It specifies the syntax for embedding SQL
within C, COBOL, and so on. Each SQL statement is flagged with language-specific
identifiers that mark the beginning and end of the SQL statement. It requires running the
SQL source through a precompiler to generate a source code file that the language
compiler understands.
SQLJ- Java’s Embedded SQL
It is used to insert SQL statements inside the Java programs. It will be used to write Java-
based stored procedures.
SQL Call-Level Interface
It is a callable SQL API for database access. It does not require a precompiler to convert
SQL statements into code, which is then compiled and bound to a database. It allows
creating and executing SQL statements at run time.
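Python's DB-API follows the same call-level pattern, so it can serve as a rough illustration: every step is an ordinary API call made at run time, with no precompiler or bind step tying the program to one target database:

```python
import sqlite3  # sqlite3 implements Python's DB-API, a CLI-style callable API

# Call-level access: SQL travels as strings through ordinary function calls.
con = sqlite3.connect(":memory:")   # connect to a data source
cur = con.cursor()                  # allocate a statement handle
cur.execute("SELECT 1 + 2")         # hand the engine a SQL string at run time
result = cur.fetchone()             # fetch the result set row by row
cur.close()
con.close()
print(result)  # (3,)
```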
X/Open SAG CLI
SQL Access Group (SAG) which was to provide a unified standard for remote database
access. It allows any SQL client talk to any SQL server. It requires the use of intelligent
database drivers that accept a CLI cal and translate it into the native database server’s
access language. With the proper driver, any data source application can function as a
CLI-server and can be accessed by the front end tools and client programs that use the
CLI. The CLI requires a driver fr each database to which it connects. Each driver must be
written for a specific server using the server’s access methods and network transport
stack. The CLI provides a driver manager that talks to a driver through a service provider
interface (SPI).
Microsoft ODBC CLI
ODBC 3.5
Microsoft introduced ODBC 3.5, a Unicode-enabled version of the ODBC standard. It has
many drawbacks:
It is constantly evolving.
Its future is uncertain.
It introduces a different programming paradigm.
It is object based rather than procedure based.
ODBC drivers are difficult to build and maintain.
It is not well documented.
It has high overhead.
CLI Vs Embedded SQL
Feature                                               SQL/CLI   Embedded SQL
Requires target database to be known ahead of time    No        Yes
Supports static SQL                                   No        Yes
Supports dynamic SQL                                  Yes       Yes
Compile-time type checking                            No        Yes
Uses the SQL declarative model                        No        Yes
Applications must be precompiled and bound
  to the database server                              No        Yes
Ease of programming                                   No        Yes
Tool-friendly                                         Yes       No
Easy to debug                                         Yes       No
Easy to package                                       Yes       No
Supports database-independent catalog tables          Yes       No
Supports database-independent metadata                Yes       No
The Java Database Connectivity (JDBC) API is the industry standard for database-
independent connectivity between the Java programming language and a wide range of
databases – SQL databases and other tabular data sources, such as spreadsheets or flat
files. The JDBC API provides a call-level API for SQL-based database access.
JDBC technology allows you to use the Java programming language to exploit "Write
Once, Run Anywhere" capabilities for applications that require access to enterprise data.
With a JDBC technology-enabled driver, you can connect to all corporate data, even in a
heterogeneous environment.
JDBC Drivers
A JDBC URL provides a way of identifying a database so that the appropriate driver will
recognize it and establish a connection with it. Driver writers are the ones who actually
determine what the JDBC URL that identifies their particular driver will be. Users do not
need to worry about how to form a JDBC URL; they simply use the URL supplied with
the drivers they are using. JDBC's role is to recommend some conventions for driver
writers to follow in structuring their JDBC URLs. Since JDBC URLs are used with
various kinds of drivers, the conventions are of necessity very flexible.
-> First, they allow different drivers to use different schemes for naming databases. The
odbc subprotocol, for example, lets the URL contain attribute values.
-> Second, JDBC URLs allow driver writers to encode all necessary connection
information within them. This makes it possible, for example, for an applet that wants to
talk to a given database to open the database connection without requiring the user to do
any system administration chores.
-> Third, JDBC URLs allow a level of indirection. This means that the JDBC URL may
refer to a logical host or database name that is dynamically translated to the actual name
by a network naming system. This allows system administrators to avoid specifying
particular hosts as part of the JDBC name. There are a number of different network name
services (such as DNS, NIS, and DCE), and there is no restriction about which ones can
be used. The standard syntax for JDBC URLs is shown below. It has three parts, which
are separated by colons:
jdbc:<subprotocol>:<subname>
The three parts of a JDBC URL are broken down as follows:
1. jdbc - the protocol. The protocol in a JDBC URL is always jdbc.
2. <subprotocol> - the name of the driver or the name of a database connectivity
mechanism, which may be supported by one or more drivers. A prominent example of a
subprotocol name is "odbc", which has been reserved for URLs that specify ODBC-style
data source names. For example, to access a database through a JDBC-ODBC bridge, one
might use a URL such as the following:
jdbc:odbc:fred
In this example, the subprotocol is "odbc", and the subname "fred" is a local
ODBC data source.
3. <subname> - a way to identify the database. The subname can vary, depending on the
subprotocol, and it can have a subsubname with any internal syntax the driver writer
chooses. The point of a subname is to give enough information to locate the database. If
the database resides on a remote server, the network address should be included in the
JDBC URL as part of the subname, following the standard URL naming convention of
//hostname:port/subsubname.
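JDBC itself is a Java API, but the way a URL decomposes into its three colon-separated parts can be sketched in Python; the first URL matches the jdbc:odbc:fred example above, and the second is a made-up remote-server example:

```python
def split_jdbc_url(url):
    """Split a JDBC URL of the form jdbc:<subprotocol>:<subname>.
    The subname may itself contain colons, so split at most twice."""
    protocol, subprotocol, subname = url.split(":", 2)
    if protocol != "jdbc":
        raise ValueError("a JDBC URL always begins with 'jdbc'")
    return subprotocol, subname

print(split_jdbc_url("jdbc:odbc:fred"))            # ('odbc', 'fred')
print(split_jdbc_url("jdbc:dbnet://host:356/db"))  # ('dbnet', '//host:356/db')
```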
The JDBC API supports both two-tier and three-tier processing models for database
access.
In the three-tier model, commands are sent to a "middle tier" of services, which then
sends the commands to the data source. The data source processes the commands and
sends the results back to the middle tier, which then sends them to the user. MIS directors
find the three-tier model very attractive because the middle tier makes it possible to
maintain control over access and the kinds of updates that can be made to corporate data.
Another advantage is that it simplifies the deployment of applications. Finally, in many
cases, the three-tier architecture can provide performance advantages.
ADO is a higher-level programming model for OLE DB. It is a replacement for two
older data access interfaces: Data Access Objects (DAO) and Remote Data Objects
(RDO). It supports a variety of front end tools and programming languages, and it
also provides a Remote Data Service component that supports client-side caching
and data-aware controls. It consists of connection objects that represent a
connection to a data source, command objects that represent a query to be executed
on the data source, and a recordset object that represents the results of the query.
Whenever you run Open Server Gateway for SQL Anywhere, you provide a server name
on the command line that identifies the Open Server Gateway to client applications. This
server name must be recognized by client applications in order for them to communicate
with the Open Server Gateway.
IBI EDA/SQL
The components of EDA/SQL are: API/SQL, EDA/Extenders, EDA/Link, EDA/Server, and
EDA/Data Drivers.
Oracle Transparent Gateways
Oracle Transparent Gateways give you complete access to your information, enabling
data distributed across a variety of storage systems to appear as if within a single,
local database.
IBM’s DRDA
Distributed Request
DRDA Features
5. Common Diagnostic
Review Questions
1. Define SQL.
2. What are the advantages and disadvantages of Process-per-client
architecture?
15. What is the use of DRDA and write the features of DRDA?
Big Questions
Data Warehouses
(i) OLTP
Databases tend to get split up into a variety of different categories based on their
application and requirements. All of these different categories naturally get nifty
buzzwords to help classify them and make distinctions in features more apparent. The
most popular buzzword (well, acronym anyway) is OLTP, or Online Transaction
Processing. Other classifications include Decision Support Systems (DSS), Data
Warehouses, Data Marts, etc.
OLTP databases, as the name implies, handle real-time transactions, which inherently have
some special requirements. If you're running a store, for instance, you need to ensure that
as people order products they are properly and efficiently updating the inventory tables
while they are updating the purchases tables, while they are updating the customer tables,
and so on and so forth. OLTP databases must be atomic in nature (an entire transaction either
succeeds or fails; there is no middle ground), be consistent (each transaction leaves the
affected data in a consistent and correct state), be isolated (no transaction affects the
states of other transactions), and be durable (changes resulting from committed
transactions are persistent). All of this can be a fairly tall order, but is essential to
running a successful OLTP database.
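The atomicity requirement can be demonstrated with SQLite: a failure partway through a transaction rolls back every change it made. The store scenario below is invented for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE inventory (item TEXT, stock INTEGER)")
con.execute("INSERT INTO inventory VALUES ('widget', 5)")
con.commit()

# Atomicity: if the order fails midway, no partial update may survive.
try:
    con.execute("UPDATE inventory SET stock = stock - 1 "
                "WHERE item = 'widget'")
    raise RuntimeError("payment failed")   # simulated mid-transaction error
except RuntimeError:
    con.rollback()                         # undo the whole transaction

stock = con.execute("SELECT stock FROM inventory "
                    "WHERE item = 'widget'").fetchone()[0]
print(stock)  # 5: the decrement did not survive the rollback
```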
(ii) Decision Support Systems
Communications-driven DSS
Data-driven DSS
Document-driven DSS
Knowledge-driven DSS
Model-driven DSS
The emphasis of the system as a whole is its easy-to-use interface and its integration
with a variety of data sources. It offers strong reporting and data mining capabilities,
which can provide all the data the executive is likely to need. Traditionally the interface
was menu driven, with either reports or text presentation. Newer systems, and especially
the newer Business Intelligence systems that are replacing EIS, have a dashboard or
scorecard type display.
Executive Information Systems come in two distinct types: ones that are data driven, and
ones that are model driven. Data-driven systems interface with databases and data
warehouses. They collate information from different sources and present it to the
user in an integrated, dashboard-style screen. Model-driven systems use forecasting,
simulations, and decision-tree-like processes to present the data.
Advantages of EIS
Easy for upper-level executives to use; extensive computer experience is not
required to operate it
Provides timely delivery of company summary information
Information that is provided is better understood
Filters data for management
Improves tracking of information
Offers efficiency to decision makers
Disadvantages of EIS
Limited functionality, by design
Information overload for some managers
Benefits hard to quantify
High implementation costs
System may become slow, large, and hard to manage
Need good internal processes for data management
May lead to less reliable and less secure data
System dependent
(iv)Data Warehouses
A data mart is a repository of data gathered from operational data and other sources that
is designed to serve a particular community of knowledge workers. In scope, the data
may derive from an enterprise-wide database or data warehouse or be more specialized.
The emphasis of a data mart is on meeting the specific demands of a particular group of
knowledge users in terms of analysis, content, presentation, …
Data marts are the "corner stores" of the enterprise, and each unique knowledge worker
community has its own mart maintained by the divisional or departmental IS group.
Some divisions may need only a single data mart if all knowledge workers in the division
have similar information requirements. In other cases, a departmental IS organization will
discover several distinct knowledge worker communities within a single department of a
division.
Each data mart serves only its local community, and is modeled on the information needs
of that community. For example, managers of consumer products will require different
information than managers of industrial products (raw material). Consumer products have
a complex competitive dimension for which syndicated market information (from
companies such as Information Resources Inc. and Nielsen Marketing Research) exists,
while industrial products have a simpler competitive dimension. Consumer products are
sold over the counter with no advance notice of purchasing, while industrial products are
sold in large lots over a longer period on the basis of existing relationships and contracts.
Also, consumer products are sold through channels not controlled by the manufacturer,
while industrial products are supplied directly by their manufacturers. These two
communities, both composed of product managers, have different information
requirements.
A data warehouse has clear advantages in reporting: Data can be accessed more quickly
because it has been stored in the way best suited to analysis. Not only are the system
parameters for displaying data optimized, but suitable indexes, aggregates and joins are
also stored in the database. Historical data can also be accessed and consistent analyses
using several operative systems and even external data (for example via the Web) are
possible. This advantage of the data warehouse is won at great cost, in that additional
systems (with additional administrative expense) and additional memory are needed, and
that the data is, for performance reasons, usually not as up-to-date as in the operative
system.
Refresh -> Replaces the entire target with data from the source
Update -> It only sends the changed data to the target. Updates can be either
synchronous, which means that the target copy is updated in the same commit scope
as the source table, or they can be asynchronous, which means that the target table is
updated in a separate transaction than the one updating the source.
Subsets allow you to transmit only the rows and columns that are of interest to
your informational applications.
Aggregates allow you to transmit only the aggregations of data such as averages,
sums, maximums and so on.
Derived functions allow you to specify data that does not exist but it is the result
of some calculation on data that does exist.
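The difference between a full refresh and an incremental update can be sketched in a few lines of Python. This is only an illustration: in-memory dicts stand in for the source and target tables, whereas a real replication product would work against the database and its logs.

```python
# Sketch of refresh vs. update replication (hypothetical in-memory "tables"
# keyed by row ID; a real tool would operate on actual database tables).

def refresh(source: dict, target: dict) -> None:
    """Replace the entire target with the source contents."""
    target.clear()
    target.update(source)

def update(changes: dict, target: dict) -> None:
    """Apply only the changed rows to the target (asynchronous style:
    this happens in a separate transaction from the source change)."""
    target.update(changes)

source = {1: "widget", 2: "gadget", 3: "sprocket"}
target = {}
refresh(source, target)    # target now mirrors the whole source

changes = {2: "gizmo"}     # only row 2 changed since the refresh
update(changes, target)    # far cheaper than re-sending everything
```

The same shape extends naturally to subsets (filter the dict before sending) and aggregates (send computed totals instead of rows).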
A data warehouse provides a common data model for all data of interest regardless of the
data's source. This makes it easier to report and analyze information than it would be if
multiple data models were used to retrieve information such as sales invoices, order
receipts, general ledger charges, etc.
Prior to loading data into the data warehouse, inconsistencies are identified and
resolved. This greatly simplifies reporting and analysis.
Information in the data warehouse is under the control of data warehouse users so
that, even if the source system data is purged over time, the information in the
warehouse can be stored safely for extended periods of time.
Because they are separate from operational systems, data warehouses provide
retrieval of data without slowing down operational systems.
Data warehouses can work in conjunction with and, hence, enhance the value of
operational business applications, notably customer relationship management
(CRM) systems.
Data warehouses facilitate decision support system applications such as trend
reports (e.g., the items with the most sales in a particular area within the last two
years), exception reports, and reports that show actual performance versus goals.
There are also disadvantages to using a data warehouse. Some of them are:
Data warehouses are not the optimal environment for unstructured data.
Because data must be extracted, transformed and loaded into the warehouse, there
is an element of latency in data warehouse data.
Over their life, data warehouses can have high costs. The data warehouse is
usually not static. Maintenance costs are high.
Data warehouses can get outdated relatively quickly. There is a cost of delivering
suboptimal information to the organization.
There is often a fine line between data warehouses and operational systems.
Duplicate, expensive functionality may be developed. Or, functionality may be
developed in the data warehouse that, in retrospect, should have been developed
in the operational systems and vice versa.
OLAP allows business users to slice and dice data at will. Normally, data in an
organization is distributed across multiple data sources that are incompatible with each
other. A retail example: point-of-sale data and sales made via a call center or the Web are
stored in different locations and formats. Part of the OLAP implementation process
involves extracting data from the various data repositories and making it compatible.
Making data compatible involves ensuring that the meaning of the data in one repository
matches that in all other repositories.
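A minimal sketch of this reconciliation step, using the retail example above. The record layouts for the two channels are invented for illustration; the point is only that each source is mapped onto one common schema before analysis.

```python
# Toy reconciliation: point-of-sale and Web sales arrive in different
# formats (field names invented) and are normalized to a common schema.

pos_sales = [{"store": "NYC-01", "sku": "A100", "amt_cents": 1999}]
web_sales = [{"channel": "web", "product_id": "A100", "amount": 19.99}]

def normalize_pos(rec):
    # POS stores cents; the common schema uses currency units.
    return {"channel": "store", "product": rec["sku"],
            "amount": rec["amt_cents"] / 100}

def normalize_web(rec):
    return {"channel": "web", "product": rec["product_id"],
            "amount": rec["amount"]}

unified = ([normalize_pos(r) for r in pos_sales] +
           [normalize_web(r) for r in web_sales])
```

After this step, "amount" and "product" mean the same thing in every row, which is exactly the compatibility requirement described above.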
It is not always necessary to create a data warehouse for OLAP analysis. Data stored by
operational systems, such as point-of-sale systems, are kept in types of databases called
OLTPs. OLTP (Online Transaction Processing) databases do not differ from other
databases from a structural perspective. The main, and only, difference
is the way in which data is stored.
Examples of OLTPs include ERP, CRM, SCM, point-of-sale and call-center
applications. OLTPs are designed for optimal transaction speed. When a consumer makes a
purchase online, they expect the transaction to occur instantaneously. With a database
design (called a data model) optimized for transactions, the record 'Consumer Name,
Address, Telephone, Order Number, Order Name, Price, Payment Method' is created
quickly in the database, and the results can be recalled by managers equally quickly if
needed.
Figure 1. Data Model for OLTP
Data is not typically stored for an extended period in OLTPs, for storage-cost and
transaction-speed reasons. OLAPs have a different mandate from OLTPs: they are
designed to give an overview analysis of what happened. Hence the data storage (i.e. the
data modeling) has to be set up differently. The most common method is called the star
design. The central table in an OLAP star data model is called the fact table; the
surrounding tables are called the dimensions. Using such a data model, it is possible to
build reports that answer a range of business questions.
To obtain such answers from the data model, OLAP cubes are created. An
OLAP cube is not strictly a cuboid; it is the name given to the process of linking data
from the different dimensions. The cubes can be developed along business units such as
sales or marketing, or a giant cube can be formed with all the dimensions. OLAP can be
a valuable and rewarding business tool. Aside from producing reports, OLAP analysis
can help an organization evaluate balanced scorecard targets.
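A toy sketch of the star design and of "cubing" the fact table along its dimensions. All table contents here are invented for illustration and are not the document's Figure 1.

```python
from collections import defaultdict

# Star design sketch: one central fact table keyed by dimension IDs,
# with small dimension tables around it (all data invented).
dim_product = {1: "sleeping bag", 2: "backpack"}
dim_region  = {10: "East", 20: "West"}

fact_sales = [  # (product_id, region_id, units_sold)
    (1, 10, 5), (1, 20, 3), (2, 10, 2), (2, 10, 4),
]

# "Cube" the facts: total units by (product, region).
cube = defaultdict(int)
for product_id, region_id, units in fact_sales:
    cube[(dim_product[product_id], dim_region[region_id])] += units

# A slice along the product dimension answers a question such as
# "how many backpacks were sold in each region?"
backpacks = {region: units for (product, region), units in cube.items()
             if product == "backpack"}
```

The fact table holds the measures (units sold); the dimension tables hold the descriptive attributes used to slice them, which is the essence of the star layout.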
Types of OLAP
1) Desktop OLAP
Desktop OLAP, or “DOLAP”, is based on the idea that a user can download a section of
the data from the database or source and work with that dataset locally, on their
desktop. DOLAP is easier to deploy and costs less, but comes with very limited
functionality in comparison with other OLAP applications.
2) Multidimensional OLAP
This is probably, by far, the best OLAP tool to use for analysis reports, since it
enables users to easily reorganize or rotate the cube structure to view different aspects of
the data. This is done by way of slicing and dicing. MOLAP analytic tools are also capable
of performing complex calculations. Since calculations are predefined upon cube creation,
this results in faster return of computed data. MOLAP systems also give users the
ability to quickly write data back into a data set. Moreover, in comparison to ROLAP,
MOLAP is considerably less heavy on hardware due to compression techniques. In a
nutshell, MOLAP is more optimized for fast query performance and retrieval of
summarized information.
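MOLAP's speed comes from computing the aggregations once, at cube-build time, so a query becomes a simple lookup rather than a scan. A toy sketch of that precomputation, with invented data and 'ALL' used as a rollup marker:

```python
# Sketch of MOLAP precomputation: every product/region total, including
# the 'ALL' rollups, is calculated when the cube is built (data invented).
sales = [("backpack", "East", 2), ("backpack", "West", 1), ("tent", "East", 3)]

def build_cube(rows):
    """Precompute all aggregates up front, as a MOLAP engine would."""
    cube = {}
    for product, region, units in rows:
        for key in [(product, region), (product, "ALL"),
                    ("ALL", region), ("ALL", "ALL")]:
            cube[key] = cube.get(key, 0) + units
    return cube

cube = build_cube(sales)

# Queries are now dictionary lookups; no rescan of the base rows needed.
total_east = cube[("ALL", "East")]
total_backpacks = cube[("backpack", "ALL")]
```

The trade-off the text describes follows directly: build time and cube size grow with the number of precomputed cells, in exchange for near-instant query answers.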
3) Relational OLAP
ROLAP, or “Relational” OLAP, systems work primarily from the data that resides in a
relational database, where the base data and dimension tables are stored as relational
tables. This model permits multidimensional analysis of data, as it enables users to
perform a function equivalent to the traditional OLAP slicing-and-dicing feature.
This is achieved through the use of any SQL reporting tool to extract or ‘query’ data directly
from the data warehouse, where specifying a WHERE clause amounts to performing a
certain slice-and-dice action.
One advantage of ROLAP over the other styles of OLAP analytic tools is that it is
deemed to be more scalable in handling huge amounts of data. ROLAP sits on top of
relational databases, enabling it to leverage the many functionalities that a
relational database provides. Another advantage of a ROLAP tool is that it is efficient in
managing both numeric and textual data. It also permits users to “drill down” to the leaf
details, the lowest level of a hierarchy structure.
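The point that a WHERE clause is a slice-and-dice action can be shown with SQLite from Python's standard library. The table and column names below are invented for illustration.

```python
import sqlite3

# ROLAP sketch: facts live in an ordinary relational table, and slicing
# is just a WHERE clause (table and column names invented).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE sales (product TEXT, region TEXT, units INTEGER);
    INSERT INTO sales VALUES
        ('backpack', 'East', 2), ('backpack', 'East', 4),
        ('backpack', 'West', 1), ('tent', 'East', 3);
""")

# Slice on region = 'East', diced by product: a plain SQL query that any
# reporting tool could issue against the warehouse tables.
rows = con.execute("""
    SELECT product, SUM(units) FROM sales
    WHERE region = 'East'
    GROUP BY product
    ORDER BY product
""").fetchall()
```

Because the query runs against ordinary relational tables, it scales with the database engine itself, which is the scalability advantage claimed for ROLAP above.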
4) Hybrid OLAP
HOLAP is the product of the attempt to incorporate the best features of MOLAP and
ROLAP into a single architecture. This tool tries to bridge the technology gap between both
products by enabling access to both multidimensional database (MDDB) and
relational database management system (RDBMS) data stores. HOLAP systems store the
larger quantities of detailed data in the relational tables, while the aggregations are stored
in pre-calculated cubes. HOLAP also has the capacity to “drill through” from the cube
down to the relational tables for the detailed data.
Some of the advantages of this system are better scalability, quick data processing and
flexibility in accessing data sources.
(iii)Data Mining
Data mining, the extraction of hidden predictive information from large databases, is a
powerful new technology with great potential to help companies focus on the most
important information in their data warehouses. Data mining tools predict future trends
and behaviors, allowing businesses to make proactive, knowledge-driven decisions. The
automated, prospective analyses offered by data mining move beyond the analyses of past
events provided by retrospective tools typical of decision support systems. Data mining
tools can answer business questions that traditionally were too time consuming to
resolve. They scour databases for hidden patterns, finding predictive information that
experts may miss because it lies outside their expectations.
Most companies already collect and refine massive quantities of data. Data mining
techniques can be implemented rapidly on existing software and hardware platforms to
enhance the value of existing information resources, and can be integrated with new
products and systems as they are brought on-line.
While large-scale information technology has been evolving separate transaction and
analytical systems, data mining provides the link between the two. Data mining software
analyzes relationships and patterns in stored transaction data based on open-ended user
queries. Several types of analytical software are available: statistical, machine learning,
and neural networks. Generally, any of four types of relationships are sought:
Classes: Stored data is used to locate data in predetermined groups. For example,
a restaurant chain could mine customer purchase data to determine when
customers visit and what they typically order. This information could be used to
increase traffic by having daily specials.
Clusters: Data items are grouped according to logical relationships or consumer
preferences. For example, data can be mined to identify market segments or
consumer affinities.
Associations: Data can be mined to identify associations, such as items that are
commonly purchased together.
Sequential patterns: Data is mined to anticipate behavior patterns and trends. For
example, an outdoor equipment retailer could predict the likelihood of a backpack
being purchased based on a consumer's purchase of sleeping bags and hiking
shoes.
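The sequential-pattern idea can be sketched as a simple confidence calculation over ordered purchase histories. The histories below are invented, and real miners use far more sophisticated algorithms; this only shows the shape of the question being asked.

```python
# Toy sequential-pattern check: how often does a backpack purchase
# follow a sleeping-bag purchase in a customer's ordered history?
histories = [
    ["sleeping bag", "hiking shoes", "backpack"],
    ["sleeping bag", "backpack"],
    ["tent", "stove"],
    ["sleeping bag", "stove"],
]

def follows(histories, antecedent, consequent):
    """Fraction of histories containing `antecedent` in which
    `consequent` appears later in the same history."""
    have = [h for h in histories if antecedent in h]
    hits = sum(1 for h in have
               if consequent in h[h.index(antecedent) + 1:])
    return hits / len(have)

confidence = follows(histories, "sleeping bag", "backpack")
# 2 of the 3 sleeping-bag buyers later bought a backpack.
```

A retailer would act on a high confidence value, e.g. by recommending backpacks to sleeping-bag buyers.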
Client/Server Groupware
Client/Server Groupware is a collection of technologies that allow us to represent
complex processes that center around collaborative human activities. It is built on five
foundation technologies: multimedia document management, workflow, email, conferencing
and scheduling. Groupware is software that supports the creation, flow and tracking of non-
structured information in direct support of collaborative group activity.
Components of Groupware
Multimedia document management
Workflow
Email
Group Conferencing
Group Scheduling.
(i) Multimedia document management
The fundamental issues faced by multimedia database management researchers and
designers are as follows:
• Development of models for capturing media synchronization requirements.
• Development of conceptual models for multimedia information, especially for video,
audio, and image data. These models should be rich in their semantic capabilities for
abstraction of multimedia information and be able to provide canonical representations of
complex images, scenes, and events in terms of objects and their spatiotemporal behavior.
• Design of powerful indexing, searching, accessing, and organizing methods for
multimedia data. Search in multimedia databases can be quite computationally intensive,
especially if content-based retrieval is needed for image and video data stored in
compressed or uncompressed form. Occasionally, searches may be fuzzy or based on
incomplete information. Some form of classification/grouping of information may be
needed to help the search process.
• Design of efficient multimedia query languages. These languages should be capable of
expressing complex spatiotemporal concepts, allow imprecise match retrieval, and be
able to handle various manipulation functions for multimedia objects.
• Development of efficient data clustering and storage layout schemes to manage real-
time multimedia data, for both single and parallel disk systems.
• Design and development of a suitable architecture and operating system support for a
general-purpose database management system.
• Management of distributed multimedia data and coordination for composition of
multimedia data over a network.
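The content-based retrieval problem mentioned above can be sketched as nearest-neighbor search over feature vectors. The vectors and filenames below are invented stand-ins for real image features such as color histograms; production systems use much richer features and indexes.

```python
import math

# Toy content-based retrieval: rank stored media items by Euclidean
# distance between feature vectors (vectors and names invented).
library = {
    "sunset.jpg": [0.9, 0.4, 0.1],
    "forest.jpg": [0.1, 0.8, 0.3],
    "ocean.jpg":  [0.2, 0.3, 0.9],
}

def nearest(query, library):
    """Return the library item whose feature vector is closest to the query."""
    def dist(vec):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(query, vec)))
    return min(library, key=lambda name: dist(library[name]))

match = nearest([0.15, 0.75, 0.35], library)
```

The computational cost the text warns about shows up here as the linear scan over the whole library; the indexing and clustering methods listed above exist precisely to avoid it.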
(ii) Workflow
Workflow software products, like other software technologies, have evolved from diverse
origins. While some offerings have been developed as pure workflow software,
many have evolved from image management systems, document management
systems, relational or object database systems, and electronic mail systems.
Vendors who have developed pure workflow offerings have invented terms and
interfaces, while vendors who have evolved products from other technologies
have often adapted terminology and interfaces. Each approach offers a variety of
strengths from which a user can choose. Adding a standards based approach
allows a user to combine these strengths in one infrastructure.
The New Workflow System
The new workflow packages go beyond their imaging counterparts in the following areas:
(i) Support for ad hoc user needs
(ii) Low cost
(iii) Integration with other applications
(iv) Programming with visual metaphors
(v) Integration with e-mail, MOM, ORB, publish-and-subscribe
(vi) Provide facilities for tracking work-in-progress
(vii) Provide users with tools to complete an action
(viii) Provide APIs that let developers customize workflow services
(ix) Provide off-the-shelf component suites for assembling workflow
(x) Integration with LDAP directories
Workflow models
Routes
Rules
Roles
Workflow Routes
Sequential Routing
Parallel Routing
Conditional Routing
Iterative Routing
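The routing styles can be sketched as small Python functions over a work item. The expense-claim example and its approval rule are invented; parallel routing, which dispatches several steps concurrently, is omitted from the sketch for brevity.

```python
# Toy sketches of workflow routing over a work item (example data invented).

def sequential(item, steps):
    """Sequential routing: each step runs after the previous one completes."""
    for step in steps:
        item = step(item)
    return item

def conditional(item, predicate, if_true, if_false):
    """Conditional routing: a business rule chooses the branch."""
    return if_true(item) if predicate(item) else if_false(item)

def iterative(item, step, done):
    """Iterative routing: repeat a step until an exit condition is met."""
    while not done(item):
        item = step(item)
    return item

# Example: an expense claim routed conditionally on its amount.
claim = {"amount": 800, "status": "new"}
approve = lambda c: {**c, "status": "approved"}
escalate = lambda c: {**c, "status": "escalated"}
routed = conditional(claim, lambda c: c["amount"] < 1000, approve, escalate)
```

In a real workflow engine these routes are declared graphically or in a process definition language rather than coded by hand, but the control-flow shapes are the same.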
AND split
AND join
A transition with two or more input places and one output place. This
will only be enabled once there is a token in all of the input places,
which would be after each parallel thread of execution has finished.
Explicit OR split
Implicit OR split
An example of conditional routing where the decision is made as late
as possible. Implicit or-splits are modeled as two arcs going from the
same place but to different transitions. That way, the transition that
happens to fire first (which depends on the transition trigger) will get
the token. Once the token is gone, the others are no longer enabled
and thus cannot be fired.
One of the transitions must have a timer as its trigger so that it will be
fired if the other transition is not activated before the time limit
expires. Expired transitions can either be triggered automatically via a
background process which is running on a timer (e.g. cron), or
manually via an online screen.
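The token semantics described above can be sketched as a tiny Petri-net marking in Python. The place names are invented; the point is that an AND join fires only once every parallel input place holds a token.

```python
# Minimal Petri-net sketch of an AND join (place names invented).

def and_join_enabled(marking, inputs):
    """True once each parallel input place holds at least one token."""
    return all(marking.get(place, 0) > 0 for place in inputs)

def fire(marking, inputs, output):
    """Consume one token from each input place, produce one on the output."""
    assert and_join_enabled(marking, inputs)
    for place in inputs:
        marking[place] -= 1
    marking[output] = marking.get(output, 0) + 1
    return marking

marking = {"thread_a_done": 1, "thread_b_done": 0}
# Not yet enabled: the second parallel thread has not finished.
ready_early = and_join_enabled(marking, ["thread_a_done", "thread_b_done"])

marking["thread_b_done"] = 1   # second parallel thread completes
fire(marking, ["thread_a_done", "thread_b_done"], "joined")
```

The implicit OR split works on the same marking: two transitions share one input place, and whichever fires first consumes the token, disabling the other.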
The architecture identifies the major components and interfaces. These are considered in
turn in the following sections. As far as possible, the detail of the individual interfaces
(APIs and interchange formats) will be developed as a common core set using additional
parameters as necessary to cope with individual requirements of particular interfaces. The
interface around the workflow enactment service is designated WAPI - Workflow APIs
and Interchange formats, which may be considered as a set of constructs by which the
services of the workflow system may be accessed and which regulate the interactions
between the workflow control software and other system components. Many of the
functions within the five interface areas are common to two or more interface services;
hence it is more appropriate to consider WAPI as a unified service interface used to
support workflow management functions across the five functional areas, rather than as
five individual interfaces.
(iii) E-mail
E-mail is an electronic message sent from one device to another. While most messages go
from computer to computer, e-mail can also be sent and received by mobile phones,
PDAs and other portable devices. With e-mail, you can send and receive personal and
business-related messages with attachments, such as photos and documents. You can also
send music, podcasts, video clips and software programs.
E-mail passes from one computer, known as a mail server, to another as it travels over
the Internet. Once it arrives at the destination mail server, it's stored in an electronic
mailbox until the recipient retrieves it. This whole process can take seconds, allowing you
to quickly communicate with people around the world at any time of the day or night.
To receive e-mail, you need an account on a mail server. This is similar to having a postal
box where you receive letters. One advantage over regular mail is that you can retrieve
your e-mail from any location on earth, provided that you have Internet access. Once you
connect to your mail server, you download your messages to your computer or wireless
device, or read them online.
(iv)Group Conferencing
Group conferencing is a program based on restorative justice principles. It is a problem-
solving approach to offending that aims to balance the needs of young people, victims
and the community by encouraging dialogue between individuals who have offended and
their victims.
The conference process provides all participants, in particular the young person and the
victim, with the opportunity to tell their story of the offence and how it has affected them.
At the end of the information sharing process, participants provide suggestions about how
the young person might repair the harm caused to the victim. This sets the expectations
for the outcome plan. The convenor takes an active role in negotiating an agreement with
all participants regarding the content of the conference outcome plan in order to ensure
the proposed plan is fair and reasonable, is realistic, and not more onerous than what the
Court would have imposed upon the young person.
(v)Group Scheduling
Group scheduling software, such as eStudio's online calendar, lets a company manage its
schedules online. With an online calendar, employees can keep track of tasks,
appointments, events and meetings with ease and efficiency. Users receive notifications
and reminders via email, and managers can create supervisor reports to easily check staff
availability. No matter how big the team or how daunting the schedule, a group calendar
product such as eStudio 6 can manage everyone's schedule.
ASSIGNMENT QUESTIONS