Académique Documents
Professionnel Documents
Culture Documents
CURRENT ISSUES
10
Rules - Knowledge Bases - Active And Deductive Databases - Parallel
Databases Multimedia Databases Image Databases Text Database
.
5.1 RULES
In 1985, database pioneer Dr. E.F. Codd laid out twelve rules of relational
database design. These rules provide the theoretical (although sometimes not
practical) underpinnings for modern database design. The rules may be
summarized as follows:
All database management must take place using the relational
database's innate functionality
All information in the database must be stored as values in a table
All database information must be accessible through the combination
of a table name, primary key and column name.
The database must use NULL values to indicate missing or unknown
information
The database schema must be described using the relational database
syntax
The database may support multiple languages, but it must support at
least one language that provides full database functionality (e.g. SQL)
The system must be able to update all updatable views
The database must provide single-operation insert, update and delete
functionality
Changes to the physical structure of the database must be transparent
to applications and users.
Changes to the logical structure of the database must be transparent to
applications and users.
The database must natively support integrity constraints.
Changes to the distribution of the database (centralized vs. distributed)
must be transparent to applications and users.
Any languages supported by the database must not be able to subvert
integrity controls
Explanation
subsystem
User
Case specific
database
User
interface
Inference
engine
Knowledge
base
Knowledge
engineer
Developer's
interface
Knowledge
acquisition
subsystem
Main components
knowledge-base (KB)
knowledge about the field of interest (in natural language-like
formalism)
symbolically described system-specification
KNOWLEDGE-REPRESENTATION METHOD!
inference engine
engine of problem solving (general problem solving
knowledge)
supporting the operation of the other components
PROBLEM SOLVING METHOD!
case-specific database
auxiliary component
specific information (information from outside, initial data of
the concrete problem)
information obtained during reasoning
explanation subsystem
explanation of system actions in case of user request
typical explanation facilities:
explanation during problem solving:
WHY... (explanative reasoning, intelligent help, tracing
information about the actual reasoning steps)
WHAT IF... (hypothetical reasoning, conditional
assignment and its consequences, can be withdrawn)
WHAT IS ... (gleaning in knowledge-base and casespecific database)
explanation after problem solving:
HOW ... (explanative reasoning, information about the
way the result has been found)
WHY NOT ... (explanative reasoning, finding counterexamples)
WHAT IS ... (gleaning in knowledge-base and casespecific database)
knowledge acquisition subsystem
main tasks:
Active Databases
Database system augmented with rule handling
o Active approach to managing integrity constraints
o ECA rules: event, condition, action
Many other uses have been found for active rules
o Maintaining materialized views
o Managing derived data
o Coordinating distributed data management
o Providing transaction models
o Etc.
Provably correct universal solutions lacking
o Specifying rules
o Rules
analysis
(termination,
confluence,
determinism)
observable
number
Example
wheel
1
spoke
tire
1
1
rim
frame
1
1
seat
pedal
part
subpart
3
tube
Assembly instance
body of rule
o Attribute-value skew.
Some values appear in the partitioning attributes of many
tuples; all the tuples with the same value for the
partitioning attribute end up in the same partition.
Can occur with range-partitioning and hash-partitioning.
o Partition skew.
With range-partitioning, badly chosen partition vector
may assign too many tuples to some partitions and too
few to others.
Less likely with hash-partitioning if a good hash-function
is chosen.
Handling Skew in Range-Partitioning
To create a balanced partitioning vector (assuming partitioning
attribute forms a key of the relation):
o Sort the relation on the partitioning attribute.
o Construct the partition vector by scanning the relation in sorted
order as follows.
After every 1/nth of the relation has been read, the value
of the partitioning attribute of the next tuple is added to
the partition vector.
o n denotes the number of partitions to be constructed.
o Duplicate entries or imbalances can result if duplicates are
present in partitioning attributes.
Alternative technique based on histograms used in practice
Handling Skew using Histograms
Balanced partitioning vector can be constructed from histogram in a
relatively straightforward fashion
o Assume uniform distribution within each range of the
histogram
Histogram can be constructed by scanning relation, or sampling
(blocks containing) tuples of the relation
Easiest form of parallelism to support, particularly in a sharedmemory parallel database, because even sequential database systems
support concurrent processing.
More complicated to implement on shared-disk or shared-nothing
architectures
o Locking and logging must be coordinated by passing messages
between processors.
o Data in a local buffer may have been updated at another
processor.
l Cache-coherency has to be maintained reads and writes of
data in buffer must find latest version of data.
Cache Coherency Protocol
Example of a cache coherency protocol for shared disk systems:
o Before reading/writing to a page, the page must be locked in
shared/exclusive mode.
o On locking a page, the page must be read from disk
o Before unlocking a page, the page must be written to disk if it
was modified.
More complex protocols with fewer disk reads/writes exist.
Cache coherency protocols for shared-nothing systems are similar.
Each database page is assigned a home processor. Requests to fetch
the page or write it to disk are sent to the home processor.
Intraquery Parallelism
Execution of a single query in parallel on multiple processors/disks;
important for speeding up long-running queries.
Two complementary forms of intraquery parallelism :
o Intraoperation Parallelism parallelize the execution of each
individual operation in the query.
o Interoperation Parallelism execute the different operations
in a query expression in parallel.
the first form scales better with increasing parallelism because
the number of tuples processed by each operation is typically more than the
number of operations in a query
Design of Parallel Systems
Some issues in the design of parallel systems:
Archives - usually contain large volumes of structured or semistructured homogeneous data pertaining to specific topics.
Domain-Specific Collection - this is a homogeneous collection
providing access to controlled users with very specific objectives.
Examples of such a collection are biomedical and satellite image
databases.
Enterprise Collection - a heterogeneous collection of images that is
accessible to users within an organizations intranet. Pictures may be
stored in many different locations.
Personal Collection - usually consists of a largely homogeneous
collection and is generally small in size, accessible primarily to its
owner, and usually stored on a local storage media.
Web - World Wide Web images are accessible to everyone with an
Internet connection. These image collections are semi-structured, nonhomogeneous and massive in volume, and are usually stored in large
disk arrays.
The Add Image Database page displays. Type desired database title in Title: field and
click
the
Add
button.
The new image database displays in the Available databases. Select the link to the new
image
database
you
just
created.
The
Image
Database
Screen
displays.
Select
the
Add
Image
button
The Add Image screen displays. Type in relevant keywords in the *Keywords field. Type
the owner of the image in the Creator: field. Type the path and filename in the
*Filename: field or click the browse button and find the file in the My-Files area. Type a
relevant title for this image in the Title: field. Type in the image description in the
Description: field. Type the path and filename of the image thumbnail in the
Thumbnail: field or click the browse button and find the file in the My-Files area. Select
the Add button.
Note: Creator, Title, Description and Thumbnail are not required fields and do not
require
an
entry.
Note: If you use a .gif or .jpg the database will automatically create a thumbnail when
you select add.
The Image Database page displays with the new image and information.
The Edit Record page displays. To change the *Filename: field select the New Image
button. The New Image Screen displays. Type the path and filename in the field or click
the browse button and find the file in the My-Files area. Select the Regenerate thumbnail
checkbox if you would like the image database to create a new thumbnail for you. Select
the Update button.
The Edit Record page displays again with the new image filename in the *Filename:
field. If you did not have the image database regenerate the thumbnail for you on the
previous screen, select the New thumbnail button. The New Thumbnail page displays.
Type the path and filename of the image thumbnail in the Thumbnail: field or click the
browse button and find the file in the My-Files area. Select the Update button. The Edit
Record page displays again with the new image filename in the Thumbnail: field. Type
the corrected information in *Keywords field, Creator: field, Title: field, and/or
Description:
field.
Select
the
Update
button.
The Image Database page displays with the new image and/or information.
back to the top
Caution: You will not be able to "undo" this process. The image and all the
associated data in it will be lost forever if it is deleted. If you are unsure, make a backup
of the course before removing the database. See Restoring and Resetting a WebCT course
into CE 4.1 for assistance with making a backup of your course.
From the Homepage or the Course Menu select the Image Database link.
The Available Database page displays. Select the link to the image database that contains
the image you want to edit. The Image Database page displays. Select the radio button
beside the image you would like to delete and select the Delete button.
The
Delete
Image
confirmation
window
displays.
Select
OK
button.
The Image Database page displays without the deleted image. To delete additional images
from the database repeat the above steps.
Problem - Motivation
Given a database of documents, find documents containing data,
retrieval
Applications:
Web
law + patent offices
digital libraries
information filtering
Types of queries:
boolean (data AND retrieval AND NOT ...)
additional features (data ADJACENT retrieval)
keyword queries (data, retrieval)
How to search a large collection of documents?
Full-text scanning
for single term:
(naive: O(N*M))
ABRACADABRA
text
CAB
pattern
ABRACADABRA
text
CAB
pattern