Vous êtes sur la page 1sur 57

Object-relational database systems

Yong Yao
CS632
April 17, 2001
Content
Introduction
Michael Stonebraker: Inclusion of New Types in
Relational Data Base Systems.
Michael Stonebraker, Greg Kemnitz: The Postgres
Next Generation Database Management System.
PREDATOR System Design Document: A detailed
description of internal design decisions and
implementation details.
Introduction
The relational model
Dominant mainstream approach
Work well on business data processing
But things have changed drastically,
nontraditional problems.
Taxonomy of DBMS Application

Yes

No
Use SQL
data Simple Complex
complexity
Simple data

Yes Relational
DBMS

No File
System
Use SQL
data Simple Complex
complexity
OODBMS

Yes Relational A programming


DBMS
language with a
type system
No File System OODBMS Persistent object
Use SQL
data Simple Complex
complexity
ORDBMS

Yes Relational ORDBMS


DBMS

No File System OODBMS

Use SQL
data Simple Complex
complexity
Features
Base type extension
Inheritance
Complex object
Inclusion of New Types in Relational
Database Systems

Michael Stonebraker
EECS Dept.
University of California, Berkeley
Example
Represent a set of boxes in the DBMS and a simple query- find
all the boxes that overlap the unit square(0,1,0,1).

RDBMS:
Create box(id=i4, x1=f8, x2=f8,y1=f8,y2=f8)

select *
from box
where ! (box.x2<=0 or box.x1>=1 or box,y2<=0 or
box.y1>=1)

Too hard to understand


Too complex to optimize
Too many clauses to check
Example (contd)
Create box(id=i4,desc=box)

select *
from box
where box.desc!!0,1,0,1

Box: new data type


!!: overlap operator with two operands of data type box,
return a boolean
Operators for Boxes
Motivation
Needs of business data processing
applications
Geographic Information System
New Types: Point, Line, Polygon;

New operators: distance, intersection ;

New access methods: R-trees, KDB trees;


Conditions
1. The definition of user-defined data types
2. The definition of new operators for these
data types
3. The implementation of new access methods
for data types
4. Optimized query processing for commands
containing new data types and operators
Definition of New Types
Follow a registration process
define type-name length = value,
input = file-name,
output = file-name
Occupy a fixed amount of space
Conversion routines
Definition of New Operators
Safety loophole
Problem:
an ADT routine which has an error can overwrite
DBMS data structures
Unclear whether such errors are due to bugs in
the user routines or in the DBMS
Solutions:
Run in separate address space
Build a language processor
Provide two environments
New access methods
Users can add new access methods to
support user-defined data types
Goal: extensibility
Interface:
Conditions for the operators
Information on the data types of operators
Define set of operators
Example: B-tree
Conditions for operators:
Information on the data types of operators (B-tree)

<= is required
Type: specific type, fixed, variable, fix-var, type1, type2
New set of operators (B-tree)

F1=(value - low-key) / (high-key low-key)


F2=(high-key value) / (high-key low-key)
Implementing New Access Methods
A collection of procedure calls
open (relation-name)
close (descriptor)
get-next(descriptor, OPR, value, tuple-id)
insert (descriptor,tuple)
delete(descriptor, tuple-id)
replace(descriptor, tuple-id, new-tuple)

build(descriptor,keyname,OPR)
Hard problem- work with transaction management
log pages simple, but may suffer from
performance penalty
log events
Event-oriented interface
REDO(T)
UNDO(T)
LOG(event-type, event-data)
Access path selection
Four pieces of information for optimization
Stups: estimate the expected number of records
where rel-name.field-name OPR value
A second selectivity factor S:
where relname-1.field-1 OPR relname-2.field-2
Feasibility of merge-sort
Feasibility of hash-join

Define operator token= AE,



Stups=1
S= min(N1.N2).
merge-sort with AL,
hash-join
Generate query processing plan
relname-1.field-1 OPR relname-2.field-2
Merge sort
Iterative substitution
Hash Join
Relname.field-name OPR value
any access method with field-name as a key
Secondary index
Sequential search
Summary
How to extend an abstract data type
How to define new access method
Integrate new access method with
transaction management
Generation of optimized query processing
plan
The POSTGRES Next-Generation
Database Management System

Michael Stonebraker
Greg Kemnitz
Three kinds of services for DBMS
Traditional data management
Simple data type
Object management
Complex data types, e.g. bitmaps, icons, text
Knowledge management
Store and enforce a collection of rules
Example- newspaper layout
store and manipulate text and graphics, Bill
customers for advertisement.

Data: customer information


Object: text, pictures and icons
Rules: control newspaper layout
POSTGRES Data Model-Design criteria
Traditional relational DBMSs data model: a
collection of named relations
Orientation toward database access from a
query language
Interact with database by query language-
POSTQUEL
User defined functions
Design criteria
Orientation toward multilingual access
Two selections
One language tightly coupled to the system
Multilingual
POSTGRES is programming language
neutral
Design criteria
Small number of concepts
As few as possible
Four constructs:
Class
Inheritance
Type
Function
Data Model - Class
Class(constructed type or relation) : a named
collection of instances of objects.
Instances(record or tuple) : has the same
collection of attributes

create EMP(name=string, salary= float, age=int)


Data model - Inheritance
Inherit data elements from other classes
create EMP(name=string, salary= float, age=int)
create SALESMAN(quota=float) inherits EMP

Multiple inheritance only create


EMP
objects without ambiguity

SALESMAN
Three kinds of classes
Real class: instances are stored in the database
Derived class :view
Version: store differential relative to its base class
Initially has all instances of the base class
Freely updated to diverge from the base class
Updates to the version do not affect the base class
Updates to the base class are reflected in the version
Supported by the rule system
Version Example
create version my-EMP from EMP

Two classes generated:


EMP-MINUS(deleted-OID)
EMP-PLUS(all-fields-in EMP, replaced-OID)

Retrieve: retrieve EMP-PLUS instead


Insert: All new instances for EMP or my-EMP will be
added into EMP-PLUS;
Delete: move records from EMP-PLUS to EMP-MINUS
Data model - Types
Base types:
Int, float, string
construct new base types.
Arrays of base types:
Composite types:
Construct complex objects: attributes contain
other instances as part or all of their value
Composite types
Contains zero or more instances of the same
class
create EMP ( , manager=EMP, coworkers=EMP)

Set: value is a collection of instances from all


classes
create EMP( , hobbies=set)
{softball, skiing, skating}
Data Model - Functions
C functions
Arbitrary C procedures
Can not be optimized by the POSTGRES
Argument: base types or composite types
Inherited down the class hierarchy

overpaid(EMP)
overpaid(SALESMAN)
Functions
Operators
Utilize index
Functions with one or two operands
{ALT, ALE, AE, AGT, AGE}
Allow new access methods

POSTQUEL functions
Any collection of commands in the POSTQUEL
Define function high-pay returns EMP as
Retrieve (EMP.all)
Where EMP.salary>50000
POSTQUEL
Set-oriented query language
Support nested queries
Transitive closure
Support for inheritance
Support for time travel
Allow a user to run historical query
retrieve (EMP.salary)
from EMP [T]
where EMP.name=Sam
Maintain two different physical collections of records
Vacuum cleaner: a daemon moves records
The Rules System
Requirements:
referential integrity
View management
Triggers
Integrity constraints
Protection
Version control
Design a general purpose rules system
Syntax of rules
ON event (TO) object WHERE on new EMP.salary where
POSTQUEL-qualification EMP.name=Fred
THEN DO [instead] then do replace
POSTQUEL-command(s) E (salary=new.salary)
from E in EMP
where E.name=Joe

Event: retrieve, replace, delete, new


Object: name of a class or class.column
POSTQUEL-qualification: normal qualification
POSTQUEL-commands: a set of POSTQUEL commands
Forward and Backward chaining
Rules specify additional actions, and these actions
may activate other rules

on new EMP.salary where on retrieve to EMP.salary where


EMP.name=Fred EMP.name=Joe
then do replace then do instead retrieve
E(salary=new.salary) (EMP.salary)
from E in EMP where EMP.name=Fred
where E.name=Joe
Implementation of Rules
Record level processing
Rules system is called when individual records are
accessed
Place a marker on the record
Query rewrite module
Perform poorly for a large number of small-scope
rules
Desirable when there are a small number of
larger-scope rules
When rules are activated
Immediate-same transaction
Immediate-different transaction
Deferred-same transaction
Deferred-different transaction

Currently only implements the first option


Storage System
no-overwrite storage manager
Old record remains in the database whenever an
update occurs
Instantaneous crash recovery
Support time travel
stable main memory required
Performance
Better than UCB-INGRES
But
Compare to the Cattell system, loses by about a
factor of two.
Comments
POSTGRES allows an application designer to
trade off performance for data independence

Imports only specific user functions into its


address space
PREDATOR
DESIGN AND INPLEMENTATION

PARVEEN SESHADRI
System overview
A client-server database system
Goal: Extensibility - adding the ability to
process new kinds of data
Basic components
Extensible table of Enhanced Abstract Data Type
(E-ADTs)
Extensible table of query processing
engines(QPEs)
Server Architecture
TEXT CLIENT

Ts RE
E -AD L AT
R
HE IO
OT N
EN
G IN
D S SC E
OR HE
C M
RE AS
T

DISK STORAGE
D

RECOVERY
E-A
GE

SHORE LIBRARY

THREAD SUPPORT
SOCKET SUPPORT

IMA

BUFFER XACTS
MGMT
BA

ES
SIC

IN
CONCURRENCY S
V

NG
ON
AD

AL

SI

RE
S
U
T

E
ES

PR
s

HE
EX

OT
REL
ATI
O NE
-AD
T

WEB CLIENT
UTILITY LAYER
Enhanced Data Types
Standard ADT
Specify the storage format for values
Specify methods that can be invoked on values
Specify how some methods can be matched using
indexes
Motivation: methods can be very expensive
Data.image.sharpen().clip()
Improvement
Data.image.sharpen().clip()
It is unnecessary for Sharpen to compress and write

its result to disk


-> Pass result directly in memory to Clip

Rewrite the expression


-> Data.image.clip().sharpen()

No need to retrieve the entire image


E-ADT
Expose the semantics of its methods
Query optimizations are performed using
these semantics

Find more at :
The Case for Enhanced Abstract Data Types.
VLDB 1997
Thank You

Vous aimerez peut-être aussi