Académique Documents
Professionnel Documents
Culture Documents
Outline
Matching problem
Classification
Basic techniques
Matching process
Systems
Conclusions
Outline
Matching problem
Classification
Basic techniques
Matching process
Systems
Conclusions
Matching operation
Electronics Electronics
Personal Computers PC
Microprocessors PC board
PID ID
Name Brand
Quantity Amount
Price Price
Accessories Cameras and Photo
Photo and Cameras Accessories
PID Digital Cameras
Name
Quantity ID
Brand
Price
Amount
Price
Tutorial on Ontology Matching at SWAP-2006, Pisa, Italy 6 / 55
Matching problem Classification Basic techniques Matching process Systems Conclusions
Electronics Electronics
Personal Computers PC
⊥
Microprocessors PC board
PID ID
Name Brand
Quantity Amount
Price Price
Accessories Cameras and Photo
Photo and Cameras ≥ Accessories
PID Digital Cameras
Name ≥
Quantity ID
Brand
Price
Amount
Price
Tutorial on Ontology Matching at SWAP-2006, Pisa, Italy 6 / 55
Matching problem Classification Basic techniques Matching process Systems Conclusions
Product Monograph
Essay
Litterary critics
DVD
Politics
Book
Biography
CD
Autobiography
Literature
Product Monograph
price isbn
title author
doi title
creator Essay
topic
Litterary critics
DVD
Politics
Book
Biography
author
subject
CD
Autobiography
Literature
≥
Product integer Monograph
price string isbn
title ≥ author
doi uri title
creator Essay
topic
≥
Person ≤ Litterary critics
DVD
Human Politics
Book
Biography
author ≥ Writer
subject
CD
Bertrand Russell: My life Autobiography
Albert Camus: La chute Literature
≥
Product Monograph
price isbn
title ≥ author
doi title
creator Essay
topic
≥
Person ≤ Litterary critics
DVD
Human Politics
Book
Biography
author ≥ Writer
subject
CD
Bertrand Russell: My life Autobiography
Albert Camus: La chute Literature
Scope
Scope
Scope
Scope
Scope
Scope
Scope
Correspondence
Definition (Correspondence)
Given two ontologies O and O , a correspondence M between O and O is
a 5-uple: id, e, e , R, n such that:
id is a unique identifier of the correspondence
Correspondence
Definition (Correspondence)
Given two ontologies O and O , a correspondence M between O and O is
a 5-uple: id, e, e , R, n such that:
id is a unique identifier of the correspondence
e and e are entities of O and O (e.g., XML elements, classes)
Correspondence
Definition (Correspondence)
Given two ontologies O and O , a correspondence M between O and O is
a 5-uple: id, e, e , R, n such that:
id is a unique identifier of the correspondence
e and e are entities of O and O (e.g., XML elements, classes)
R is a relation (e.g., equivalence (=), more general (), disjointness
(⊥))
Correspondence
Definition (Correspondence)
Given two ontologies O and O , a correspondence M between O and O is
a 5-uple: id, e, e , R, n such that:
id is a unique identifier of the correspondence
e and e are entities of O and O (e.g., XML elements, classes)
R is a relation (e.g., equivalence (=), more general (), disjointness
(⊥))
n is a confidence measure in some mathematical structure (typically in
the [0,1] range)
Alignment
Definition (Alignment)
Given two ontologies O and O , an alignment (A) between O and O :
is a set of correspondences on O and O
Alignment
Definition (Alignment)
Given two ontologies O and O , an alignment (A) between O and O :
is a set of correspondences on O and O
with some cardinality: 1-1, 1-*, etc.
Alignment
Definition (Alignment)
Given two ontologies O and O , an alignment (A) between O and O :
is a set of correspondences on O and O
with some cardinality: 1-1, 1-*, etc.
some additional metadata (method, date, properties, etc.)
Matching process
O
Matching process
matching
O
Matching process
matching A
O
Matching process
A matching A
O
Matching process
O parameters
A matching A
O
Matching process
O parameters
A matching A
O resources
Application domains
Traditional
Ontology evolution
Schema integration
Catalog integration
Data integration
Application domains
Traditional
Ontology evolution
Schema integration
Catalog integration
Data integration
Emergent
P2P information sharing
Agent communication
Web service composition
Query answering on the web
O O
DB DBPortal
O Matcher O
DB DBPortal
O Matcher O
DB DBPortal
O Matcher O
Generator
DB Transformation DBPortal
O Matcher O
Generator
DB Transformation DBPortal
O O
peer1 peer2
O Matcher O
peer1 peer2
O Matcher O
peer1 peer2
O Matcher O
Generator
O Matcher O
Generator
query query
peer1 mediator peer2
O Matcher O
Generator
query query
peer1 mediator peer2
answer answer
Applications: summary
automatic
operation
instances
complete
run time
correct
Application
√ √ √
Ontology evolution transformation
√ √ √
Schema integration merging
√ √ √
Catalog integration data translation
√ √ √
Data integration query answering
√
P2P information sharing query answering
√ √ √
Web service composition data mediation
√ √ √ √
Multi agent communication data translation
√ √
Query answering query reformulation
Outline
Matching problem
Classification
Basic techniques
Matching process
Systems
Conclusions
Matching dimensions
Input dimensions
Underlying models (e.g., XML, OWL)
Schema-level vs. Instance-level
Matching dimensions
Input dimensions
Underlying models (e.g., XML, OWL)
Schema-level vs. Instance-level
Process dimensions
Approximate vs. Exact
Interpretation of the input
Matching dimensions
Input dimensions
Underlying models (e.g., XML, OWL)
Schema-level vs. Instance-level
Process dimensions
Approximate vs. Exact
Interpretation of the input
Output dimensions
Cardinality (e.g., 1-1, 1-*)
Equivalence vs. Diverse relations (e.g., subsumption)
Graded vs. Absolute confidence
Matching dimensions
Input dimensions
Underlying models (e.g., XML, OWL)
Schema-level vs. Instance-level
Process dimensions
Approximate vs. Exact
Interpretation of the input
Output dimensions
Cardinality (e.g., 1-1, 1-*)
Equivalence vs. Diverse relations (e.g., subsumption)
Graded vs. Absolute confidence
Three layers
Three layers
Three layers
Element-level Structure-level
Element-level Structure-level
Constraint- Upper,
String- Graph-
based domain Repository Model-
based Language- Linguistic based Taxonomy-
specific of based
based resources type graph ho- based
name, formal structures SAT solvers,
tokenization, lexicons, similarity, momorphism taxonomy
description ontologies structure DL
elimination thesauri key children, structure
similarity DOLCE, metadata reasoners
properties leaves
FMA
Element-level Structure-level
Constraint- Upper,
String- Graph-
based domain Repository Model-
based Language- Linguistic based Taxonomy-
specific of based
based resources type graph ho- based
name, formal structures SAT solvers,
tokenization, lexicons, similarity, momorphism taxonomy
description ontologies structure DL
elimination thesauri key children, structure
similarity DOLCE, metadata reasoners
properties leaves
FMA
Element-level Structure-level
Constraint- Upper,
String- Graph-
based domain Repository Model-
based Language- Linguistic based Taxonomy-
specific of based
based resources type graph ho- based
name, formal structures SAT solvers,
tokenization, lexicons, similarity, momorphism taxonomy
description ontologies structure DL
elimination thesauri key children, structure
similarity DOLCE, metadata reasoners
properties leaves
FMA
Outline
Matching problem
Classification
Basic techniques
Matching process
Systems
Conclusions
Prefix
takes as input two strings and checks whether the first string starts with
the second one
net = network; but also hot = hotel
Prefix
takes as input two strings and checks whether the first string starts with
the second one
net = network; but also hot = hotel
Suffix
takes as input two strings and checks whether the first string ends with
the second one
ID = PID; but also word = sword
Edit distance
takes as input two strings and calculates the number of edition
operations, (e.g., insertions, deletions, substitutions) of characters
required to transform one string into another, normalized by length of
the maximum string
EditDistance(NKN,Nikon) = 0.4
Tokenization
parses names into tokens by recognizing punctuation, cases
Hands-Free Kits → hands, free, kits
Tokenization
parses names into tokens by recognizing punctuation, cases
Hands-Free Kits → hands, free, kits
Lemmatization
analyses morphologically tokens in order to find all their possible basic
forms
Kits → Kit
Elimination
discards “empty” tokens that are articles, prepositions, conjunctions . . .
a, the, by, type of, their, from
Sense-based: WordNet
A
B if A is a hyponym or meronym of B
Brand Name
A B if A is a hypernym or holonym of B
Europe Greece
A = B if they are synonyms
Quantity = Amount
A ⊥ B if they are antonyms or the siblings in the part of hierarchy
Microprocessors ⊥ PC Board
Children
Two non-leaf schema elements are structurally similar if their immediate
children sets are highly similar
Children
Two non-leaf schema elements are structurally similar if their immediate
children sets are highly similar
Leaves
Two non-leaf schema elements are structurally similar if their leaf sets
are highly similar, even if their immediate children are not
(e.g., Cupid, COMA)
Electronics
Electronics PC
Personal computers Cameras and photos
Photos and cameras Digital cameras
PID ID
Name Brand
Quantity Amount
Price Price
Electronics
Electronics PC
Personal computers Cameras and photos
Photos and cameras Digital cameras
PID ID
Name Brand
Quantity Amount
Price Price
Electronics
Electronics PC
Personal computers Cameras and photos
Photos and cameras Digital cameras
PID ID
Name Brand
Quantity Amount
Price Price
Electronics Electronics
Personal Computers PC
Microprocessors PC board
PID ID
Electronics Electronics
Personal Computers PC
Microprocessors PC board
PID ID
Axioms
(Electronics1 ↔ Electronics2 ) ∧ (Personal Computers1 ↔ PC2 )→
context context
1 2
(Electronics1 ∧ Personal Computers1 ) ↔ (Electronics2 ∧ PC2 )
=
≥
micro-company = company SME = firm
≤5 employee
≤10 associate
=
≥
micro-company = company SME = firm
≤5 employee
≤10 associate
=
≥
micro-company = company SME = firm
≤5 employee
≤10 associate
≤
micro-company
SME
Outline
Matching problem
Classification
Basic techniques
Matching process
Systems
Conclusions
Sequential composition
parameters parameters
resources resources
O
Parallel composition
parameters
O matching A
resources
A aggregation A
parameters
O matching A
resources
Ranking strategies
Thresholds
MaxDelta
Ranking strategies
Thresholds
MaxDelta
Cardinalities
1-1, 1-*, *-*
Ranking strategies
Thresholds
MaxDelta
Cardinalities
1-1, 1-*, *-*
Optimization
stable marriage
maximal weight match
Ranking strategies
Thresholds
MaxDelta
Cardinalities
1-1, 1-*, *-*
Optimization
stable marriage
maximal weight match
Directionality
O → O , O → O (SmallLarge, LargeSmall)
O → O and O → O (Both)
Outline
Matching problem
Classification
Basic techniques
Matching process
Systems
Conclusions
Cupid
Schema-based
Computes similarity coefficients in the [0 1] range
Performs linguistic and structure matching
Sequential system
Cupid architecture
A M
O
Linguistic Structure
M
matching M matching M Weighting
O thesauri
OLA
OLA architecture
O
parameters
similarity
A M compu- M A
tation
resources
O
Falcon-OA architecture
Linguistic Structure
M
matching M matching M A
O
S-Match
Schema-based
Computes equivalence (=), more general (), less general (
),
disjointness (⊥)
Analyzes the meaning (concepts, not labels) which is codified in the
elements and the structures of ontologies
Sequential system with a composition at the element level
S-Match architecture
O
Pre- Match
Translator A
processing
PTrees
manager A
O
SAT semantic
oracles
solvers matchers
Outline
Matching problem
Classification
Basic techniques
Matching process
Systems
Conclusions
Summary
Uses of classification
Challenges
Acknowledgments
...coming up soon
Thank you
for your attention and interest!
Questions?
pavel@dit.unitn.it
Jerome.Euzenat@inrialpes.fr
http://www.ontologymatching.org