Semantic Web Content Analysis - Χρήστος Ζιγκόλης

Semantic Web Content Analysis
A Study in Proximity-Based Collaborative Clustering
Contents
(1) Semantic Web
(2) Proximity Based Collaborative Clustering
(3) Experimental Studies
(4) Semantic Web in Web Intelligence
Semantic Web
Semantic Web – Definition
“is an evolving extension of the World Wide Web in which
the semantics of information and services on the web is
defined, making it possible for the web to understand and
satisfy the requests of people and machines to use the
web content.”
The ultimate goal is to create a global medium for information exchange, where the data is available for processing by both humans and machines
Semantic Web – Architecture
URI = a string that identifies a web resource
XML = a user-defined syntax for web resources [syntax only, no semantics]
RDF = represents information about web resources and the relations between them
OWL = extends and enhances RDF with more features
Logic / Proof = rules over the semantic relations between data from the lower levels; conclusions are derived from these rules
Trust = information reliability tests, digital signatures, etc.
Semantic Web – RDF document
<?xml version="1.0"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:cd="http://www.recshop.fake/cd#">
<rdf:Description
rdf:about="http://www.recshop.fake/cd/Empire Burlesque">
<cd:artist>Bob Dylan</cd:artist>
<cd:country>USA</cd:country>
<cd:company>Columbia</cd:company>
<cd:price>10.90</cd:price>
<cd:year>1985</cd:year>
</rdf:Description>
….
</rdf:RDF>
Semantic Web – Triple Form
Subject – Predicate – Object
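
As a rough illustration of the triple form, the RDF document shown earlier can be flattened into (subject, predicate, object) tuples. This is only a minimal sketch using Python's standard xml.etree.ElementTree. The function name extract_triples is hypothetical, the elided part of the document ("….") is left out, and a real application would normally rely on a dedicated RDF parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The RDF document from the slide, without the elided part.
RDF_XML = """<?xml version="1.0"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:cd="http://www.recshop.fake/cd#">
<rdf:Description
rdf:about="http://www.recshop.fake/cd/Empire Burlesque">
<cd:artist>Bob Dylan</cd:artist>
<cd:country>USA</cd:country>
<cd:company>Columbia</cd:company>
<cd:price>10.90</cd:price>
<cd:year>1985</cd:year>
</rdf:Description>
</rdf:RDF>"""


def extract_triples(rdf_xml):
    """Return (subject, predicate, object) tuples for each rdf:Description."""
    root = ET.fromstring(rdf_xml)
    triples = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree stores tags as "{namespace}local";
            # dropping the braces yields the full predicate URI.
            predicate = prop.tag.replace("{", "").replace("}", "")
            triples.append((subject, predicate, prop.text))
    return triples


for s, p, o in extract_triples(RDF_XML):
    print(s, "|", p, "|", o)
```

Each child element of rdf:Description becomes one triple: the rdf:about value is the subject, the element name is the predicate and the element text is the object.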

Semantic Web – Graph Representation
Proximity Based
Collaborative Clustering
Proximity Based Collaborative Clustering
Collaborative Clustering …in keywords
• Several data sets (same objects – different features)
• Each data set is processed separately
• Collaboration at the level of the results (information granules)
----------------------------------------------------------------------------------------
Proximity measure
This mechanism allows us to use a different number of clusters in the processing of each data set
$$\mathrm{Prox}_{i,j}[ii] = \sum_{k=1}^{C} \min(u_{ki}, u_{kj}) \quad \rightarrow \quad N \times N \text{ matrix}$$
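
The proximity measure can be sketched in a few lines of plain Python. The sketch assumes the partition matrix is stored as a list of C rows (one per cluster) with N membership grades each; the function name proximity_matrix is hypothetical.

```python
def proximity_matrix(U):
    """Prox[i][j] = sum over clusters k of min(U[k][i], U[k][j]).

    U is a C x N partition matrix: U[k][i] is the membership grade
    of object i in cluster k. The result is an N x N matrix.
    """
    n = len(U[0])
    return [
        [sum(min(row[i], row[j]) for row in U) for j in range(n)]
        for i in range(n)
    ]


# Three objects, two clusters: object 2 is shared equally by both clusters.
U = [
    [1.0, 0.0, 0.5],
    [0.0, 1.0, 0.5],
]
for row in proximity_matrix(U):
    print(row)
```

Because the result is always N x N, partition matrices with different numbers of clusters become directly comparable, which is what lets each data set keep its own number of clusters.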
Proximity Based Collaborative Clustering(2)
The Algorithm Process
DATA SETS: [ {X1, C1}, {X2, C2}, …, {Xp, Cp} ]

1) Compute
   X = [X1|X2|…|Xp]
   U = fcm(X, max(C1, C2,…,Cp))
   Prox(U)

2) For each { X[ii], C[ii] }
   U[ii] = fcm( X[ii], C[ii] )
   Prox(U[ii])

3) Repeat
   Optimization of index V → min
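
Steps 1 and 2 of the process can be sketched as follows. This is an illustrative, self-contained version and not the implementation used in the study: fcm here is a minimal textbook fuzzy c-means with fuzzifier m = 2 and a fixed number of iterations (both assumptions), and the names collaborative_setup and proximity_matrix are hypothetical.

```python
import random


def fcm(X, c, m=2.0, iters=100, seed=0):
    """Minimal fuzzy c-means. X: N feature vectors. Returns a c x N partition matrix."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    # Random initial memberships, normalised so each object's grades sum to 1.
    U = [[rng.random() + 1e-9 for _ in range(n)] for _ in range(c)]
    for i in range(n):
        s = sum(U[k][i] for k in range(c))
        for k in range(c):
            U[k][i] /= s
    for _ in range(iters):
        # Prototypes: membership-weighted means of the data.
        protos = []
        for k in range(c):
            w = [U[k][i] ** m for i in range(n)]
            total = sum(w)
            protos.append([sum(w[i] * X[i][j] for i in range(n)) / total
                           for j in range(d)])
        # Memberships from squared distances to the prototypes.
        for i in range(n):
            d2 = [max(sum((X[i][j] - protos[k][j]) ** 2 for j in range(d)), 1e-12)
                  for k in range(c)]
            inv = [v ** (-1.0 / (m - 1.0)) for v in d2]
            s = sum(inv)
            for k in range(c):
                U[k][i] = inv[k] / s
    return U


def proximity_matrix(U):
    """Prox[i][j] = sum over clusters k of min(U[k][i], U[k][j])."""
    n = len(U[0])
    return [[sum(min(row[i], row[j]) for row in U) for j in range(n)]
            for i in range(n)]


def collaborative_setup(datasets, cluster_counts):
    """Step 1: global U on the concatenated features; step 2: one Prox per data set."""
    n = len(datasets[0])
    # X = [X1|X2|...|Xp]: concatenate the feature vectors of each object.
    X = [[v for ds in datasets for v in ds[i]] for i in range(n)]
    U = fcm(X, max(cluster_counts))
    local_prox = [proximity_matrix(fcm(Xi, Ci))
                  for Xi, Ci in zip(datasets, cluster_counts)]
    return U, proximity_matrix(U), local_prox


# Toy data: 4 objects described in two feature spaces.
X1 = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]
X2 = [[1.0], [1.1], [9.0], [9.2]]
U, prox_U, local_prox = collaborative_setup([X1, X2], [2, 2])
```

The returned U and the list of local proximity matrices are the inputs to step 3, the minimisation of the index V.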
Proximity Based Collaborative Clustering(3)
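$$V = \lVert \mathrm{Prox}(U) - \mathrm{Prox}(U[1]) \rVert + \lVert \mathrm{Prox}(U) - \mathrm{Prox}(U[2]) \rVert + \dots + \lVert \mathrm{Prox}(U) - \mathrm{Prox}(U[p]) \rVert$$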

We require that Prox(U) is made as close as possible to the
matrices Prox(U[1]), Prox(U[2]),…Prox(U[p])
The optimization of V is carried out using a standard gradient-
based mechanism
$$u_{ij}(\text{iteration} + 1) = u_{ij}(\text{iteration}) - \frac{\alpha}{N} \frac{\partial V}{\partial u_{ij}}$$
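
One update of the gradient-based mechanism might look like the sketch below. Several choices are assumptions made only for illustration: the distance between proximity matrices is taken as the squared Frobenius distance, the partial derivatives are approximated by forward finite differences, and after each step the memberships are clipped to [0, 1] and renormalised so that each object's grades sum to 1. The names index_v and gradient_step are hypothetical.

```python
def proximity_matrix(U):
    """Prox[i][j] = sum over clusters k of min(U[k][i], U[k][j])."""
    n = len(U[0])
    return [[sum(min(row[i], row[j]) for row in U) for j in range(n)]
            for i in range(n)]


def index_v(U, local_prox):
    """V: summed squared Frobenius distance (assumed norm) to each Prox(U[ii])."""
    P = proximity_matrix(U)
    n = len(P)
    return sum((P[i][j] - Q[i][j]) ** 2
               for Q in local_prox for i in range(n) for j in range(n))


def gradient_step(U, local_prox, alpha=0.1, eps=1e-6):
    """One update u <- u - (alpha / N) * dV/du, followed by renormalisation."""
    c, n = len(U), len(U[0])
    base = index_v(U, local_prox)
    trial = [row[:] for row in U]
    grad = [[0.0] * n for _ in range(c)]
    for k in range(c):
        for i in range(n):
            # Forward finite difference in one membership grade.
            trial[k][i] = U[k][i] + eps
            grad[k][i] = (index_v(trial, local_prox) - base) / eps
            trial[k][i] = U[k][i]
    new_U = [[U[k][i] - (alpha / n) * grad[k][i] for i in range(n)]
             for k in range(c)]
    # Assumed projection: clip to [0, 1], then make each column sum to 1.
    for i in range(n):
        col = [min(max(new_U[k][i], 0.0), 1.0) for k in range(c)]
        s = sum(col)
        for k in range(c):
            new_U[k][i] = col[k] / s if s > 0 else 1.0 / c
    return new_U


# Two objects, two clusters; the single local structure says they are dissimilar.
U = [[0.6, 0.4], [0.4, 0.6]]
targets = [[[1.0, 0.2], [0.2, 1.0]]]
U_next = gradient_step(U, targets)
print(index_v(U, targets), index_v(U_next, targets))
```

In this toy case the step lowers the proximity between the two objects and so moves Prox(U) toward the target matrix. Repeating the step until V stops decreasing corresponds to the "Repeat" stage of the algorithm.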
Experimental Studies
Data Formulation
70 Semantic Web documents (SWDs) in RDF syntax

Grouping according to the main topic
• 1-16 : docs with information about phone devices
• 17-34 : personal homepages
• 35-51 : people and information about their workplace
• 52-70 : semantic web area
Data Formulation (cont…)
Two Feature Spaces

Semantic : a parser extracts the most relevant metadata (12 features)
Content-Based : a parser extracts the most meaningful words, i.e. the values assumed by the metadata, which appear between the meta-tags (10 features)
Experimental Studies(2)
Data Formulation (cont…)
We express these two feature spaces as two data matrices so that the clustering can start.
“Semantic” Data Matrix :
Rows = 70 SWDs and Columns = 12 semantic features
“Content-Based” Data Matrix :
Rows = 70 SWDs and Columns = 10 content-based features
----------------------------------------------------------------------------------------
Each entry of these matrices represents the number of
occurrences of the corresponding feature in the current
document
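
Building such a matrix can be sketched as below. How the original parsers count occurrences is not stated, so plain substring counting is an assumption here, and the documents and the helper name build_data_matrix are hypothetical.

```python
def build_data_matrix(documents, features):
    """Rows = documents, columns = features, entries = occurrence counts."""
    return [[doc.count(feature) for feature in features] for doc in documents]


# Hypothetical mini-corpus: two document fragments and three metadata features.
features = ["foaf:Person", "dc:title", "foaf:knows"]
documents = [
    "<foaf:Person/><foaf:Person/><dc:title>x</dc:title>",
    "<foaf:knows/>",
]
for row in build_data_matrix(documents, features):
    print(row)
```

With the 70 SWDs and the 12 semantic features this would give a 70 x 12 matrix. The content-based matrix is built the same way from its 10 features. Note that substring counting counts an opening and a closing tag separately, as with dc:title in this example.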
Experimental Studies(3)
Metadata Features                | Content-based Features
(1) airport:Airport              | (1) 2004-XX-XX
(2) contact:nearestAirport       | (2) semantic web
(3) foaf:Person                  | (3) web
(4) foaf:knows                   | (4) network
(5) dc:title                     | (5) internet
(6) foaf:Document                | (6) paper/document
(7) prf:NetworkCharacteristic    | (7) project
(8) prf:HardwarePlatform         | (8) UTF-8
(9) foaf:homepage                | (9) technology
(10) foaf:interest               | (10) information
(11) prf:CcppAccept              |
(12) foaf:Project                |
[Figure: distribution of the membership grades of certain docs in a particular cluster]
Comparison Issues
A single data set with the 70 SWDs and all 22 features (semantic and content-based):
X[70x22] → Standard FCM with C = 4

Two separate data sets with Proximity Based Collaborative Clustering:
X[70x12] and Y[70x10], and a global structure U
Comparison Issues
“Proximity-Based VS Standard FCM”
1. The distributions of documents in Clusters 1 and 2 are similar
2. In the prototype of Cluster 2, the representative features (project, information) have higher values in Proximity-Based than in FCM: 1.40 > 0.54 and 2.35 > 0.79
3. FCM weakness → unable to discriminate the remaining documents in Clusters 3 and 4; the membership values are close to each other
- FCM: docs in range [35-60] → similar membership distribution for each component (Cluster 3, [< 0.44]); docs in range [48-70] → same effect (Cluster 4, [max = 0.44])
- Proximity Based → (Cluster 3, [35-60], [> 0.44, max = 0.72]) and (Cluster 4, [48-70], [> 0.44])
Comparison Issues(2)
“Proximity-Based VS Standard FCM”
4. Documents in range [61-70]
The contribution of metadata clustering:
Proximity-Based → Cluster 2
FCM → do NOT appear in Cluster 2
Proximity-Based collaborative clustering better reflects the partitioning realized in the individual clustering.
Prototypes, Proximity-Based and FCM
[Figure: Proximity-Based prototypes and FCM prototypes, clusters (1)-(4)]
[Figure: Standard FCM (values < 0.44) vs. Proximity Based (values > 0.44), clusters (1), (3), (4)]
Semantic Web in
Web Intelligence
Semantic Web in Web Intelligence
Data : “refers to a collection of natural phenomena descriptors, including the results of experience, observation or experiment, or a set of premises.”
Information : “is the interpretation of the results that come from data processing”
Web Until Now
----------------------------------------------------------------------------------------
Knowledge
“well, there is more than one definition”
“We have to extract the hidden knowledge from the web and build an extension. Make the ‘new’ web understandable not only for humans but also for machines”
Semantic Web
Semantic Web in Web Intelligence(2)
• Web Intelligence : “exploits Artificial Intelligence (AI) and
advanced Information Technology (IT) on the Web and
Internet”
• Semantic Web needs standards for both syntactic and semantic content → Ontology is a solution
• Ontologies will enable Web-based knowledge processing, sharing, and reuse between applications. They will also play a major role in supporting information exchange processes.
Semantic Web in Web Intelligence(3)
The roles of ontologies for Web intelligence :
• communication between Web communities
• agent communication based on semantics
• knowledge-based Web retrieval
• understanding Web contents in a semantic way
• web community discovery (implicitly-defined community)

Conclusions
Many types of algorithms from the field of Computational Intelligence have been used for problem solving.
The web is expanding at great speed and calls for new techniques for organizing information and extracting knowledge from it. So why not apply classical algorithms from Computational Intelligence to solve some of the existing problems?
Welcome to the world of Web Intelligence.

Thank you!
