Integrating Semantic Concept Similarity in Model-Based Web Applications

Cristiano Rocha1, 2, Daniel Schwabe1, and Marcus Poggi1

1 Department of Informatics, Catholic University, Rio de Janeiro
{dschwabe, poggi}@inf.puc-rio.br
2 Milestone
crocha@milestone-ti.com.br
Abstract

Model-based design methods, and model-based architectures, have gained adoption in authoring applications for the WWW. This is further reinforced by the increasing visibility of the Semantic Web, where models are intrinsic, described as metadata associated with the data made available to users and applications. Several proposals attempt to leverage this additional information to improve search functionalities, by incorporating semantic similarity (or proximity) measures into the search mechanism. In this paper we show how the availability of a semantic similarity evaluation engine can be used to enhance several functionalities of Web-based applications. In particular we will show how such an infrastructure can be used to detect and suggest new relation instances, as well as propose an inferred ordering for the presentation of related information that reflects the semantic closeness of the corresponding information. The proposed engine is based on a hybrid spread activation algorithm applied to the concept instances graph.

1. Introduction

In this paper we propose a semantic similarity framework that makes it possible to provide a numeric strength or similarity between two nodes, corresponding to two concept instances, in an application. We envision this type of measure to be very useful in various applications.

In web applications where the input of information happens in a distributed way, one of the biggest problems is to guarantee that this process happens in the most natural and user-friendly way, and also to guarantee that all the information is consistent. Here we show how the use of a similarity measure can help this process by suggesting new links which the system detects as having a great likelihood of existing. The absence of such a link might be due to an error or inconsistency in the insertion of the information. If this is the case, the relations suggestion functionality will help the user in identifying this inconsistency and correcting it. If it is not the case, it might help overcome a lack of knowledge about the problem domain on the part of the user who is responsible for entering new information into the system.

A second scenario occurs when the user entering new information only partially knows the problem domain, and is therefore unable to explicitly link the new information with other existing information in the database. The system is then able to suggest likely links, which are easier for the user to recognize as being valid. This scenario occurs frequently in large organizations, where people responsible for entering new information only know their own areas and have little or no knowledge about other areas in the organization.

Another problem we are going to address is the presentation order of links in an application. By presenting links to related information in an order reflecting the semantic closeness of the corresponding information, it is expected that it will be easier for the user to find relevant information.

Both functionalities discussed in the paper could be implemented using any similarity measure which can calculate the similarity or proximity of two concepts or pages in a web application. However, with the advent of the Semantic Web and model-based design methods, we envision that similarity measures which explore the semantics of the underlying model and the instances of an application can provide powerful measures that will outperform existing techniques in a number of contexts. We propose two different algorithms for calculating similarity measures which rely on the processing of semantic information related to the instances of the application: weight mapping and spread activation algorithms.

The similarity processing framework proposed here can also be successfully used for searching in model-based applications. In [7] we presented a novel approach for combining traditional Information Retrieval techniques with spread activation and weight mapping techniques, in order to provide proximity semantic searches that combine textual information with semantic information. Here, we use the same processing framework for providing the desired functionalities; more information and details on the algorithms can be found in that paper.
Proceedings of the WebMedia & LA-Web 2004 Joint Conference 10th Brazilian Symposium on Multimedia and the Web 2nd Latin American Web Congress
(LA-Webmedia’04)
0-7695-2237-8/04 $20.00 © 2004 IEEE
2. Similarity Processing Framework

2.1. Weight Mapping

This technique tries to exploit the fact that ontologies and their instances carry much more information than what is explicitly stated, as there is much “hidden” information entailed by the relations (i.e., a semantically-based linking structure). In traditional ontologies, it is only possible to indicate the presence or absence of a relation between two concept instances. In many situations, however, it would be desirable to also express some strength associated with the relation. The classical way is to associate a numerical value to the corresponding link. One of the ideas in this work is to extract knowledge from the ontology and its instances in order to obtain a numerical weight for each existing relation instance in the model. A similar idea was presented in [8], to provide a novel approach for ranking the results of ontology-based searching in the Semantic Web, with good results. We call “Weight Mapping” the technique of calculating a numerical weight value for each relation instance, based on the analysis of the link structure of the knowledge base.

Different ideas were tested in devising a calculation that can generate a strength value for each existing relation instance in the knowledge base. In [7] we proposed three different measures - cluster, specificity and combined - which we found very useful in developing our system. We are aware that the choice of these measures is totally application and task dependent. Here we will just briefly present the three proposed measures. For more information on the motivations behind them and explanations of the formulas the reader should refer to [7].

The first measure tries to establish the degree of similarity between two related concept instances in a relation. The similarity measure used is very similar to the cluster function used in [2], obtained by specializing that function for concepts that relate to each other. Formula (1) indicates the similarity between concept instances $C_j$ and $C_k$. The value $n_{ij}$ represents the fact that concept $C_j$ is related to concept $C_i$ (it is 1 if the concepts are related and 0 otherwise). The value $n_{ijk}$ represents the fact that both concepts $C_j$ and $C_k$ are related to concept $C_i$ (1 if both concepts $C_j$ and $C_k$ are related to $C_i$ and 0 otherwise).

The second measure is similar to the idf (inverse document frequency) measure [9] widely used in Information Retrieval (although in I.R. the log function is normally used). It is useful when the user wants to give the semantics of specificity or differentiation to the relation. Formula (2) was used for the specificity measure. The value $n_k$ is equal to the number of instances of the given relation type that have $k$ as their destination node.

The third measure is the combined measure, obtained as the product of the two previous ones. Its calculation resembles the tf-idf strategy [9] that is commonly used in classic Information Retrieval, since it combines a similarity measure with a specificity measure. In the general case this combined measure proved to be the best one in our applications. Other similarity and specificity measures might be used in the future to achieve better results.

$$W(C_j, C_k) = \frac{\sum_{i=1}^{n} n_{ijk}}{\sum_{i=1}^{n} n_{ij}} \qquad (1)$$

$$W(C_j, C_k) = \frac{1}{n_k} \qquad (2)$$

2.2. Hybrid Spread Activation

The other strategy we use to calculate this similarity measure employs spread activation techniques. Such techniques are among the most used processing frameworks for semantic networks, having been successfully applied in several fields, particularly in Information Retrieval applications [3,4]. Given an initial set of activated concepts and some restrictions, activation flows through the network reaching other concepts which are closely related to the initial concepts. It is very powerful for performing proximity searches, where given an initial set of concepts the algorithm returns other concepts which are strongly connected to them. Inferences occur naturally in this process, since the result set may contain nodes that are not directly linked to the initial set of nodes. An overview of spread activation techniques is presented in [4].

Usually spread activation techniques are used either on semantic networks, where each edge in the network has only a label associated to it, or on association networks, where each edge has only a numeric weight associated to it. In [7] we showed how to use the weight mapping techniques to construct a hybrid instances network, where each relation instance has both a semantic label and a numerical weight, and use spread activation on this network. The intuition behind this idea is that better results in the search process can be achieved using the semantic information together with sub-symbolic (numerically encoded) information extracted from the instances. Several works in the literature present spread activation algorithms either in semantic [3] or in associative nets [2]. However, there are few works that use both approaches together.

The algorithm has as a starting point an initial set of instances in the ontology, henceforth called nodes, which have an initial activation value; in the functionalities proposed in this paper this value will be 1.0. All nodes not in the initial set have their initial activations set to zero. The initial nodes are put in a priority queue, ordered non-increasingly with respect to the node’s activation value.
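Because the propagation runs over edges weighted by the measures of Section 2.1, a small sketch of those measures may be useful at this point. It computes Formula (1) as the fraction of the nodes related to C_j that are also related to C_k, and Formula (2) as the inverse of the number of relation instances arriving at k, as that formula reads in this text. The toy graph, the node names, the edge list, and the helper names (`build_neighbors`, `cluster`, `specificity`, `combined`, `order_related`) are assumptions made for illustration only; the sketch treats relations as symmetric and ignores relation types, which the original formulas do not necessarily do.

```python
from collections import defaultdict

def build_neighbors(edges):
    """Map each node to the set of nodes it is related to (symmetric relations assumed)."""
    nbrs = defaultdict(set)
    for a, b in edges:
        nbrs[a].add(b)
        nbrs[b].add(a)
    return nbrs

def cluster(nbrs, j, k):
    """Formula (1): sum_i n_ijk / sum_i n_ij, i.e. shared related nodes over nodes related to j."""
    related_to_j = nbrs[j]
    if not related_to_j:
        return 0.0
    return len(related_to_j & nbrs[k]) / len(related_to_j)

def specificity(nbrs, k):
    """Formula (2) as read here: 1 / n_k (relation types ignored in this sketch)."""
    n_k = len(nbrs[k])
    return 1.0 / n_k if n_k else 0.0

def combined(nbrs, j, k):
    """Combined measure: product of the cluster and specificity measures."""
    return cluster(nbrs, j, k) * specificity(nbrs, k)

def order_related(nbrs, node, related):
    """Order the related nodes by decreasing cluster weight."""
    return sorted(related, key=lambda r: cluster(nbrs, node, r), reverse=True)

# Hypothetical edges loosely inspired by the research domain; not taken from Figure 1.
edges = [
    ("schwabe", "web_services"), ("schwabe", "hypermedia"),
    ("schwabe", "ws_patterns"), ("schwabe", "ws_orchestration"),
    ("schwabe", "hybrid_search"),
    ("ws_patterns", "web_services"), ("ws_orchestration", "web_services"),
    ("hybrid_search", "information_retrieval"),
]
nbrs = build_neighbors(edges)
ranked = order_related(nbrs, "schwabe", ["hypermedia", "web_services"])
print(ranked)
```

In this toy graph two of the five nodes related to "schwabe" are also related to "web_services", so that area ranks above "hypermedia", which shares none.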
The node with the highest activation value is then taken out of the queue and processed. If it satisfies all the restrictions, it propagates its activation to its neighbors. The neighboring nodes which were activated, and are not currently in the priority queue, are added to it. The priority queue is then reordered. The node that was just processed is added to the results list, which contains all the nodes that have been processed and are the result of the spread activation process.

This process repeats itself until a specified state is achieved (a defined output size, for example), or there are no further nodes to be processed in the priority queue. At the end of the propagation a final set of nodes and their respective activation values are obtained. The total complexity of the proposed spread activation algorithm is O(|E| * log |V|), where |E| is the number of existing relation instances and |V| is the number of concept instances. For a more detailed explanation of the propagation the reader should refer to [7].

3. Ordering Related Elements

In hypermedia applications, a node is typically related to various other nodes. Most of the time, these relations are symbolic - they hold or they don’t. There is no notion of strength or intensity of a relation.

A major advantage of the hypermedia paradigm is precisely the ability to show related information to a given node, typically through a list of links. Sometimes this list is ordered according to the value of some attribute of the related nodes, such as alphabetically on the destination node’s name. In many applications where a relevance ordering is used, it is determined manually, and explicitly specified to the system. This approach is very costly and almost intractable if done for all nodes in the database, and all their corresponding relations.

The idea of ordering the relations of a current node is a classic technique in the Adaptive Hypermedia area, attempting to help users find their paths by adapting link presentation to the goals, knowledge and other characteristics of an individual user. One of the techniques for adaptive link presentation is adaptive ordering, which was successfully used in various applications [1]. Typically, this ordering is based on user profile, goals and previous navigation history.

Here, the proposed functionality calculates the strength of the relations based on the analysis of the graph containing all instances of nodes in the knowledge base. This makes it possible to automatically generate an ordering according to the strength of the relation between two nodes. The idea is to use the weights generated by the weight mapping techniques to propose an automatic ordering of the related elements.

[Figure 1: instances graph. Recoverable node labels: Publication “Web Services Patterns”, Publication “Web Services Orchestration”, Publication “Hybrid Approach for Searching”, Professor “Schwabe”, Student “Cristiano”, Student “Francisco”, Area “Web Services”, Area “Hypermedia”, Area “Software Engineering”, Area “Information Retrieval”. Legend: Conceptual Relation.]
Figure 1. Example of an instances graph in the research domain with all its nodes and relations

For example, consider the graph shown in Figure 1. Two possible orderings for the Area nodes which are related to the Professor node “Schwabe” are ascending alphabetical order (“Hypermedia”, “Software Engineering”, “Web Services”), and decreasing strength order, i.e., by the clustering measure (“Web Services”, “Software Engineering”, “Hypermedia”). The second ordering gives more information to the user of the application, as it allows concluding that Professor “Schwabe” has a stronger relation to the “Web Services” Area compared to the other areas (in this example, he published more articles in the “Web Services” Area than in the other areas).

3.2 Tests and Results

The application used to test the results of both proposed functionalities was the PUC-Rio Informatics Department website (http://www.inf.puc-rio.br). In this website it is possible to obtain information about the main research areas, professors, projects, students, labs and publications in the department. The knowledge base has around 2,630 node instances together with 6,554 relation instances. A small part of the research ontology used in the website is shown in Figure 2.

There are several advantages in using this application as a test case, as it is representative of many similar cases found in practice. First, precise analysis of results is straightforward, since the domain experts are the professors and students, whom we could consult as needed. Another interesting point is the fact that the website database contains various inconsistencies and errors due to the input process used. Since each professor is responsible for entering his information, some provide a lot of information, whereas others rarely input new information.

An additional problem is the fact that information regarding students and laboratories is typically incorrect or incomplete, since there is no person specifically responsible for entering this information, as students are not allowed to input information directly into the system.

Various tests were carried out to evaluate the ordering of related elements applied to some of the relation types in
each one of the knowledge bases. The goal of each test was to evaluate whether the ordering obtained reflected the strength of the relations, and therefore was valuable to the user of the application.

[Figure 2: conceptual model diagram with the classes Laboratory, Project, Professor, Area, Publication and Student.]
Figure 2. Conceptual model for the PUC-Rio Informatics Department website

We used domain experts to evaluate the results, who examined the ordering proposed by the system and evaluated whether the ordering obtained was satisfactory and useful. For small lists the ordering was examined in detail; for longer lists the experts analyzed the general aspect of the list, but focused more on the first elements, supposedly the most important. Four relation types were chosen to be evaluated, all through the combined measure.

The first relation type chosen was the Professor-Area relation. The semantics of strength desired here is that the larger the number of students, publications, projects, etc. the professor has in a given area, the stronger his relation to it. Tests were conducted for all professors. The second relation type analyzed was the Laboratory-Professor relation. The desired semantics is that the more publications, projects, products and advised students a professor has in common with the lab, the greater his importance to it. Tests were conducted for all seven existing labs in the department.

The third relation analyzed was the Area-Professor relation. The intended semantics was that the higher the number of publications, projects and advised students a professor has in a given area, the greater his importance to that area. All ten existing areas in the department were analyzed. The fourth and last relation type analyzed was the Professor-Student relation. A differing aspect for this type of relation was the fact that the lists were much longer than in the previous three cases, which makes this ordering and its analysis more complex. It is often difficult for professors to come up with a list, ordered by importance, of the students they have advised over their entire career. Each professor analyzed the ordered list provided, and was asked to say whether or not it seemed to be a good ordering.

The results obtained in all test cases were very good, as almost all the lists were positively evaluated by the experts, who agreed that the ordering provided by the system made good sense. The results obtained (Table 1) show that the weight mapping techniques can be very useful to propose an ordering of the related elements of a node instance. They can provide extra information for the user of an application, or to an algorithm that processes that knowledge base, such as the spread activation algorithm.

Table 1. Evaluation results for the relations ordering functionality (PUC-Rio Informatics Dept. website).

Relation type          # Instances analyzed   % Positively evaluated
Professor-Area         15                     100.00%
Laboratory-Professor   7                      85.00%
Area-Professor         10                     100.00%
Professor-Student      4                      100.00%

We also tested this functionality in the context of the Portinari Project website (http://www.portinari.org.br), an application documenting the artwork, life and times of a famous Brazilian painter. The main difference is that for this application we used the specificity measure to calculate the weights of the relations. For lack of space, we do not present a detailed analysis of the results for this application. The tests were conducted in the same way as for the PUC-Rio Informatics Department website. Four types of relations were considered, with a total of 80 different orderings analyzed. We obtained a positive evaluation in 96.25% of the tests, which shows that the functionality was very successful in this application as well.

The best ordering for a relation is not always obtained by analyzing the instances graph. For example, in the case of publications of a professor it is not clear what semantics should determine that one publication is more important than another. Typically the importance of a publication is given by where it was published - conference proceedings, journal, magazine, etc. This information is usually stored as a property or attribute of the publication. Therefore, in this case, an ordering by the value of a specific attribute could be more interesting than the one provided by the weight mapping technique.

Consequently, one should bear in mind that the best ordering for a relation will be determined by the type of tasks the user of the application has to accomplish. Which ordering to use should be a decision of the knowledge engineer of the given application domain, depending on the usage contexts.

4. Detecting and Suggesting Relations

Although one of the main advantages of hypermedia applications is the possibility of connecting different
concepts that have some relation among them, this is only true if the links provided are meaningful. The process of adding relationships among concepts is usually done by human beings in a totally manual way. This is a hard task that consumes a lot of time and requires great knowledge of the specific domain of the application.

For example, consider the PUC-Rio Department of Informatics website. There are approximately 1,600 publications stored in the website. For each stored publication, the website includes a list of relations to its authors, the areas in which the publication is relevant, the projects which are related to the publication, etc., requiring a tremendous amount of work. Many times, as in this case, this task is shared among various users, in order to make it a little easier. For example, each professor is responsible for entering the information regarding his own publications. In many cases, there is decentralized input of information, and various inconsistencies can arise from this process, mainly due to incomplete knowledge on the part of the user entering the information.

Many knowledge bases contain redundancy. For example, in the research domain a professor who has several publications in a specific area has a great chance of being related to that area. If this information is not explicit in the knowledge base by a direct edge connecting the professor to the given area, this might be an inconsistency or error in the knowledge base.

The idea behind the proposed functionality of detecting and suggesting relations is to identify, for the user or the administrator of the system, possible relations among concepts which are not explicit in the knowledge base but have a great possibility of existing. That is, the system detects possible new relations which were not previously in the knowledge base. Not every detected relation comes from an inconsistency in the knowledge base. Sometimes, a relation might not exist at an initial moment but, over time, that relation becomes latent due to the modifications that are happening in the knowledge base.

This kind of functionality can also benefit other processes. In particular, it can be very helpful in the process of updating the knowledge base. Most hypermedia applications have their knowledge bases updated constantly (that is, some concepts are inserted in the base, other concepts are deleted; some relations among the existing concepts are added, and others are deleted). This functionality can help the user in the task of inserting new concepts and relations among concepts by using the pre-existing knowledge in the base.

In this scenario, it would be very useful if, as the user starts to input and insert new relations, the system could suggest other probable relations. For instance, the user could begin by inserting the relation with the professor who wrote the publication. After that, the system would automatically suggest areas related to the publication, drawn from the main research areas of the given professor. After the confirmation of the areas related to the publication by the user, the system could suggest as probable co-authors of the publication students and professors who typically write publications together in those areas with the given professor, and so on.

To suggest a new relation, the user provides a starting node, which the system uses as the input node for the spread activation algorithm. To prevent suggestion of links already present in the knowledge base, the spread activation must be configured by adding a restriction rejecting all nodes to which the given node already has a relation. The nodes obtained from the spread activation algorithm are then presented to the user as possible related nodes to the given node; the user has the option of immediately inserting any suggested relation in the knowledge base.

In addition to suggesting relations, the system also associates with each suggestion a numeric weight that indicates its strength. The analysis of this weight is difficult, and varies from relation to relation. Naturally, a suggested instance of a relation which has a higher value than another suggested instance of the same relation has a higher likelihood of being true.

Next, an example will be presented to clarify the use of this functionality. Considering the instances graph shown in Figure 1, it is possible to observe that professor “Schwabe” has a relation with three distinct areas (“Web Services”, “Hypermedia” and “Software Engineering”). If the system were asked to propose new relations of the type Professor-Area, it could suggest the relation with the area of “Information Retrieval”, since professor “Schwabe” has a publication in this area, and also advises a student in it.

The absence of this relation in the knowledge base could be due to an error. In this case, professor “Schwabe” is indeed related to the area of “Information Retrieval”, but this relation was not stored in the knowledge base due to errors in the information input process. It is also possible that, when the database started to be populated, this was really not one of his areas, but after he started publishing papers and advising students in that area, it became true, and this relation was never actually inserted in the knowledge base.

Another possibility is that professor “Schwabe” only has direct relations to his main research areas, and since “Information Retrieval” is not one of those, he has no direct relation to it in the knowledge base. In this case, the lack of this relation in the base is not an error. Even if this is the case, this inference is still very useful in various contexts. If a search for professors in this area is done in the system, it might be interesting to show professor “Schwabe” as one of the results, since he has at least some experience in the area, even though no explicit relation is actually stored. In any case, it is important to observe that the decision on whether or not to insert a suggested relation in the knowledge base is taken by the user(s).
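The suggestion procedure described above can be sketched as a priority-queue propagation followed by a filter on the results. This is only a minimal illustration under stated assumptions, not the implementation from [7]: the `decay` factor, the additive activation update, the choice to apply the "already related" restriction to the results rather than to the propagation, and all names (`spread_activation`, `suggest_relations`, `add_relation`, the node names, labels and weights) are hypothetical.

```python
import heapq

def add_relation(graph, a, b, label, weight):
    """Store a labelled, weighted relation instance in both directions (hybrid network)."""
    graph.setdefault(a, []).append((b, label, weight))
    graph.setdefault(b, []).append((a, label, weight))

def spread_activation(graph, start, accept, allowed_labels=None, decay=0.5, max_results=10):
    """Propagate activation from `start`; return accepted nodes with their activation."""
    activation = {start: 1.0}          # initial node starts at 1.0, all others at zero
    heap = [(-1.0, start)]             # max-priority queue via negated activation
    processed, results = set(), []
    while heap and len(results) < max_results:
        _, node = heapq.heappop(heap)
        if node in processed:          # stale queue entry for an already processed node
            continue
        processed.add(node)
        if node != start and accept(node):
            results.append((node, activation[node]))
        for nbr, label, weight in graph.get(node, []):
            if nbr in processed:
                continue
            if allowed_labels is not None and label not in allowed_labels:
                continue               # semantic restriction on the relation label
            activation[nbr] = activation.get(nbr, 0.0) + activation[node] * weight * decay
            heapq.heappush(heap, (-activation[nbr], nbr))
    return results

def suggest_relations(graph, kinds, start, target_kind):
    """Suggest nodes of `target_kind` that `start` is not yet directly related to."""
    already = {nbr for nbr, _, _ in graph.get(start, [])}
    return spread_activation(
        graph, start, lambda n: kinds.get(n) == target_kind and n not in already
    )

# Hypothetical instances and weights, loosely following the "Schwabe" example.
graph = {}
add_relation(graph, "schwabe", "web_services", "works_in", 0.8)
add_relation(graph, "schwabe", "hybrid_search", "author_of", 0.5)
add_relation(graph, "schwabe", "cristiano", "advises", 0.5)
add_relation(graph, "hybrid_search", "information_retrieval", "about", 0.9)
add_relation(graph, "cristiano", "information_retrieval", "studies", 0.7)
kinds = {"schwabe": "Professor", "web_services": "Area", "information_retrieval": "Area",
         "hybrid_search": "Publication", "cristiano": "Student"}

suggestions = suggest_relations(graph, kinds, "schwabe", "Area")
print(suggestions)
```

In this toy run the only suggested area is "information_retrieval", reached through both the publication and the advised student, while "web_services" is rejected because the relation already exists.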
There are several types of relations in this application, and evaluating the functionality for all of them would be too expensive. Some specific relations were evaluated. Relations involving Laboratories and Students were good candidates since they had far fewer instances than they should have in practice. A balanced analysis of this functionality should also test relations that were thoroughly filled - those relations where most of the actually occurring relations were already explicit in the knowledge base. The intuition we had and wanted to confirm was that for these relations the precision of the proposed functionality would be lower. We divided the relations into three distinct groups - strong, medium, and weak - based on the average number of instances the relation had, relative to the expected number of instances, using the semantics of each relation (e.g. a paper must have at least one author).

The algorithm suggests various new relation instances, ordered for each type of relation, with a real number - its weight - associated to each suggestion. To be of practical value, it is necessary to establish a threshold for the weights, to filter out meaningless suggestions. The difficulty is to establish this threshold value, since it should be different for each type of relation, because the semantics of each relation type is completely different. The approach used was to take existing relations in the database as a “training set”, and collect the weights assigned to them by the algorithm. The threshold value is obtained as a function of these collected weights, which in our case was min {min weight, (avg weight - std. dev.)}. Several other possibilities were considered, and for other domains this function may have to be adjusted.

To analyze the precision of the results obtained it was necessary to manually evaluate each suggestion proposed by the system, with the help of domain experts. They classified the suggestions as correct or incorrect. Instances for which there was any doubt were classified as incorrect. The precision was calculated as the percentage of correct relation suggestions. Recall could not be calculated since we did not have a list of all correct instances which were missing in the base (indeed, this is the reason why the functionality was very useful).

4.1. Tests and Results

Several tests were made to evaluate the suggestion of relations functionality in the PUC-Rio Department of Informatics website application. The basic methodology of the tests consisted of choosing a set of relations, and for each one of them asking the spread activation system to suggest new instances. After that, domain experts evaluated the proposed new relation instances, and decided whether or not they should exist in the knowledge base. Based on the hits and misses we calculated the precision rate of the functionality. The goal of the tests was to analyze whether the proposed system suggests new relations with an acceptable precision, where the meaning of acceptable varies from application to application. If so, it could be employed by the users of an application, either for error and inconsistency identification, or for aiding in the insertion of new instances in the knowledge base.

The graph in Figure 3 presents the change in the precision as a function of the number of suggested relations. The horizontal axis represents the number of suggested relations for a given relation type, and the vertical axis represents the precision value at a given point. The graph was constructed as follows. For each relation type, we used the spread activation algorithm to obtain a list with all the relation suggestions for that type. This list was sorted from the best suggestion to the worst.

Table 2 presents the results obtained. As expected, the precision value diminishes as more relation instances are suggested. Also, the precision was much higher for the relations in the weak and medium groups. This is due to the fact that in these relations there are more missing relation instances, and therefore the level of correct suggestions tends to be higher. Overall, 329 relation instances above the threshold were suggested with an average precision rate of 78.7%. Ignoring the threshold, the system proposed 834 new relation instances with an average precision of 75.9%. Both results are very encouraging, given the fact that the main goal of the functionality is not to automatically generate new relation instances but to identify them for the user, who has the option of accepting the suggestion or not. The correctly suggested relation instances were responsible for an increase of 10% in the number of relations in the base.

We also tested this functionality in the Portinari Project website. This application is interesting because its database is highly consistent, and the domain model is less redundant, in the sense that there are fewer semantically meaningful transitive paths to be explored by the spread activation algorithm. Given these characteristics, as expected, the suggestion of relations was not very effective. We can conclude that the utility of this functionality is proportional to the level of inconsistency of the knowledge base, and to the redundancy of the semantic domain of the application.

We also developed an interface where the inferred links are presented together with the existing links. We used different colors so the user could differentiate them and offered an easy one-click solution for the user to insert the inferred link, turning it into an explicit link in the database, if the user wishes and has the appropriate permissions. Users greatly appreciated this functionality.

The same ideas proposed here can also be used to suggest links for relations that do not exist in the conceptual model of the application, as opposed to
(LA-Webmedia’04)
0-7695-2237-8/04 $20.00 © 2004 IEEE
Table 2. Evaluation of the suggestion of relations functionality (PUC-Rio Informatics Dept. website)

Relation          Group   Instances  N. of Suggestions  Precision        N. of Suggestions  Precision
                          Average    (above Threshold)  (above Thresh.)  (all)              (all)
Publication-Lab   Weak    0.007      16                 100.0%           305                81.0%
Student-Lab       Weak    0.0513     20                 80.0%            119                67.2%
Lab-Area          Medium  0.369      0                  0.0%             11                 81.8%
Student-Area      Medium  0.61       113                98.2%            158                90.5%
Publication-Area  Strong  1.04       175                64.6%            175                64.6%
Product-Area      Strong  1.26       5                  60.0%            66                 62.1%
Total                                329                78.7%            834                75.9%
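As a consistency check on Table 2, the Total row can be reproduced by pooling the per-relation rows, weighting each precision by its number of suggestions. The following is a minimal sketch, not the authors' evaluation code; the function name `pooled_precision` is hypothetical, and the Publication-Area count of 175 above the threshold is the value consistent with the reported total of 329.

```python
# Rows of Table 2: (relation, suggestions above threshold, precision above,
#                   suggestions overall, precision overall).
ROWS = [
    ("Publication-Lab", 16, 1.000, 305, 0.810),
    ("Student-Lab", 20, 0.800, 119, 0.672),
    ("Lab-Area", 0, 0.000, 11, 0.818),
    ("Student-Area", 113, 0.982, 158, 0.905),
    ("Publication-Area", 175, 0.646, 175, 0.646),  # 175 is implied by the 329 total
    ("Product-Area", 5, 0.600, 66, 0.621),
]


def pooled_precision(counts_and_precisions):
    """Return (total suggestions, precision pooled over all suggestions)."""
    pairs = list(counts_and_precisions)
    total = sum(n for n, _ in pairs)
    # n * p estimates the number of correct suggestions in each row.
    correct = sum(n * p for n, p in pairs)
    return total, (correct / total if total else 0.0)


above_total, above_precision = pooled_precision((r[1], r[2]) for r in ROWS)
all_total, all_precision = pooled_precision((r[3], r[4]) for r in ROWS)
print(f"above threshold: {above_total} suggestions, {above_precision:.1%}")
print(f"all suggestions: {all_total} suggestions, {all_precision:.1%}")
```

Because the per-row precisions are rounded to one decimal place, the pooled values agree with the Total row only up to rounding.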
[Figure: Suggested Relations Precision Graph. Vertical axis: precision (0% to 120%). Horizontal axis: suggested instances (1 to 300). One curve per relation: Publication-Lab, Student-Lab, Lab-Area, Student-Area, Publication-Area, Product-Area.]
Figure 3. Links suggestion precision graph (PUC-Rio Informatics website)
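A curve of precision against the number of suggested instances can be computed from the expert judgments as sketched below. This assumes that each curve in Figure 3 is the cumulative precision over suggestions taken in ranked order, which the text does not state explicitly; the function name `precision_curve` and the sample labels are hypothetical. Doubtful cases are counted as incorrect, following the evaluation procedure described above.

```python
from itertools import accumulate


def precision_curve(judgments):
    """Cumulative precision after each suggestion, in ranked order.

    judgments: list of "correct", "incorrect" or "doubt" labels assigned by
    the domain experts; anything other than "correct" counts as a miss.
    """
    hits = [1 if label == "correct" else 0 for label in judgments]
    # accumulate() yields the running number of hits after each suggestion.
    return [running / k for k, running in enumerate(accumulate(hits), start=1)]


# Hypothetical expert labels for five ranked suggestions (not data from the paper).
labels = ["correct", "correct", "doubt", "correct", "incorrect"]
curve = precision_curve(labels)
print([round(p, 2) for p in curve])
```

The last value of the curve is the overall precision for that relation, which is the figure reported in the "all" columns of Table 2.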
relation instances for existing ones. For example, it is possible to suggest links between professors, even though such relations are not present in the conceptual model.

In the case of the Portinari Project application the benefits of such a strategy become clearer. For example, in this domain, an exhibition is related to the paintings exhibited in it. A painting is related to the techniques used to paint it, and to its themes. An interesting suggestion of links would be to propose, for a given exhibition node, the techniques and themes most closely related to it, even though such relations do not exist in the conceptual model. This functionality works as an inference machine that performs a proximity search for node instances close to a particular node instance.

Some qualitative tests were done for this kind of relation suggestion in both applications, and the results obtained seemed to be very good. We intend to further explore this particular use in future work.
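The exhibition example can be illustrated with a small sketch. It is not the spread activation engine described in this paper: as a simplified stand-in, it ranks techniques by the number of two-step paths (exhibition to painting to technique) that reach them. All instance names, the dictionaries `EXHIBITED` and `TECHNIQUES`, and the function `suggest_techniques` are hypothetical.

```python
from collections import Counter

# Hypothetical instance data (illustrative names, not from the Portinari database).
EXHIBITED = {"exhibition_1": ["painting_a", "painting_b", "painting_c"]}
TECHNIQUES = {
    "painting_a": ["oil"],
    "painting_b": ["oil", "fresco"],
    "painting_c": ["drawing"],
}


def suggest_techniques(exhibition, top_n=2):
    """Rank techniques by how many exhibited paintings link them to the exhibition."""
    counts = Counter(
        technique
        for painting in EXHIBITED.get(exhibition, [])
        for technique in TECHNIQUES.get(painting, [])
    )
    return counts.most_common(top_n)


print(suggest_techniques("exhibition_1", top_n=1))
```

No exhibition-technique relation exists in this toy model, yet the path count still yields a ranked suggestion. A weighted propagation scheme would replace the plain count with activation values.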
5. Related Work

An interesting system which uses spread activation techniques is the WebSCSA (Web Search by Constrained Spreading Activation) system proposed by Crestani [5]. This system searches for relevant Web pages by autonomously navigating through the Web using associations between pages. The navigation is processed and controlled by means of a Constrained Spreading Activation model. The first big difference to our work is
that the spread activation is carried out on the Web (not in a particular application) and therefore no domain information is available. Also, in our spread activation the similarity of pages is calculated using semantic information from the domain model, while in WebSCSA the textual contents of the web page are considered.

ONTOCOPI [6] presents an approach similar to ours for processing ontology-based information through spread activation techniques for suggesting relations. It is applied to identifying communities of practice (COPs) in an organization. The system tries to suggest persons who are closely related and therefore have common interests. ONTOCOPI attempts to uncover informal COP relations (those which are often indeterminate and expensive to establish and monitor) by spotting patterns in the formal relations represented in ontologies, traversing the ontology from instance to instance via selected relations. The activation in their system is propagated through a semantic network only, and there is no notion of extracting semantics from the link structure, such as the weight mapping techniques proposed in our work. Their work uses the spread activation system and the suggestion of relations in a much narrower scope than the system proposed in this paper. We believe that our system could be successfully used for the same task as ONTOCOPI.

As previously mentioned, link ordering has been used for Adaptive Hypermedia applications; the main difference with respect to the one presented in this paper is that the type of information and the algorithms used in Adaptive systems are based on a model of the individual user and the context of use. In our case, we use semantic information from the node instances and their relations, which is the same for all users. We envision the use of both technologies together as being an even more powerful method for ordering the presentation of links in hypermedia applications.

6. Conclusions

In this paper, we showed how a similarity processing engine can be used to provide some new functionality in model-based applications. The proposed engine uses semantic information from the model and its instances to explore the instances graph using a hybrid spread activation algorithm. The proposed engine proved to perform well in presenting links to related information in an order that reflects the semantic closeness of the corresponding information. It was also successfully used to suggest new relation instances to the user of an application, helping users in inserting new information in the database and also in identifying possible inconsistencies and errors in it.

As previously mentioned, we plan on integrating the proposed engine with adaptive hypermedia applications so we can use both semantic information from the domain and the user profile to perform adaptations. We also intend to investigate in more detail the applicability and utility of suggesting links for relations that do not exist in the conceptual model of the application and the results provided by it. In addition, we are also working on additional refinements in the proposed engine, experimenting with alternative functions, and other forms of exploiting semantic information.

Acknowledgement. The research presented in this paper was partly funded by scholarships from PUC-Rio and CNPq, and research grants from CNPq and FAPERJ. We also want to thank LES/LAC, TecWeb and Milestone laboratories for providing the necessary infra-structure for developing this work.

7. References

[1] Brusilovsky, P.: Efficient techniques for adaptive hypermedia. Intelligent Hypertext: Adaptive techniques for the World Wide Web. C. Nicholas and J. Mayfield, Eds., Lecture Notes in Computer Science, vol. 1326, Berlin: Springer-Verlag, 1997, pp. 12-30.
[2] Chen, H., and Ng, T.: An Algorithmic Approach to Concept Exploration in a Large Knowledge Network (Automatic Thesaurus Consultation): Symbolic Branch-and-Bound vs. Connectionist Hopfield Net Activation. Journal of the American Society for Information Science, 46(5):348-369, 1995.
[3] Cohen, P., and Kjeldsen, R.: Information Retrieval by Constrained Spreading Activation on Semantic Networks. Information Processing and Management, 23(4):255-268, 1987.
[4] Crestani, F.: Application of Spreading Activation Techniques in Information Retrieval. Artificial Intelligence Review, 11(6):453-482, 1997.
[5] Crestani, F., and Lee, P.L.: Searching the Web by Constrained Spreading Activation. Information Processing & Management, 36(4):585-605, 2000.
[6] O'Hara, K., Alani, H., and Shadbolt, N.: Identifying Communities of Practices: Analyzing Ontologies as Networks to Support Community Recognition. IFIP-WCC 2002, Montreal, 2002, Kluwer.
[7] Rocha, C., Schwabe, D., and Poggi, M.: A Hybrid Approach for Searching in the Semantic Web. To appear, Proceedings of the WWW2004 Conference, NY, May 2004. Available at http://www2004.org/proceedings/docs/1p374.pdf.
[8] Stojanovic, N., Studer, R., and Stojanovic, L.: An approach for the Ranking of Query Results in the Semantic Web. Proc. of ISWC '03 (Sanibel Island, FL, October 2003), Springer-Verlag, pp. 500-516.
[9] Baeza-Yates, R., and Ribeiro-Neto, B.: Modern Information Retrieval. ACM Press, New York, USA, 1999.
