
International Journal of Computer science engineering Techniques-– Volume 2 Issue 4, May - June 2017


Comprehensive Overview of Existing Semantic Annotation

Mrs. Sayantani Ghosh, Prof. Samir Kumar Bandyopadhyay
Department of Computer Science and Engineering, University of Calcutta

Abstract — Semantic features based on keywords or annotations may be very subjective and time consuming, whereas semantic features based on visual content are complex because of the inference procedures involved. Automatic image annotation is a good approach to reduce the semantic gap. This paper provides an overview of the most common techniques of the different types of annotated image retrieval systems, along with classification methods.
Keywords — Contextual Information, Automatic Image Annotation, Content-based Image Retrieval, Text-Based Image Retrieval, Ontology, Semantic Annotation
Introduction

In the real world, objects are seen embedded in a specific context, and its representation is essential for the analysis and the understanding of images. Contextual knowledge may stem from multiple sources of information, including knowledge about the expected identity, size, position and relative depth of an object within a scene [1-2]. For example, topological knowledge can provide information about objects that are most likely to appear within a specific visual setting: an office typically contains a desk, a phone, and a computer. Spatial information can also provide information about which locations within a visual setting are most likely to contain objects; in a beach scene, the sky is usually at the top, while the sea is below. Given a specific context, this kind of knowledge can help reasoning on data to improve image annotation.

Contextual information means the collection of relevant conditions and surrounding influences that make a situation unique and comprehensible, while contextual knowledge is the information and/or skills that have particular meaning because of the conditions that form part of their description. It is of prime interest to make efficient use of contextual knowledge in order to narrow the semantic gap and to improve the accuracy of image annotation.

Images get their semantic meaning through interpretation or understanding, and it is consequently difficult for an image retrieval system to discern the meaning sought by a user searching for a particular image. Image semantics is therefore important for image retrieval tasks.

Image semantics is not fully, nor explicitly, stored in the image pixels, and it is usually hard for a machine to access image semantics using only image features. We can therefore conclude that the image interpretation process often requires a reasoning mechanism over the detected objects in the image, which is usually based on cognition and on past experiences.
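The spatial priors described above (sky at the top of a beach scene, sea below) can be used to re-rank an object detector's candidate labels for a region. The sketch below is only an illustration of the idea: the prior table, the scores, and the function names are invented for the example, not taken from the paper.

```python
# Sketch: re-ranking candidate labels for a region using a contextual prior
# on vertical position. Priors and scores are assumed, illustrative values.

# Probability that a label appears in the top/middle/bottom third of a
# beach-scene image (hypothetical values for the example).
BEACH_PRIORS = {
    "sky": {"top": 0.8, "middle": 0.15, "bottom": 0.05},
    "sea": {"top": 0.05, "middle": 0.35, "bottom": 0.6},
}

def vertical_band(y_center):
    """Map a normalized vertical coordinate (0 = top, 1 = bottom) to a band."""
    if y_center < 1 / 3:
        return "top"
    if y_center < 2 / 3:
        return "middle"
    return "bottom"

def rerank(candidates, y_center):
    """Combine a detector's visual scores with the contextual prior and sort."""
    band = vertical_band(y_center)
    scored = {
        label: score * BEACH_PRIORS.get(label, {}).get(band, 0.1)
        for label, score in candidates.items()
    }
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)

# A region near the top of a beach image with ambiguous visual scores:
print(rerank({"sky": 0.5, "sea": 0.5}, y_center=0.1))  # "sky" ranks first
```

For a region the detector finds equally sky-like and sea-like, the prior breaks the tie according to where the region sits in the image, which is exactly the kind of reasoning over data the paragraph describes.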

ISSN: 2455-135X http://www.ijcsejournal.org Page 31


Knowledge models should go further than the simple description of specific objects that may appear in images, and rather model the image context through the description of concepts and the semantic relationships between them. Image semantics is a multi-level paradigm, i.e. there are several levels of semantics (or interpretation) for a given image, and the major challenge of image retrieval systems is then to be able to extract such semantics from images and to adapt to the user background in order to be efficient and effective.

If we look at the image in Figure 1, the semantics at the object level could be {"Bear", "Iceberg"}, the semantics at the partial level could be "Polar Bear standing on a small iceberg", and the semantics at the full level could be "global warming threatens the survival of the polar bears". Therefore, we can notice that the difficulty of processing and extracting the semantics from images increases significantly according to the sought level of abstraction. Currently, most approaches for image retrieval deal with the first level of semantic content. These approaches aim to provide efficient methods to learn semantic classes from visual image features.

Figure 1 Global warming threatens the survival of polar bears

Today, using automatic image annotation to fill the semantic gap between the low-level features of images and the understanding of their information in the retrieval process has become popular. Since automatic image annotation is crucial in understanding digital images, several methods have been proposed to automatically annotate an image. This paper reviews current methods for visualizing semantic effects on annotated images.

Different Methods for Semantic Image Annotation

The World Wide Web has become one of the most important sources of information due to the fast development of internet technology. Search engines are the most powerful resources for finding visual content (e.g., images, videos) on the World Wide Web. These search engines use the text surrounding an image to describe its content and rely on text retrieval techniques for searching particular images [1]. However, there are two significant drawbacks of such engines: (a) when the surrounding words are ambiguous or even irrelevant to the image, search results using this method usually contain many irrelevant images; (b) the retrieval of images will be ineffective when different languages are used in the description of the images, if the image collection is to be shared globally around the world. It is difficult to map semantically equivalent words across different languages [2-3].

The rapid growth of multimedia content comes with the need to effectively manage this content by providing mechanisms for image indexing and retrieval that can meet user expectations. Towards this goal, semantic image analysis


and interpretation has been one of the most interesting challenges of the last decade, and several attempts have addressed the previously introduced semantic gap problem. In particular, a typical method for narrowing the semantic gap is to perform automatic image annotation.

Automatic image annotation was introduced in the early 2000s, and first efforts focused on statistical learning approaches, as they provide powerful and effective tools to establish associations between the visual features of images and semantic concepts. A recent review of automatic image annotation techniques was proposed in [4-5].

Early efforts aimed to narrow the semantic gap by mapping low-level features (such as color, texture, shape and salient points) directly to specific semantic concepts such as indoor/outdoor, nature, animal, food, and pedestrian. These approaches quickly become cumbersome and impractical as soon as a larger annotation vocabulary is required. Indeed, it would be impossible to build a detector for each potential concept, as there are too many [5].

Content-based image retrieval (CBIR) is used to overcome the limitations of text-based image retrieval [4]. In this technique, different low-level visual features are extracted from each image in the image database, and image retrieval then searches for the best match to the features extracted from the query image. CBIR-based approaches show good accuracy for detecting specific objects/concepts, such as faces, pedestrians, cars, etc. These approaches select the parameters of the model so as to minimize the detection error on a set of training images by machine learning. In text-based approaches, images are indexed by a set of text descriptors extracted from the surrounding context.

Content-based means that the search will analyze the actual contents of the image rather than the metadata such as keywords, tags, and/or descriptions associated with the image. The term 'content' in this context might refer to colors, shapes, textures, or any other information that can be derived from the image itself. CBIR is desirable because most web-based image search engines rely purely on metadata, and this produces a lot of garbage in the results. Also, having humans manually enter keywords for images in a large database can be inefficient and expensive, and may not capture every keyword that describes the image. Thus a system that can filter images based on their content would provide better indexing and return more accurate results. The basic CBIR system is shown in figure 2.

Figure 2 Basic System of CBIR

CBIR systems are classified into two categories: text query and pictorial query. In text query based systems, images are characterized by text information such as keywords and captions. Text features are powerful as a query if appropriate text descriptions are given for images in an image database. However, giving


appropriate descriptions must in general be done manually, and it is time consuming. There are many ways one can pose a visual query. A good query method will be natural to the user as well as capture enough information from the user to extract meaningful results. In pictorial query based systems, an example of the desired image is used as a query. To retrieve images similar to the example, image features such as colours and textures, most of which can be extracted automatically, are used.

The typical CBIR system performs two major tasks. The first one is feature extraction, where a set of features, called the image signature or feature vector, is generated to accurately represent the content of each image in the database. A feature vector is much smaller in size than the original image, typically of the order of hundreds of elements (rather than millions). The second task is similarity measurement (SM), where a distance between the query image and each image in the database is computed using their signatures, so that the top "closest" images can be retrieved. Instead of exact matching, content-based image retrieval calculates visual similarities between a query image and the images in a database. Accordingly, the retrieval result is not a single image but a list of images ranked by their similarities with the query image.

Many similarity measures have been developed for image retrieval in recent years, based on empirical estimates of the distribution of features. Different similarity/distance measures will affect the retrieval performance of an image retrieval system significantly. For content-based image retrieval, user interaction with the retrieval system is crucial, since flexible formation and modification of queries can only be obtained by involving the user in the retrieval procedure. User interfaces in image retrieval systems typically consist of a query formulation part and a result presentation part. Various techniques have been proposed to retrieve images effectively and efficiently from large sets of image data, including:

• Gaussian Mixture Models
• Semantic template
• Wavelet Transform
• Gabor filter
• Support Vector Machine
• Color Histogram
• 2D Dual-Tree Discrete Wavelet Transform

There are three fundamental bases for content-based image retrieval, i.e. visual feature extraction, multidimensional indexing, and retrieval system design:

• Feature extraction and indexing of the image database according to the chosen visual features, which form the perceptual feature space, for example color, shape, texture or any combination of the above.
• Feature extraction of the query image.
• Matching the query image to the most similar images in the database according to some image-image similarity measure. This forms the search part of CBIR systems.
• User interface and feedback, which governs the display of the outcomes, their ranking, and the type of user interaction, with the possibility of refining the search through some automatic or manual preferences scheme.
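The two CBIR tasks above (feature extraction and similarity measurement) can be sketched end to end. The quantized color histogram, the toy in-memory "images", and the choice of Euclidean distance below are illustrative assumptions for the example, not the paper's specific method.

```python
# Minimal CBIR sketch: (1) feature extraction turns each image into a small
# signature (a quantized color histogram), (2) similarity measurement ranks
# database images by distance to the query signature.
import math

def color_histogram(pixels, bins=4):
    """Signature: per-channel histogram with `bins` levels (3*bins elements),
    normalized so each channel's counts sum to 1."""
    hist = [0] * (3 * bins)
    count = 0
    for row in pixels:
        for rgb in row:
            for channel, value in enumerate(rgb):
                level = min(value * bins // 256, bins - 1)
                hist[channel * bins + level] += 1
            count += 1
    return [h / count for h in hist]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def retrieve(query_pixels, database, top_k=2):
    """Return a ranked list of the top_k closest images, not a single match."""
    query_sig = color_histogram(query_pixels)
    return sorted(database, key=lambda name: euclidean(query_sig, database[name]))[:top_k]

# Toy 1x2 "images" given as rows of RGB tuples; a real system would decode files.
red_img   = [[(250, 10, 10), (240, 5, 5)]]
green_img = [[(10, 250, 10), (5, 240, 5)]]
db = {"red.jpg": color_histogram(red_img), "green.jpg": color_histogram(green_img)}

query = [[(255, 0, 0), (230, 20, 10)]]
print(retrieve(query, db))  # the reddish query ranks red.jpg first
```

Note that the signature is tiny (12 numbers here) compared with the raw pixels, which is the size argument the text makes for feature vectors, and that the output is a ranked list rather than an exact match.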


Ontology is a specification of a conceptualization. An ontology defines a set of representational terms called concepts; each concept has three basic components: terms, attributes and relations. Terms are the names used to refer to a specific concept, and can include a set of synonyms that specify the same concept. Attributes are features of a concept that describe the concept in more detail. Finally, relations are used to represent relationships among different concepts and to provide a general structure to the ontology. The main parts of image annotation are shown in figure 3.

Figure 3 Main Parts of CBIR

CBIR has many applications in the real world, such as:

i) Simple users searching for a particular image on the web.
ii) Various types of professionals, like the police force, using picture recognition in crime prevention.
iii) Medical diagnosis.
iv) Architectural and engineering design.
v) Fashion and publishing.
vi) Geographical information and remote sensing systems.

Text-Based Image Retrieval (TBIR) is currently used in almost all general-purpose web image retrieval systems. This approach uses the text associated with an image to determine what the image contains. This text can be text surrounding the image, the image's filename, a hyperlink leading to the image, an annotation to the image, or any other piece of text that can be associated with the image [6].

In image mining, approaches that automatically extract meaningful information from huge amounts of image data are increasingly in demand. Image mining is an interdisciplinary venture that essentially draws upon expertise in artificial intelligence, computer vision, content-based image retrieval, databases, data mining, digital image processing and machine learning.

Image mining frameworks [7] are grouped into two broad categories: function-driven and information-driven. The problem of image mining combines the areas of content-based image retrieval, data mining, image understanding and databases. Image mining techniques include image retrieval, image classification, image


clustering, image segmentation, object recognition and association rule mining. Image retrieval is performed by matching the features of a query image with those in the image database. The collections of images on the web are growing larger and becoming more diverse, and retrieving images from such large collections is a challenging problem. The research communities study image retrieval from various angles, namely text based and content based. Text based image retrieval applies traditional text retrieval techniques to image annotations.

Digital images are currently widely used in medicine, fashion, architecture, face recognition, fingerprint recognition, biometrics, etc. Recently, digital image collections have grown rapidly to a very large scale, and these images contain a huge amount of information. However, this information is not useful unless it can be exploited, so we need sufficient means of browsing, searching, and retrieving images. Image retrieval has become a very dynamic research area.

Two major research communities, database management and computer vision, have studied image retrieval in various ways, text based and content based. Text-based image retrieval can be traced back to the late 1970s. A very popular framework of image retrieval was to annotate the images by keywords, using a text-based database management system to perform image retrieval. With the emergence of large-scale image collections in the early 1990s, the major difficulty became that manual image annotation is labor-intensive and not always accurate.

To avoid this situation, content-based image retrieval was introduced. This means that, instead of using text-based keywords, images should be described by their visual contents, such as colour and texture. Many techniques in this research area have been developed, and many image retrieval systems, both research and commercial, have been built, establishing a general framework of image retrieval. In this paper we focus mainly on content-based image retrieval. Text-based image retrieval [7-8] can be based on annotations that were manually added for disclosing the images (keywords, descriptions), or on collateral text that is available with an image (captions, subtitles, nearby text). It applies traditional text retrieval techniques to image annotations or descriptions. Most image retrieval systems are text-based, but images frequently have little or no accompanying textual information.

Text data present in multimedia, viz. video and images, contain useful information for automatic annotation and indexing. The extraction of this information involves detection, localization, tracking, extraction, enhancement, and recognition of the text in a given image [9]. However, differences in text style, orientation, size, and alignment, as well as low-contrast images and complex backgrounds, make the automatic text extraction problem difficult and time-consuming. While critical surveys of related problems such as document analysis, face detection, and image and video indexing and retrieval can be found, the problem of text extraction has not been surveyed.

A variety of approaches to text extraction from images and video have been presented for many applications, like address block location [14], content-based


image/video indexing [10, 16], page segmentation [12-13], and license plate location [11-15]. In spite of such critical studies, it is still not easy to design a general-purpose text extraction system. This is often a result of the many possible sources of variation when extracting text from complex images, or from images having differences in style, color, orientation, font size and alignment. Images acquired by scanning book covers, CD covers, or other multi-colored documents have almost similar characteristics as document images.

Text in video images can be classified into caption text and scene text. Caption text is artificially overlaid on the image, while scene text exists naturally in the images. Some researchers prefer to use the term 'graphics text' for scene text, and 'superimposed text' or 'artificial text' for caption text [17-18]. It is documented that scene text is harder to detect. The text of input images needs to be identified, i.e. whether the input image contains any text at all. Several approaches assume that certain types of video frame or image contain text (e.g., recording cases or book covers). However, in the case of video, the number of frames containing text is far smaller than the number of frames without text. The text detection stage detects the text in the image.

The unique properties of video collections (e.g., multiple sources, noisy features and temporal relations) examine the performance of these retrieval methods in such a multimodal environment, and identify the relative importance of the underlying retrieval components. Based on query string matching, videos are retrieved from the database and sorted based on relevance. In the video in figure 4, the word “Sania” is extracted from the figure and is shown in figure 5. Text-based image retrieval has some limitations; for example, the task of determining image content is highly subjective.

Figure 4 Result for Query video Sania
Figure 5 Query word “Sania”

Model-based approaches for automatic image annotation are based on the idea of finding a mapping between low-level image features and semantic concepts (e.g. sky, car, sea). This is achieved by analyzing a set of already labeled images, called the training set, and creating a corresponding prediction model. Model-based approaches can be classified into two categories: probabilistic modeling methods and classification-based methods. Probabilistic modeling aims to learn the joint probability distribution between image features and keywords. Classification-based approaches treat the problem of automatic image annotation as a classification problem. For this purpose, each keyword is considered as an independent class and a classifier is learned to predict the right class(es) of test images. A widely used method to construct the classifier is the technique of support vector machines [19-
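The classification-based approach above can be sketched with one binary decision rule per keyword. In place of the support vector machines the text mentions, a tiny nearest-centroid rule is used here so the example stays dependency-free; the training vectors and keywords are invented for illustration.

```python
# Sketch of classification-based annotation: each keyword gets its own
# binary classifier (here a nearest-centroid rule standing in for an SVM).
import math

def centroid(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def train(training_set):
    """training_set: list of (feature_vector, keyword_set). For each keyword
    store the centroids of its positive and negative examples."""
    keywords = set().union(*(kws for _, kws in training_set))
    models = {}
    for kw in keywords:
        pos = [v for v, kws in training_set if kw in kws]
        neg = [v for v, kws in training_set if kw not in kws]
        models[kw] = (centroid(pos), centroid(neg))
    return models

def annotate(models, vector):
    """Predict every keyword whose positive centroid is nearer than its
    negative centroid — one independent binary decision per keyword."""
    return {kw for kw, (p, n) in models.items() if dist(vector, p) < dist(vector, n)}

train_data = [
    ([0.9, 0.1], {"sky"}),
    ([0.8, 0.2], {"sky"}),
    ([0.1, 0.9], {"sea"}),
    ([0.2, 0.8], {"sea"}),
]
models = train(train_data)
print(annotate(models, [0.85, 0.15]))  # {'sky'}
```

The design mirrors the text's point: every keyword is an independent class with its own classifier, which is exactly why the approach stops scaling once the annotation vocabulary grows large.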


Search-based automatic image annotation retrieves a set of similar images from a large-scale database of already labeled images, such as the web or specialized photo-sharing platforms, e.g., Flickr. Subsequently, the tags/keywords of similar images are analyzed and propagated to the target image. More specifically, to identify similar images, a two-phase search process is applied: semantic/contextual search and search by image contents. Manual image annotation is a time-consuming task, and as such it is particularly difficult to perform on large volumes of content. There are many image annotation tools available, but human input is still needed to supervise the process. So, there should be a way to minimize the human input by making the annotation process fully automatic. In automatic image annotation, images are automatically classified into a set of pre-defined categories (keywords). Low-level features of the training images are extracted. Then, classifiers are constructed with the low-level features to give the class decision. Lastly, the trained classifiers are used to classify new instances and annotate unlabelled images automatically. Automatic image annotation plays an important role in bridging the semantic gap between low-level features and high-level semantic contents in image access.

Photos represent one of the most common content types contributed and shared among the users of the Internet. This can be explained by the availability of digital photography devices, which provide an easy and cheap medium for producing photos. At the same time, the bandwidth of current Internet connections allows fast upload of photos. There are also several social aspects that make photos this popular. Photos are not only documentary or reminders; they are also an emotional journal. Moreover, photos are a rich type of content that "is worth a thousand words": they capture our moods and feelings and provide a proof that we have been there. Additionally, photos represent a subtle means of social communication. People post their photos as a statement of positive affirmation regarding the way they live, what they do and what they have achieved.

To address the limitations of manual tagging, research on automatic image annotation has received considerable attention. Automatic image annotation aims at associating unlabeled images with keywords that describe their contents. Early research on automatic annotation techniques focused on machine learning: the idea is to use a dataset of already labeled images in order to train models for predicting labels for un-annotated images. However, creating good training datasets is a challenging and time-consuming task. Indeed, most available datasets are limited to images corresponding to a small set of predefined concepts. Therefore, the annotations generated by such approaches are also limited, and they cannot meet the diverse ways in which people describe and search for images.

The aim of automatic image annotation is to generate descriptive keywords (tags) for unlabeled images without (or with only a little) human interference. Many methods have been proposed for automatic image annotation, which can be roughly categorized into two groups: keyword-based methods and ontology-based methods [19]. In keyword-based methods, arbitrarily chosen keywords from controlled vocabularies, i.e. restricted


vocabularies defined in advance, are used to describe the images. The basic goal of image annotation is presented in figure 6.

Figure 6 The goal of automatic image annotation

Although Information Retrieval techniques are well established, they are not effective when problems of concept ambiguity appear. On the other hand, search based only on semantic information may not be effective either, since: a) it does not take into account the actual document content, b) semantic information may not be available for all documents, and c) semantic annotations may cover only a few parts of the document. Hybrid solutions that combine keyword-based with semantic-based search deal with the above problems. Developing methodologies and tools that integrate document annotation and search is of high importance. For example, researchers need to be able to organize, categorize and search scientific material (e.g., papers) in an efficient and effective way. Similarly, a press clipping department needs to track news documents, annotating specific important topics and searching for information.

The ontology-based method is a way of describing concepts and their relationships in hierarchical categories [20]. This is similar to classification by keywords, but the fact that the keywords belong to a hierarchy enriches the annotations. For example, it can easily be found out that a car or a bus is a subclass of the class land vehicle, while car and bus have a disjoint relationship.

Ontology-based label extraction is extensively used to interpret the semantics found in image and video data. In particular, ontology-based label extraction is one of the main steps in object class recognition, image annotation, and image disambiguation. These applications have important roles in the field of image analysis, and as such, a number of variations of ontology-based label extraction used in these applications have been reported in the literature. These variations involve ontology development and utilization, and can affect the applicability (e.g., domain- and application-dependency) as well as the accuracy of the output. Unfortunately, this variability has neither been established nor tracked; thus, the variations have not been systematically configured.

Ontology is a conceptual knowledge source, which mainly consists of concepts and their hierarchical relationships. A concept is a tag identified by a word, phrase or label, and describes a real-world entity. An ontology may also have properties that describe the concepts, and non-hierarchical relationships among the concepts of the ontology. An ontology may be used as a hierarchically-enabled browsing mechanism and can be employed in semantics extraction, the process of accessing an ontology and inferring knowledge based on its concepts and relationships. Ontology-based label extraction, a type of semantics extraction, produces true labels for an input image. It is shown in figure 7.
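The concept structure described above (terms, attributes, relations) and the hierarchy-based reasoning (car is a subclass of land vehicle; car and bus are disjoint) can be made concrete in a few lines. The class layout and the tiny car/bus taxonomy below are illustrative assumptions, not a real ontology.

```python
# Sketch: a concept with terms (synonyms), attributes, and relations, plus
# transitive subclass reasoning over the is-a hierarchy and a disjointness check.
from dataclasses import dataclass, field

@dataclass
class Concept:
    name: str
    terms: set = field(default_factory=set)        # names/synonyms for the concept
    attributes: dict = field(default_factory=dict) # descriptive features
    relations: dict = field(default_factory=dict)  # e.g. {"is_a": "land vehicle"}

ONTOLOGY = {
    "car": Concept("car", {"car", "automobile"}, {"wheels": 4},
                   {"is_a": "land vehicle", "disjoint_with": "bus"}),
    "bus": Concept("bus", {"bus", "coach"}, {"wheels": 6},
                   {"is_a": "land vehicle", "disjoint_with": "car"}),
    "land vehicle": Concept("land vehicle", relations={"is_a": "vehicle"}),
    "vehicle": Concept("vehicle"),
}

def is_subclass(name, ancestor):
    """Follow is_a links upwards: car -> land vehicle -> vehicle."""
    while name in ONTOLOGY and "is_a" in ONTOLOGY[name].relations:
        name = ONTOLOGY[name].relations["is_a"]
        if name == ancestor:
            return True
    return False

assert is_subclass("car", "land vehicle")
assert is_subclass("bus", "vehicle")  # transitive: bus -> land vehicle -> vehicle
assert ONTOLOGY["car"].relations["disjoint_with"] == "bus"
assert "automobile" in ONTOLOGY["car"].terms  # a synonym resolves to the same concept
```

Because "automobile" is listed among the concept's terms, a query using the synonym reaches the same node, which is the hierarchy-enrichment advantage over flat keyword annotation that the paragraph describes.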


Generally, given an input image, the ontology-based label extraction process has several steps. First, the input image features are projected and matched with concepts in the ontology through a process called mapping. The relationships connected to the matched concepts are then analyzed, and new concepts are identified sequentially until the final output is extracted. This process is called mining. Existing surveys mainly focus on a single application/problem (e.g., recognition, annotation, or disambiguation) and have reviewed the existing literature from several perspectives.

Generally, the existing literature focuses on comparing and analyzing methods based on the characteristics of the output, with no linkage to the technique and type of ontology used. The input for ontology-based label extraction may be image features or object labels (maps) extracted using various image annotation techniques [21-24]. The mapping procedure is constrained by the type of input and the ontology characteristics. Feature inputs, which have a wide range, require learning techniques. Meanwhile, maps can be mapped directly (e.g., using syntactic string matching). An ontology is task-independent and is developed by domain experts. Existing ontologies, such as WordNet [29] and Cyc [30], are upper-level ontologies that consist of a large number of concepts and their relationships. These ontologies may be used with various applications. However, an existing ontology may be customized depending on the task at hand and the desired output.

Figure 7 A typical Content-Based Image Retrieval system

Ontology customization usually involves extracting a specific part of the ontology, which includes the required concepts and some of their relationships [25-26]. In addition, ontology-like knowledge may be developed if the required concepts or their relationships do not exist in the existing ontologies. The mining procedure depends greatly on the type of the output; that is, if the output is part of the input (i.e., image disambiguation), then a similarity technique is used; otherwise, a flooding procedure is implemented (i.e., image annotation). Image annotation and object recognition, as mentioned earlier, predict object(s) in a given scene based on the extracted features. Subsequently, these applications require an ontology that forms associations among features and labels for objects.

Generally, existing ontologies do not include the visual properties of the described objects [9, 27-28]. Thus, feature-based label extraction uses customized ontologies or ontology-like knowledge developed for the task at hand. The structure of these task-specific ontologies depends on the task at hand and the desired output. Variations of this structure are reflected in the ways by which the required image features are represented. In the task-oriented category, ontology-like knowledge is developed to smoothly fit the task at hand. These ontologies, however, cannot be used elsewhere. Two main approaches, the standard and the advanced approach, are then proposed. Their main differences lie in the information conveyed by their ontologies, which requires the use of different techniques. In the standard

approach, the ontology conveys the following information: object labels, hierarchical relationships, and low-level features. In the advanced approach, the ontology carries an additional item of information, namely the spatial relationships among concepts. During ontology construction, concepts are created from the labels of a dataset of labeled images. Then, another set of concepts with coarse granularity is created manually to support the categorization principle of the ontology. Finally, low-level features are assigned as properties to each concept using a supervised machine-learning process.

In label extraction, features are extracted from the input image, labeled, and then mapped to properties in the ontology using a classification method. Mining is implemented as a propagation process that moves from one concept to another over the hierarchical relations in a top-down manner (from concepts at the general level to concepts at a specific level). The propagation process may include intermediate classification steps in order to filter out concepts reached during propagation. Finally, the concepts obtained at the lowest level (i.e., the leaves) of the propagation process are selected as the output.

For all its promise, search-based image annotation has to deal with several challenges. The first challenge is posed by community tags, the main resource from which annotations for unlabeled images are extracted. User tags are created in an uncontrolled, free-style manner and are therefore inherently noisy: humans use inconsistent terms to describe the same thing, or use the same term to express different meanings. In other words, polysemy and homonymy, two fundamental problems in information retrieval, are also present in user-provided tags. Second, as mentioned before, identifying images similar to the unannotated image is a core component of the automatic annotation process. Accordingly, automatic image annotation also has to deal with two main challenges of CBIR techniques, namely the accuracy and the speed of the applied technique. Generally, the accuracy of CBIR is governed by the low-level image representation that is used, i.e., the image features. In turn, the complexity of extracting image features, representing them as descriptor vectors, and comparing the descriptors are the major factors that influence retrieval speed. Therefore, to ensure the efficiency of automatic image annotation, solutions for improving the accuracy and boosting the performance of the applied CBIR process have to be investigated. Third, automatic image annotation has to address the issue of estimating the relevance/importance between candidate annotations and the target image.

In general, the problem of CBIR is the semantic gap between the high-level concepts perceived in an image and the low-level features extracted from it. In other words, there is a difference between what image features can distinguish and what people perceive in the image. As shown in Fig. 4, SBIR starts by extracting low-level features of images to identify meaningful and interesting regions/objects based on shared characteristics of the visual features. The object/region features then pass through a semantic image extraction process to obtain a semantic description of the images, which is stored in a database. Image retrieval can then be queried in terms of high-level concepts: a query may consist of a set of textual words that pass through a semantic features translator to obtain the semantic features of the query.
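The query side of this pipeline can be sketched as a toy "semantic features translator" that maps free-text query words onto concept labels by syntactic string matching. The concept list and synonym table below are hypothetical; a real system would draw them from a shared ontology such as WordNet [29]:

```python
import difflib

# Hypothetical concept labels taken from a shared ontology.
CONCEPTS = ["beach", "mountain", "building", "vehicle"]
# Hypothetical synonym table; a real system would query the ontology instead.
SYNONYMS = {"seaside": "beach", "car": "vehicle", "hill": "mountain"}

def translate_query(words):
    """Map each query word to its closest concept label, skipping unknowns."""
    found = []
    for word in words:
        word = SYNONYMS.get(word.lower(), word.lower())
        # Syntactic string matching against the concept labels.
        match = difflib.get_close_matches(word, CONCEPTS, n=1, cutoff=0.8)
        if match:
            found.append(match[0])
    return found

print(translate_query(["Seaside", "car", "tree"]))  # -> ['beach', 'vehicle']
```

Words with no sufficiently close concept (here "tree") are simply dropped, which mirrors the fact that a query can only be answered in terms of the vocabulary the ontology provides.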
The semantic mapping process is used to

find the best concept to describe the segmented or clustered regions/objects based on the low-level features. This mapping is done through supervised or unsupervised learning tools that associate the low-level features with object concepts, which are then annotated with textual words through the image annotation process [1,13]. Semantic content is obtained either by textual annotation or by complex inference procedures based on visual content [14].

Figure 4. A typical Semantic-Based Image Retrieval system

Semantic annotation means describing the semantic content of images and of retrieval queries. It requires some understanding of the semantic meaning of images and of the retrieval query, as well as a standardized representation of images. Based on the semantic annotation of images and retrieval queries, the semantic similarity between images and a retrieval query can be compared. At present, semantic annotation is implemented with markup languages such as XML, based on a shared ontology definition.

This paper attempted to provide an overview of the most common techniques used in the different types of image retrieval systems. Most systems use low-level features; few use semantic features. Global features fail to identify important visual characteristics of images, but they are very efficient in computation and storage thanks to their compact representation. Local features extracted from images, by contrast, support partial image matching and searching for images that contain the same object or scene under different viewpoints, different scales, changes in illumination, etc. Local features can therefore identify important visual characteristics of images, but they are computationally more expensive. Semantic features based on keywords or annotations may be very subjective and time-consuming to produce, whereas semantic features based on visual content are complex because of the inference procedures involved. Automatic image annotation is a good approach to reduce the semantic gap, but it is still a challenging task because of varying imaging conditions, occlusions, and the complexity and difficulty of describing objects. In the future, the available techniques need to be developed further to deal with the semantic gap and enhance image retrieval.

Bridging the semantic gap for image retrieval is still considered a big challenge.
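The supervised mapping from low-level features to object concepts described above can be sketched with a minimal learner. The paper does not prescribe a particular algorithm, so the nearest-centroid classifier and the toy three-bin colour descriptors below are purely illustrative:

```python
import math

# Toy low-level features (hypothetical 3-bin colour descriptors) for
# labelled training regions; the concept names are illustrative only.
TRAINING = {
    "sky":   [[0.1, 0.2, 0.9], [0.2, 0.3, 0.8]],
    "grass": [[0.1, 0.9, 0.2], [0.2, 0.8, 0.1]],
}

def centroid(vectors):
    """Mean vector of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

# One centroid per concept, learned from the labelled examples.
CENTROIDS = {label: centroid(vs) for label, vs in TRAINING.items()}

def annotate(region_features):
    """Map a low-level descriptor to the nearest concept label."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(CENTROIDS, key=lambda label: dist(CENTROIDS[label], region_features))

print(annotate([0.15, 0.25, 0.85]))  # a bluish region -> "sky"
```

The returned label is what the annotation step would attach to the region as a textual word; an unsupervised variant would obtain the centroids by clustering unlabelled descriptors instead.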

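The XML-based annotation just mentioned can be illustrated with a small fragment. The element names, attributes, and ontology URI below are hypothetical, invented for illustration rather than taken from any published schema:

```xml
<!-- Hypothetical annotation fragment; element names are illustrative only. -->
<imageAnnotation ontology="http://example.org/scene-ontology">
  <image src="beach_042.jpg">
    <region id="r1" bbox="0,0,640,200">
      <concept label="sky"/>
    </region>
    <region id="r2" bbox="0,200,640,480">
      <concept label="sea"/>
    </region>
  </image>
</imageAnnotation>
```

A retrieval system can then compare a textual query against the "label" attributes, while the shared ontology definition fixes the vocabulary from which the labels are drawn.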

Even though there have been many efforts and works in image retrieval research, they are not yet enough to provide satisfactory performance. There is still room for improvement, beyond the challenges associated with mapping low-level features to high-level concepts. Overcoming the semantic gap in broad-domain databases is also complex, because images in broad domains can be described using various concepts. Better support is needed for semantic-concept-based image retrieval, with a focus on retrieval by abstract attributes, which involves a significant amount of high-level reasoning about the meaning and purpose of the objects. In addition, the extracted semantic features should be applicable to any kind of image collection. Moreover, effective ways are needed to retrieve similar images that conform to human perception, without human intervention.

References

[1] Riad, A. M., Atwan, A., and Abd El-Ghany, S., "Image Based Information Retrieval Using Mobile Agent", Egyptian Informatics Journal, Vol. 10, No. 1, 2009.

[2] Kherfi, M. L., Ziou, D., and Bernardi, A., "Image retrieval from the World Wide Web: issues, techniques, and systems", ACM Computing Surveys, Vol. 36, No. 1, pp. 35-67, 2004.

[3] Riad, A. M., Atwan, A., and Abd El-Ghany, S., "Analysis of Performance of Mobile Agents in Distributed Content Based Image Retrieval", in Proc. IEEE International Conference on Computer Engineering & Systems (ICCES), 2008.

[4] Wang, C., Zhang, L., and Zhang, H., "Learning to Reduce the Semantic Gap in Web Image Retrieval and Annotation", in SIGIR '08, Singapore, 2008.

[5] Jayaprabha, P., and Somasundaram, Rm., "Content Based Image Retrieval Methods Using Graphical Image Retrieval Algorithm (GIRA)", International Journal of Information and Communication Technology Research, Vol. 2, No. 1, 2012.

[6] Karthikeyan, T., and Manikandaprabhu, P., "Function and Information Driven Frameworks for Image Mining - A Review", International Journal of Advanced Research in Computer and Communication Engineering (IJARCCE), Vol. 2, Issue 11, pp. 4202-4206, Nov. 2013.

[7] Chang, S. K., and Hsu, A., "Image information systems: Where do we go from here?", IEEE Trans. on Knowledge and Data Engineering, 4(5), 1992.

[8] Tamura, H., and Yokoya, N., "Image database systems: A survey", Pattern Recognition, 17(1), 1984.

[9] Jung, K., Kim, K. I., and Jain, A. K., "Text information extraction in images and video: a survey", Pattern Recognition, 37(5), pp. 977-997, 2004.

[10] Zhang, H. J., Gong, Y., Smoliar, S. W., and Tan, S. Y., "Automatic Parsing of News Video", in Proc. of IEEE Conference on Multimedia Computing and Systems, pp. 45-54, 1994.

[11] Cui, Y., and Huang, Q., "Character Extraction of License Plates from Video", in Proc. of IEEE Conference on Computer Vision and Pattern Recognition, pp. 502-507, 1997.


[12] Jain, A. K., and Zhong, Y., "Page Segmentation using Texture Analysis", Pattern Recognition, 29(5), pp. 743-770, 1996.

[13] Tang, Y. Y., Lee, S. W., and Suen, C. Y., "Automatic Document Processing: A Survey", Pattern Recognition, 29(12), pp. 1931-1952, 1996.

[14] Yu, B., Jain, A. K., and Mohiuddin, M., "Address Block Location on Complex Mail Pieces", in Proc. of International Conference on Document Analysis and Recognition, pp. 897-901, 1997.

[15] Kim, D. S., and Chien, S. I., "Automatic Car License Plate Extraction using Modified Generalized Symmetry Transform and Image Warping", in Proc. of International Symposium on Industrial Electronics, 2001, Vol. 3, pp.

[16] Shim, J. C., Dorai, C., and Bolle, R., "Automatic Text Extraction from Video for Content-based Annotation and Retrieval", in Proc. of International Conference on Pattern Recognition, Vol. 1, pp. 618-620, 1998.

[17] Antani, S., Crandall, D., Narasimhamurthy, A., Mariano, V. Y., and Kasturi, R., "Evaluation of Methods for Detection and Localization of Text in Video", in Proc. of the IAPR Workshop on Document Analysis Systems, Rio de Janeiro, pp. 506-514, December 2000.

[18] Antani, S., "Reliable Extraction of Text from Video", PhD thesis, Pennsylvania State University, August 2001.

[19] Zhang, D., Islam, M. M., and Lu, G., "A review on automatic image annotation techniques", Pattern Recognition, pp. 1-17, 2011.

[20] Feng, S., and Xu, D., "Transductive Multi-Instance Multi-Label learning algorithm with application to automatic image annotation", Expert Systems with Applications, 37, pp. 661-670, 2010.

[21] Abu-Shareha, A. A., and Mandava, R., "Semantics Extraction in Visual Domain Based on WordNet", in 5th FTRA International Conference on Multimedia and Ubiquitous Engineering (MUE), IEEE, pp. 212-219, 2011.

[22] Atika, M., Akbar, A., and Sultan, A., "Knowledge Discovery using Text Mining: A Programmable Implementation on Information Extraction and Categorization", International Journal of Multimedia and Ubiquitous Engineering, 4, 2009.

[23] Banerjee, S., and Pedersen, T., "Extended Gloss Overlaps as a Measure of Semantic Relatedness", in Proceedings of the 18th International Joint Conference on Artificial Intelligence, Morgan Kaufmann Publishers Inc., Acapulco, Mexico, pp. 805-810, 2003.

[24] Chang, S.-F., Sikora, T., and Puri, A., "Overview of the MPEG-7 standard", IEEE Transactions on Circuits and Systems for Video Technology, 11, pp. 688-695, 2001.

[25] Clouard, R., Renouf, A., and Revenu, M., "An Ontology-Based Model for Representing Image Processing Application Objectives", International Journal of Pattern Recognition and Artificial Intelligence (IJPRAI), 24, pp. 1181-1208, 2010.

[26] Deserno, T. M., Antani, S., and Long, R., "Ontology of gaps in content-based image retrieval", Journal of Digital Imaging, 22, pp. 202-215, 2009.

[27] Escalante, H. J., Montes, M., and Sucar, E., "Multimodal indexing based on semantic cohesion for image retrieval", Information Retrieval, 15, pp. 1-32, 2012.

[28] Fan, X., "Contextual disambiguation for multiclass object detection", in International Conference on Image Processing, 2004, pp.

[29] Fellbaum, C., WordNet: An Electronic Lexical Database, MIT Press, 1998.

[30] Foxvog, D., "Cyc", in Theory and Applications of Ontology: Computer Applications, Springer, pp. 259-278, 2010.