CONTENT-BASED IMAGE RETRIEVAL THROUGH CLUSTERING
SYNOPSIS (Phase I)
Submitted by
PRASAD BANOTH
(Register No: CS0604)
Under the Guidance of
Ms. ANBARASI M.S.
to the Pondicherry University in partial fulfilment of the requirements for
the award of degree of
MASTER OF TECHNOLOGY
in
COMPUTER SCIENCE AND ENGINEERING
(Distributed Computing Systems)
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
AND INFORMATION TECHNOLOGY
PONDICHERRY ENGINEERING COLLEGE
PUDUCHERRY - 605 014
DECEMBER - 2007
SYNOPSIS
AIM
In the first phase of this project, Knowledge Summarization is carried out through
Clustering Techniques, and the result is stored in a Buffer/Database. Content-Based
Retrieval is then performed on this store so that medical researchers can carry out
predictive measures on Medical Image Data.
MOTIVATION
We are drowning in data but starving for knowledge. Data Mining Techniques
[1][2][3] can be a solution for discovering knowledge from large data sets. In the medical
domain especially, large volumes of data are available whereas the knowledge discovered
from them is minimal, so little predictive action is possible.
RELATED WORK
The goal of content-based image retrieval (CBIR) [8] is to retrieve images similar
[4] to an image/sketch provided by the user.
Very large collections of images are growing ever more common. From stock
photo collections and proprietary databases to the World Wide Web, these collections are
diverse and often poorly indexed.
IMAGE RETRIEVAL
Image retrieval is mainly based on visual content, so visual features such as
shape [5][6][7], color [6], and texture must be taken into account.
SHAPE
The shape [5][6][7] of an object is an important feature for image and multimedia
similarity retrieval [4]. A variety of techniques have been proposed in the
literature for shape representation. Shape representation techniques are divided into two
categories:
• Boundary-Based and
• Region-Based.
Boundary-Based methods use only the border of the object shape and completely ignore
its interior. Region-Based techniques, on the other hand, take into account internal
details in addition to the boundary details.
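The difference between the two categories can be illustrated with a minimal Python sketch. It assumes the object is given as a binary mask (1 = object, 0 = background) and treats a pixel as boundary when one of its four neighbours is background; the function names and the mask are hypothetical and are not taken from the cited works.

```python
def region_pixels(mask):
    """All object pixels: the input a region-based descriptor would use."""
    return {(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v}


def boundary_pixels(mask):
    """Object pixels with at least one 4-neighbour outside the object:
    the input a boundary-based descriptor would use."""
    region = region_pixels(mask)
    boundary = set()
    for r, c in region:
        neighbours = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
        if any(n not in region for n in neighbours):
            boundary.add((r, c))
    return boundary


# A 5x5 image containing a filled 3x3 square.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
region = region_pixels(mask)
boundary = boundary_pixels(mask)
interior = region - boundary  # pixels a boundary-based method ignores
print(len(region), len(boundary), sorted(interior))
```

Here the centre pixel of the square is visible only to the region-based view, which is the information a boundary-based method gives up.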
COLOR
Color [6] is a commonly used feature for realizing content-based image retrieval
(CBIR) [8]. Many CBIR approaches are based on the well-known and
widely used color histograms:
• Using a single color histogram for the whole image, or
• Local color histograms for a fixed number of image cells, or
• A third approach (named Color Shape), which uses a variable number of histograms,
depending only on the actual number of colors present in the image.
These correspond to the three main Color-Based approaches for Content-Based Image Retrieval:
• Global Color Histogram (GCH)
• GRID
• Color-Shape Histograms
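The first two approaches can be sketched in Python as follows. This is an illustration under stated assumptions, not the method of [6]: the image is a list of rows of (r, g, b) tuples with 0-255 channels, each channel is quantized to 4 levels, the grid is 2 x 2, and histograms are compared with an L1 distance. All names and parameter values are hypothetical. The Color-Shape variant is not covered.

```python
from collections import Counter


def quantize(pixel, levels=4):
    """Map an (r, g, b) pixel with 0-255 channels to a coarse color bin."""
    step = 256 // levels
    return tuple(channel // step for channel in pixel)


def color_histogram(pixels, levels=4):
    """Normalized histogram of quantized colors for a flat list of pixels."""
    counts = Counter(quantize(p, levels) for p in pixels)
    total = sum(counts.values())
    return {color: n / total for color, n in counts.items()}


def global_color_histogram(image, levels=4):
    """GCH: a single histogram for the whole image."""
    return color_histogram([p for row in image for p in row], levels)


def grid_histograms(image, rows=2, cols=2, levels=4):
    """GRID: one local histogram per cell of a fixed rows x cols grid.
    Assumes the image has at least `rows` rows and `cols` columns."""
    height, width = len(image), len(image[0])
    cells = []
    for i in range(rows):
        for j in range(cols):
            r0, r1 = i * height // rows, (i + 1) * height // rows
            c0, c1 = j * width // cols, (j + 1) * width // cols
            pixels = [p for row in image[r0:r1] for p in row[c0:c1]]
            cells.append(color_histogram(pixels, levels))
    return cells


def l1_distance(h1, h2):
    """Sum of absolute differences over all bins present in either histogram."""
    return sum(abs(h1.get(b, 0.0) - h2.get(b, 0.0)) for b in set(h1) | set(h2))


RED, BLUE = (255, 0, 0), (0, 0, 255)
image_a = [[RED, BLUE], [RED, BLUE]]
image_b = [[BLUE, RED], [BLUE, RED]]  # same colors, mirrored layout

global_gap = l1_distance(global_color_histogram(image_a),
                         global_color_histogram(image_b))
grid_gap = sum(l1_distance(a, b) for a, b in
               zip(grid_histograms(image_a), grid_histograms(image_b)))
print(global_gap, grid_gap)
```

The two sample images contain the same colors in a mirrored layout, so the global histograms are identical while the grid histograms differ. This illustrates why local histograms carry spatial information that a single global histogram does not.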
A SIMILARITY RETRIEVAL ALGORITHM FOR IMAGE DATABASES
Given a query image Q, this algorithm retrieves images that contain regions similar to
regions of Q, where objects of Q may appear in the target images in scaled, translated, or
color-shifted form.
The algorithm performs an image indexing phase in which images in the database are
indexed before images matching a given query image Q can be retrieved. Indexing of
images is done only once at the beginning and again when new images are added to the database,
while the steps for querying need to be repeated for each query image.
The steps involved in both indexing images and querying for similar images are:
• Generating Signatures for Sliding Windows.
• Clustering the Sliding Windows.
• Region Matching.
• Image Matching.
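The first step, generating signatures for sliding windows, can be sketched in Python as below. This is a simplified illustration only. It assumes a grayscale image stored as a list of rows, an even window size, and a signature made of the mean intensity of each window quadrant. That signature is a coarse stand-in chosen for brevity, not the signature used in [4], and all names are hypothetical.

```python
def quadrant_means(image, top, left, size):
    """Signature of one size x size window: mean intensity of its four quadrants."""
    half = size // 2
    signature = []
    for dr in (0, half):
        for dc in (0, half):
            block = [
                image[r][c]
                for r in range(top + dr, top + dr + half)
                for c in range(left + dc, left + dc + half)
            ]
            signature.append(sum(block) / len(block))
    return tuple(signature)


def sliding_window_signatures(image, size=2, stride=1):
    """Return {(top, left): signature} for every window position in the image."""
    height, width = len(image), len(image[0])
    return {
        (top, left): quadrant_means(image, top, left, size)
        for top in range(0, height - size + 1, stride)
        for left in range(0, width - size + 1, stride)
    }


image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [5, 5, 1, 1],
    [5, 5, 1, 1],
]
signatures = sliding_window_signatures(image, size=2, stride=2)
for position, signature in sorted(signatures.items()):
    print(position, signature)
```

Windows with similar signatures would then be clustered into regions in the second step, and the later steps would compare regions and images rather than individual windows.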
CLUSTERING
Clustering [3][10] of data is a method by which large sets of data are grouped into
clusters of smaller sets of similar data.
CLUSTERING ALGORITHMS
A clustering algorithm [3][10] attempts to find natural groups of components (or
data) based on some similarity. The clustering algorithm [3][10] also finds the centroid of
each group of data. To determine cluster membership, most algorithms evaluate the
distance between a point and the cluster centroids. The output from a clustering algorithm
is basically a statistical description of the cluster centroids together with the number of
components in each cluster.
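As a small illustration of this kind of output, the following Python sketch assigns each point to its nearest centroid and reports, per cluster, the centroid, the number of members, and the mean distance of members to the centroid. The centroids are assumed to be given, the mean distance is an extra statistic added for illustration, and the names and sample data are hypothetical.

```python
import math


def nearest_centroid(point, centroids):
    """Index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def cluster_summary(points, centroids):
    """Per-cluster description: centroid, member count, mean member distance."""
    summary = [{"centroid": c, "size": 0, "mean_distance": 0.0} for c in centroids]
    for p in points:
        i = nearest_centroid(p, centroids)
        summary[i]["size"] += 1
        summary[i]["mean_distance"] += math.dist(p, centroids[i])
    for s in summary:
        if s["size"]:  # avoid dividing by zero for an empty cluster
            s["mean_distance"] /= s["size"]
    return summary


points = [(0, 0), (0, 2), (10, 10), (10, 12), (10, 11)]
centroids = [(0, 1), (10, 11)]
summary = cluster_summary(points, centroids)
for s in summary:
    print(s)
```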
CLUSTERING TECHNIQUES
• K-Means Method [3][10]: For Content-Based Image Retrieval as the first phase.
• X-Means: Enhanced version of K-Means Method.
K-MEANS CLUSTERING
The basic procedure of k-means clustering [3][10] is simple. In the beginning we
determine the number of clusters K and assume the centroids, or centers, of these clusters. We
can take any random objects as the initial centroids, or the first K objects in sequence can
also serve as the initial centroids.
The K-means algorithm [3][10] then repeats the three steps below until convergence,
that is, until the grouping is stable (no object moves to another group):
• Determine the centroid coordinates.
• Determine the distance of each object to the centroids.
• Group the objects based on minimum distance.
[Flow chart: Start; choose the number of clusters K; compute the centroids; compute the
distance of objects to the centroids; group based on minimum distance; repeat from the
centroid step until no object moves group; End.]
Fig: 1.1 K-Means Algorithm flow chart
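The loop described above can be written as a short Python sketch. It assumes points are tuples of numbers, uses the first K objects as the initial centroids (one of the two options mentioned), and keeps the old centroid if a cluster becomes empty. The function name, the iteration cap, and the sample data are illustrative assumptions, not the project's implementation.

```python
import math


def k_means(points, k, max_iter=100):
    """Cluster points into k groups; returns (centroids, labels)."""
    centroids = [tuple(p) for p in points[:k]]  # first K objects as initial centroids
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Distance of each object to the centroids, then group by minimum distance.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # stable: no object moved to another group
        labels = new_labels
        # Recompute each centroid as the mean of its current members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels


points = [(1, 1), (2, 1), (4, 3), (5, 4)]
centroids, labels = k_means(points, k=2)
print(centroids, labels)
```

With this sample data the grouping stabilizes after three passes, with the first two objects in one cluster and the last two in the other.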
SPSS (Statistical Package for the Social Sciences)
SPSS (originally, Statistical Package for the Social Sciences)[11] was released in
its first version in 1968, and is among the most widely used programs for statistical
analysis in social science. It is used by market researchers, health researchers, survey
companies, government, education researchers, and others. In addition to statistical
analysis, data management and data documentation are features of the base software.
Statistics included in the base software:
• Descriptive statistics: Cross tabulation, Frequencies, Descriptives, Explore,
Descriptive Ratio Statistics.
• Bivariate statistics: Means, t-test, ANOVA, Correlation (bivariate, partial,
distances), Nonparametric tests.
• Prediction for numerical outcomes: Linear regression.
• Prediction for identifying groups: Factor analysis, cluster analysis (two-step,
K-means, hierarchical), Discriminant.
DICOM (Digital Imaging and Communications in Medicine)
Digital Imaging and Communications in Medicine (DICOM) [12] is a standard for
handling, storing, printing, and transmitting information in medical imaging. It was developed
by the National Electrical Manufacturers Association (NEMA) in conjunction with the
American College of Radiology (ACR). It includes a file format definition and a network
communications protocol. The communication protocol is an application protocol that
uses TCP/IP to communicate between systems. DICOM files can be exchanged between
two entities that are capable of receiving image and patient data in DICOM format.
DICOM enables the integration of scanners, servers, workstations, printers, and network
hardware from multiple manufacturers into a picture archiving and communication
system. DICOM has been widely adopted by hospitals and is making inroads in smaller
applications like dentists' and doctors' offices.
A single DICOM file [12] contains both a header (which stores information such as
the patient's name, the type of scan, and the image dimensions) and all of the image
data (which can contain information in three dimensions).
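As a minimal illustration of the file format, a DICOM file begins with a 128-byte preamble followed by the four characters "DICM". The Python sketch below checks only for that prefix. The function names are hypothetical, and reading actual header fields such as the patient's name or the image dimensions would need a full parser, for example a library such as pydicom.

```python
def has_dicom_prefix(data: bytes) -> bool:
    """True if data starts with a 128-byte preamble followed by b'DICM'."""
    return len(data) >= 132 and data[128:132] == b"DICM"


def is_dicom_file(path: str) -> bool:
    """Check the first 132 bytes of a file for the DICOM prefix."""
    with open(path, "rb") as f:
        return has_dicom_prefix(f.read(132))


# A synthetic header: empty preamble plus the prefix (no real patient data).
fake_header = bytes(128) + b"DICM"
print(has_dicom_prefix(fake_header), has_dicom_prefix(b"not a dicom file"))
```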
SYSTEM MODEL
In this proposal, the main goal is Content-Based Image Retrieval (CBIR). Here the
performance enhancement is achieved through X-Means. In the first phase, a very large
collection of medical images is gathered from databases (DB) on the World Wide Web.
Retrieval directly from this collection is not fast because of the size of the images, so at
the next level the images are indexed and the result is stored as Knowledge Summarization (KS).
Content-Based Retrieval can then be performed on the KS and delivered through the output unit.
CBIRC ARCHITECTURE
DB1
DB level
DB2 DB level clustering
Knowledge Process
Summarization
DB3
Content
DBN Based
Retrieval
Input Distributed DB Output
Figure: 1.2 CBIRC High level Architecture.
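One way to picture retrieval from summarized knowledge is the Python sketch below. It is a hypothetical illustration, not the CBIRC design itself. It assumes each image is already described by a small feature vector and a cluster label, stores one centroid and one member list per cluster as the summary, and answers a query by searching only the cluster whose centroid is nearest. The image identifiers and feature values are invented for the example.

```python
import math


def build_summary(features, labels):
    """Knowledge summarization: per cluster, its centroid and member image ids."""
    clusters = {}
    for image_id, label in labels.items():
        clusters.setdefault(label, []).append(image_id)
    summary = {}
    for label, members in clusters.items():
        vectors = [features[i] for i in members]
        centroid = tuple(sum(axis) / len(vectors) for axis in zip(*vectors))
        summary[label] = {"centroid": centroid, "members": members}
    return summary


def retrieve(query, features, summary, top_n=2):
    """Rank only the members of the cluster whose centroid is nearest to the query."""
    best = min(summary.values(), key=lambda s: math.dist(query, s["centroid"]))
    ranked = sorted(best["members"], key=lambda i: math.dist(query, features[i]))
    return ranked[:top_n]


# Hypothetical two-dimensional features for four images in two clusters.
features = {
    "img1": (0.9, 0.1),
    "img2": (0.8, 0.2),
    "img3": (0.1, 0.9),
    "img4": (0.2, 0.7),
}
labels = {"img1": 0, "img2": 0, "img3": 1, "img4": 1}
summary = build_summary(features, labels)
result = retrieve((0.1, 0.85), features, summary)
print(result)
```

The query is compared with one centroid per cluster rather than with every image, which is where the speed-up would come from. The trade-off is that a close match lying in a neighbouring cluster can be missed.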
IMPLEMENTATION
Functions performed:
Private Sub Form_Load ()
Private Sub cmdReset_Click () ..... Reset data.
Private Sub txtNumCluster_Change () ..... Change the number of clusters and reset data.
Private Sub Picture1_MouseDown () ..... Collect data and show the result.
Private Sub Picture1_MouseMove
Sub kMeanCluster () ..... Main function to cluster data into k clusters.
Function dist ..... Calculate Euclidean distance.
Private Function min2 (num1, num2) ..... Return the minimum of two numbers.
RESULTS
When the user clicks the picture box to input a new data point (X, Y), the program
groups the data into clusters by minimizing the sum of squares of the Euclidean distances between
the data and the corresponding cluster centroids. Each dot represents an object, and its
coordinates (X, Y) represent two attributes of the object. The color of the dot and the label
number indicate the cluster.
Figure: 1.3 Sample input and output for K-Means Clustering algorithm.
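The quantity being minimized, the sum of squared Euclidean distances between each object and its cluster centroid, can be computed with a few lines of Python. The function name and the sample points are illustrative assumptions and are not taken from the program described above.

```python
def sum_of_squared_distances(points, centroids, labels):
    """Sum over all objects of the squared Euclidean distance to their centroid."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(p, centroids[lab]))
        for p, lab in zip(points, labels)
    )


points = [(1, 1), (2, 1), (4, 3), (5, 4)]

# A balanced grouping: two objects per cluster.
good = sum_of_squared_distances(points, [(1.5, 1.0), (4.5, 3.5)], [0, 0, 1, 1])

# An unbalanced grouping: one object alone, three together.
poor = sum_of_squared_distances(points, [(1.0, 1.0), (11 / 3, 8 / 3)], [0, 1, 1, 1])

print(good, poor)
```

A lower value means the objects lie closer to their centroids, so the balanced grouping is the better clustering of these four points.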
CONCLUSION
Very large collections of images are growing ever more common. With the
proliferation of image data, the need to search and retrieve images efficiently and
accurately from a large image database [8][9] or a collection of image databases [8][9] has
increased drastically. The Shape [5][6][7] and Color [6] of an object play an important role
in image retrieval. To address this demand, Content-Based Image Retrieval [8]
through Clustering (CBIRC) is proposed. In the first phase, Knowledge
Summarization is carried out through Clustering Techniques [3][10], and the result is stored in an
interfacing unit that acts as a Buffer/Database [8][9]. Content-Based
Retrieval [8] is then performed on this store so that medical researchers can carry out
predictive measures on Medical Image Data.
With the advances in image processing, information retrieval, and database
management, there have been extensive studies on content-based image retrieval (CBIR)
[8][9] for large image databases. CBIR systems retrieve images based on their visual
contents. Earlier efforts in CBIR research focused on effective feature
representations for images. Visual features such as color [6], texture, and
shape [5][6][7] have been extensively explored to represent and index image
contents, resulting in a collection of research prototypes and commercial systems.
The proposed CBIRC method provides database clustering [3][10] and improves query
processing by analyzing the summarized knowledge.
REFERENCES
1. Jiawei Han, Micheline Kamber, “Data Mining: Concepts and Techniques”,
Morgan Kaufmann Publishers, 2001.
2. Arun K. Pujari, “Data Mining”, Universities Press (India) Ltd., 2001.
3. Margaret H. Dunham, “Data Mining: Introductory and Advanced Topics”, Pearson
Education, 2003.
4. Apostol Natsev, Rajeev Rastogi, and Kyuseok Shim, “WALRUS: A Similarity
Retrieval Algorithm for Image Databases”, IEEE Transactions on Knowledge and
Data Engineering, Vol. 16, No. 3, March 2004.
5. Safar, M., Shahabi, C., and Sun, X., “Image Retrieval by Shape: A Comparative
Study”, In Proceedings of IEEE International Conference on Multimedia and
Expo (ICME’00), 2000, 141-144.
6. Stehling, R. O., Nascimento, M. A., and Falcao, A. X., “On Shapes of Colors for
Content-based Image Retrieval”, In ACM International Workshop on Multimedia
Information Retrieval (ACM MIR’00), 2000, 171-174.
7. Zhang, D. S. and Lu, G., “Generic Fourier Descriptors for Shape-based Image
Retrieval”, In Proceedings of IEEE International Conference on Multimedia and
Expo (ICME’02), 1 (2002), 425-428.
8. Shyu, M.-L., Chen, S.-C., Chen, M., and Zhang, C., “Affinity Relation Discovery
in Image Database Clustering and Content-based Retrieval”, Accepted for
publication (short paper), ACM International Conference on Multimedia, October
10-16, 2004.
9. Chengcui Zhang, Shu-Ching Chen, and Mei-Ling Shyu, “Multiple Object
Retrieval for Image Databases Using Multiple Instance Learning and Relevance
Feedback”, IEEE International Conference on Multimedia and Expo
(ICME), 2004.
10. www.Kardi Teknomo page.com
11. www.SPSS.com
12. www.DICOM.nema.org
