
ISSN : 2278 - 0181

Online Print Version

International Journal of

Engineering Research & Technology

IJERT

Volume - 1, Issue - 2, April 2012 Edition


Website : www.ijert.org, E-mail : info@ijert.org

International Journal of Engineering Research & Technology (IJERT)


Published by ESRSA Publications
ESRSA (Engineering and Science Research Support Academy) Publications publishes a monthly journal under ISSN 2278-0181.
Online Version (e-copy)
http://www.ijert.org
Print Version
http://www.ijert.org/for-authors/journal-print-version-download
The respective authors are the sole owners of, and are responsible for, the published research; research papers are published with the full consent of the respective author or co-author(s). For any discussion of the research subject or content, readers should contact the authors directly.
COPYRIGHT
Copyright © 2013, IJERT.ORG
All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval system, or
transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, scanning or
otherwise, except as described below, without the permission in writing of the Publisher.
Copying of articles is not permitted except for personal and internal use, to the extent permitted by
national copyright law, or under the terms of a license issued by the national Reproduction Rights
Organization.
All the published research can be referenced by students/readers/scholars/researchers in their further
research with proper citation given to original authors.
DISCLAIMER
Statements and opinions expressed in the published papers are those of the individual contributors, not those of IJERT. IJERT assumes no responsibility or liability for any damage or injury to persons or property arising out of the use of any materials, instructions, methods or ideas contained herein. We expressly disclaim any implied warranties of merchantability or fitness for a particular purpose. If expert assistance is required, the services of a competent professional should be sought.
Contact Information:
Email: editor@ijert.org
Website: http://www.ijert.org

Index

Sr. No. | Paper Title | Paper ID | Page No.
1 | Optimization Of Heat Transfer Rate In Wax Tank For Wax Injection Molding Machine | IJERTV1IS1003 | 1-6
2 | A Survey on Maintaining Privacy in Data Mining | IJERTV1IS1004 | 7-10
3 | A Review On Web Mining | IJERTV1IS1005 | 11-15
4 | A Survey Paper on Hyperlink-Induced Topic Search (HITS) Algorithms for Web Mining | IJERTV1IS1006 | 16-23
5 | An Efficient CT Image Reconstruction with Parallel Modeling for Superior Quantitative Measures | IJERTV1IS1009 | 24-29
6 | Microcontroller Based Lift System | IJERTV1IS1010 | 30-32
7 | Persian Signature Verification using Convolutional Neural Networks | IJERTV1IS2001 | 33-38
8 | A Comparative Model For Image Processing & Texture Classification Using Cross-diagonal Texture Matrix (CDTM) & Grey-level Co-occurrence Matrix (GLCM) | IJERTV1IS2002 | 39-49


OPTIMIZATION OF HEAT TRANSFER RATE IN WAX TANK FOR WAX INJECTION MOLDING MACHINE

A. H. Makwana(1), Hitesh K. Patel(2), and J. S. Patel(3)
(1) Professor, Government Engineering College, Dahod, Gujarat, India. patelpatelhitesh88@gmail.com
(2) Student, Government Engineering College, Dahod, Gujarat, India. patelhitesh88@ymail.com
(3) Assistant Professor, H.G. College of Engineering, Vahelal, Ahmedabad, India. livejigar@gmail.com

Abstract - Investment casting is basically a metal shaping technique. It is a foundry practice by which high-precision castings are manufactured; it is a specialized foundry technology and is considered a high-tech area. The process has gained popularity on the basis of the superior quality of the castings produced. Making the wax pattern is an important step in investment casting. The wax pattern is made on a wax injection molding machine, and one important issue in making it is the temperature of the wax, which is maintained at about 60 °C to obtain the best results. The wax temperature must be kept uniform in the wax tank of the machine: a band heater and a cooling band are placed around the wax tank to maintain a uniform temperature, and a stirrer is provided to circulate the wax in the tank. The location of the heater band, the diameter of the wax tank and the speed of the stirrer are therefore important factors for the heat transfer rate in the wax tank. These parameters are optimized using the Taguchi method. For this purpose, CAD software is used for modeling and analysis, and Minitab software is used for the design of experiments (DOE).
I. INTRODUCTION
Investment casting has gained popularity in high-tech areas on the basis of its superior quality and higher accuracy. An important step in investment casting is making the wax pattern. Wax patterns are made on a wax injection molding machine, and for a good-quality pattern the temperature of the wax is a key issue; for the best quality the wax temperature is maintained at 60 °C. For this purpose a wax tank is used in the wax injection molding machine. In the wax tank, a band heater and a cooling band are used to maintain a uniform temperature, and a stirrer is mounted to circulate the wax. The design and dimensions are taken from MODE TECH MACHINE PVT LTD at Vatva GIDC, Ahmedabad, Gujarat.
The objective of this work is to optimize the heat transfer rate in the wax tank so that the wax temperature remains uniform. Various parameters, such as the heater position, the diameter of the tank and the stirrer speed, are important for this, so the optimization is carried out using the Taguchi method, which reduces the number of experiments required. Three parameters at three levels each are used, and Minitab software is used for the Taguchi design. A model of the wax tank is made in SolidWorks, converted to a STEP file and imported into ANSYS for CFD analysis. This paper presents the CFD analysis of the wax tank and compares the results with practical readings, which are taken using a temperature sensor and a thermocouple.
II. TAGUCHI METHOD
The Taguchi method involves reducing the variation in a process through robust design of experiments. The overall objective of the method is to produce a high-quality product at low cost to the manufacturer. The Taguchi method was developed by Dr. Genichi Taguchi of Japan, who maintained that variation, and therefore poor quality, in a process affects not only the manufacturer but also society. He developed a method for designing experiments to investigate how different parameters affect the mean and variance of a process performance characteristic that defines how well the process is functioning. The experimental design proposed by Taguchi involves using orthogonal arrays to organize the parameters affecting the process and the levels at which they should be varied; it allows the collection of the data necessary to determine which factors most affect product quality with a minimum amount of experimentation, thus saving time and resources. Analysis of variance on the data collected from the Taguchi design of experiments can be used to select new parameter values that optimize the performance characteristic.
In this article, the specific steps involved in applying the Taguchi method are described and examples of using the Taguchi method to design experiments are given. In this project work three parameters at three levels are used, so the L9 orthogonal array shown below is used.
Table 1: Taguchi L9 array

Analysis | Diameter (mm) | Heater position | Speed (rpm)
1 | 380 | A | 15
2 | 380 | B | 20
3 | 380 | C | 25
4 | 415 | A | 20
5 | 415 | B | 25
6 | 415 | C | 15
7 | 350 | A | 25
8 | 350 | B | 15
9 | 350 | C | 20
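For readers who want to check the structure of this design outside Minitab, the following short Python sketch (our own illustration, not part of the original experimental work) encodes the nine runs of Table 1 and verifies the balance and pairwise orthogonality expected of an L9 array:

    from itertools import combinations
    from collections import Counter

    # L9 design as printed in Table 1 (diameter in mm, heater position, stirrer speed in rpm).
    l9_runs = [
        (380, "A", 15), (380, "B", 20), (380, "C", 25),
        (415, "A", 20), (415, "B", 25), (415, "C", 15),
        (350, "A", 25), (350, "B", 15), (350, "C", 20),
    ]
    factors = ["diameter", "heater_position", "speed"]

    # Each level of every factor should appear the same number of times (3 of 9 runs),
    # and every pair of levels from two different factors should appear exactly once.
    for i, name in enumerate(factors):
        print(name, Counter(run[i] for run in l9_runs))
    for i, j in combinations(range(3), 2):
        pair_counts = Counter((run[i], run[j]) for run in l9_runs)
        assert all(c == 1 for c in pair_counts.values()), (factors[i], factors[j])
    print("L9 array is balanced and pairwise orthogonal")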
III. PROCEDURE
For a good-quality wax pattern the temperature of the wax must remain uniform in the wax tank at 60 °C. For taking the practical readings, thermocouple sensors are used, which are placed at the


inner surface of the inner tank. These sensors give the temperature at the outside surface of the tank. For measuring the inner temperature of the wax, another thermocouple sensor is used. From the practical readings it is observed that the temperature is highest at the outside surface and lowest at the middle of the tank. The practical readings are given in the table below.
Table 2: Practical readings of the wax tank
Maximum temperature: 333 K
Minimum temperature: 329 K

ANSYS Workbench is used for the CFD analysis of the wax tank, and the following steps are performed. In step 1, the wax tank model made in SolidWorks is converted into a STEP file and imported into ANSYS. In step 2, the wax tank model is meshed: a CFD mesh type is selected and fine meshing is done using ten-node tetrahedral elements, which give good meshing on curved parts. In step 3, the domains are defined: domain 1 is the stirrer, domain 2 is the wax, domain 3 is the inner tank, domain 4 is the heater and domain 5 is the glass wool. Of these, domain 2 is a fluid while the other domains are solid. The next step is to define the interfaces between the domains; four interfaces are used: the first between domain 1 and domain 2, the second between domain 2 and domain 3, the third between domain 3 and domain 4, and the fourth between domain 4 and domain 5. The boundary conditions for the CFD analysis are then specified: the stirrer speed is set to 20 rpm, all domains are initialized at 25 °C, and the heater input temperature is specified as 65 °C. After applying the boundary conditions the model is solved; 100 iterations are run to obtain an accurate result. After the solution, the post-processor is used to obtain the results, which are given below.
Table 3: ANSYS result
Maximum temperature: 333 K
Minimum temperature: 328 K
Fig 1 shows the wax tank model in SolidWorks. In the figure, red indicates the band heater, which is placed on the inner tank. Glass wool is placed around the heater as an insulator, and finally an outer cover surrounds the glass wool; in the ANSYS analysis the outer cover is neglected. The wax domain is not shown in the figure below, but it is considered in the ANSYS analysis so that the wax properties can be assigned.

Fig 1: Wax Tank drawing in solid works


Fig 2 shows the meshed model of the wax tank. The meshing details of the wax tank are given in the table below.
Table 4: Meshing details of the wax tank

Domain | Nodes | Elements
Domain 1 | 15815 | 69849
Domain 2 | 59140 | 318434
Domain 3 | 8190 | 4092
Domain 4 | 15392 | 7200
Domain 5 | 13351 | 48106
All domains | 111888 | 447681

Fig 3 shows the ANSYS CFX analysis, from which the maximum temperature is 333 K and the minimum temperature is 328 K. Fig 4 shows the velocity streamlines of the wax; the maximum velocity occurs at the stirrer blade, shown by the red portion of the figure.
From the practical readings and the ANSYS results it can be seen that ANSYS gives readings close to the practical ones. ANSYS is therefore used to perform the nine analyses listed in the Taguchi array. From these analyses we can decide which wax tank model is best for maintaining a uniform wax temperature in the tank.


Fig 2: Meshed model of the wax tank

Fig 4: Velocity streamlines of wax in the wax tank

Fig 3: Temperature contour of wax in ANSYS.


Fig 5: Analysis 1


Fig 6: Analysis 2
Fig 8: Analysis 4

Fig 7: Analysis 3

Fig 9: Analysis 5


Fig 12: Analysis 8


Fig 10: Analysis 6

Fig 11: Analysis 7

Fig 13: Analysis 9


The results of the nine analyses are shown in the table below.

Table 5: Analysis results

Analysis | Diameter (mm) | Speed (rpm) | Heater position | Temperature difference (K)
1 | 380 | 15 | A | 6.1
2 | 380 | 20 | B | 4.8
3 | 380 | 25 | C | 5.2
4 | 415 | 15 | B | 7.1
5 | 415 | 20 | C | 5.3
6 | 415 | 25 | A | 2.3
7 | 350 | 15 | C | 6.4
8 | 350 | 20 | A | 4.9
9 | 350 | 25 | B | 4.1
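As a rough illustration of how the Table 5 data can be analyzed outside Minitab, the following Python sketch (our own, using only the values printed in Table 5) computes the mean temperature difference at each factor level, treating the response as smaller-is-better:

    from collections import defaultdict

    # Each row, copied from Table 5: (diameter mm, speed rpm, heater position, delta_T K).
    results = [
        (380, 15, "A", 6.1), (380, 20, "B", 4.8), (380, 25, "C", 5.2),
        (415, 15, "B", 7.1), (415, 20, "C", 5.3), (415, 25, "A", 2.3),
        (350, 15, "C", 6.4), (350, 20, "A", 4.9), (350, 25, "B", 4.1),
    ]
    factor_names = ["diameter (mm)", "speed (rpm)", "heater position"]

    # Smaller-is-better response: average the temperature difference at each factor level
    # and pick the level with the lowest mean, as a Taguchi main-effects analysis would.
    for idx, name in enumerate(factor_names):
        level_values = defaultdict(list)
        for row in results:
            level_values[row[idx]].append(row[3])
        means = {lvl: sum(v) / len(v) for lvl, v in level_values.items()}
        best = min(means, key=means.get)
        print(name, {k: round(v, 2) for k, v in means.items()}, "-> best level:", best)

The level means point to 415 mm, 25 rpm and heater position A, which is consistent with the conclusion drawn from Table 5.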

ABOUT THE AUTHORS
Alpesh Patel (M.Tech), Professor, Government Engineering College, Dahod.
Hitesh Patel, student of Master of Engineering (CAD/CAM), Government Engineering College, Dahod, Gujarat, India.
Jigar Patel (M.Tech), Assistant Professor, Hasmukh Goswami College of Engineering, Vahelal, Ahmedabad.

IV. CONCLUSION
The temperature of the wax is an important factor for the quality of the wax pattern; for the best quality the temperature should be uniform at 60 °C throughout the wax tank. From Table 5 it can be seen that analysis 6 gives the best result for maintaining a uniform temperature, since its temperature difference is only 2.3 K. The best model therefore has a 415 mm diameter, a 25 rpm stirrer speed and heater position A.
REFERENCES
[1] D. E. Dimla, M. Camilotto, F. Miani, "Design and optimisation of conformal cooling channels in injection moulding tools," School of Design, Engineering and Computing, Bournemouth University, Bournemouth, UK; DIEGM, Università degli Studi di Udine, Udine, Italy.
[2] A. Bendada, F. Erchiqui, A. Kipping, "Understanding heat transfer mechanisms during the cooling phase of blow molding using infrared thermography," National Research Council of Canada, Industrial Materials Institute, Boucherville; University of Quebec in Abitibi-Temiscamingue, Rouyn-Noranda; University of Siegen, Germany, 2004.
[3] B. Ozcelik, A. Ozbay, E. Demirbas, "Influence of injection parameters and mold materials on mechanical properties of ABS in plastic injection molding," Gebze Institute of Technology, Gebze-Kocaeli, Turkey.
[4] W.-C. Chen, G.-L. Fu, P.-H. Tai, W.-J. Deng, "Process parameter optimization for MIMO plastic injection molding."
[5] A. Sholapurwalla, S. Scott, "Effects of radiation heat transfer on part quality prediction," ESI Group, Bloomfield Hills, Michigan.
S. A. Khot, N. K. Sane, B. S. Gawali, "Experimental investigation of phase change phenomena of paraffin wax inside a capsule," Latthe Polytechnic, Sangli; Walchand College of Engineering, Sangli, Maharashtra, India.
[6] R. Bhaskaran, L. Collins, "Introduction to CFD Basics."
[7] R. K. Roy, "Design of Experiments Using the Taguchi Approach."


A Survey on Maintaining Privacy in Data Mining
Divya Sharma
Lecturer, Information Technology, Gandhinagar Institute of Technology, divya.sharma@git.org.in

Abstract - Data mining is the process of discovering new patterns from large datasets; the goal is to extract knowledge from a dataset in a human-understandable structure. Nowadays, with the widespread use of the internet and data processing technologies, privacy of data has become a major issue in data mining, so privacy preserving data mining has become very popular and in high demand. A number of methods and techniques have been developed for privacy preserving data mining. This paper provides a wide survey of different privacy preserving data mining algorithms. I discuss one of these algorithms, randomization, in more detail and also discuss its merits and demerits.
Index Terms - Data mining, Privacy, Privacy-preserving data mining, Randomization, Data swapping randomization.


I. INTRODUCTION
The main goal of data mining is to extract knowledge and new patterns from large datasets in a human-understandable structure. For data mining computations we first have to collect data, often without much concern about the privacy of that data. Because of privacy concerns, some people do not give correct information, and privacy preserving data mining has therefore become an important field of research. In order to make a public system secure, we must ensure not only that private sensitive data have been trimmed out, but also that certain inference channels are blocked with respect to privacy. A number of effective methods for privacy preserving data mining have been proposed [1]. This paper provides a wide survey of different privacy preserving data mining techniques and points out their merits and demerits.
This paper is organized as follows. Section II introduces the classification of privacy preserving methods. Section III analyzes the randomization method for privacy preservation of the original data. Section IV discusses the data swapping randomization method. Randomization to protect privacy is discussed in Section V. Section VI discusses applications, and Section VII gives the conclusion and future work.


II. CLASSIFICATION OF PRIVACY PRESERVING METHODS AND TECHNIQUES
A number of effective methods for privacy preserving data mining have been proposed [2]. Most methods use some form of transformation for privacy preservation: the transformed dataset is made available for mining and must meet the privacy requirements without losing the benefit of mining. We classify these methods into the following three categories.
A. The randomization method
The randomization method is a popular method in current privacy preserving data mining, in which noise is added to the data in order to mask the attribute values of records [3]. The noise added is sufficiently large that the individual values of the records can no longer be recovered. In general, the randomization method aims at finding an appropriate balance between privacy preservation and knowledge discovery.
In the randomization method, data collection is done in two steps. In the first step, data providers randomize their data and transmit the randomized data to the data receiver. In the second step, the data receiver estimates the original distribution of the data using a distribution reconstruction algorithm.
B. The anonymization method
The anonymization method aims at making an individual record indistinguishable within a group of records by using techniques of generalization and suppression. The representative anonymization method is k-anonymity. The motivating factor behind the k-anonymity approach is that many attributes in the data can often be considered quasi-identifiers which can be used in conjunction with public records in order to uniquely identify the records. Many advanced methods have been proposed, such as p-sensitive k-anonymity, (a, k)-anonymity [4], l-diversity, t-closeness, m-invariance, personalized anonymity, and so on. The anonymization method can ensure that the transformed data are true, but it also results in information loss to some extent.

C. The encryption method
The encryption method mainly addresses problems in which several parties jointly conduct mining tasks based on the private inputs they provide. These mining tasks can occur between mutually untrusted parties, or even between competitors; therefore,


protecting privacy becomes a primary concern in the distributed data mining setting. There are two different distributed privacy preserving data mining approaches: methods for horizontally partitioned data and methods for vertically partitioned data. The encryption method can ensure that the transformed data are exact and secure, but it is much less efficient.
III. THE RANDOMIZATION METHOD
In this section, I discuss the randomization method for data privacy. The method of randomization can be described as follows.
Consider a set of data records denoted by X = {x_1, ..., x_N}. For each record x_i in X, we add a noise component drawn from the probability distribution F_R(r). These noise components are drawn independently and are denoted r_1, ..., r_N. Thus, the new set of distorted records is x_1 + r_1, ..., x_N + r_N, denoted z_1, ..., z_N.
In general, it is assumed that the variance of the added noise is large enough that the original record values cannot be easily guessed from the distorted data. Thus, the original records cannot be recovered, but the distribution of the original records can be recovered [5].
Thus, if X is the random variable denoting the data distribution of the original records, R is the random variable describing the noise distribution, and Z is the random variable denoting the final records, we have:
Z = X + R
X = Z - R
By subtracting the known distribution of R from the approximated distribution of Z, it is possible to approximate the original probability distribution of X.

A. Advantages
One key advantage of the randomization method is that it is relatively simple and does not require knowledge of the distribution of the other records in the data. The noise is independent of the data, so the entire dataset is not needed for perturbation. The randomization method can be applied at data collection time and does not require a trusted server containing all the original records in order to perform the anonymization process. The randomization approach has also been extended to other applications such as OLAP [6], and it is much faster than SMC (secure multiparty computation).

B. Disadvantages
The method treats all records equally, irrespective of their local density. Therefore, outlier records are more susceptible to adversarial attacks than records in denser regions of the data.
C. Multiplicative Randomization
In this type of randomization, records are multiplied by random vectors, and the data are then transformed so that inter-record distances are approximately preserved. This type of randomization is applicable in privacy-preserving clustering and classification. Possible attacks are the known input-output attack and the known sample attack. In a known input-output attack, the attacker knows some linearly independent collection of records and their perturbed versions; in a known sample attack, the attacker has some independent samples from the original distribution.
D. Randomization for Association Rule Mining
This type of randomization is done through deletion and addition of items in transactions, using a select-a-size operator. Assume a transaction size m and a probability distribution p[0], p[1], ..., p[m] over {0, 1, ..., m}. Given a transaction t of size m, a randomized transaction t' is generated as follows: select j at random from {0, ..., m} using the above distribution; select j items from t uniformly without replacement and place them in t'; and for each item a not in t, place a in t' with probability rho, where rho is the randomization level [7].
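A rough Python sketch of such a select-a-size style operator is given below (our own illustration; the distribution p and the level rho are placeholder values, not parameters from [7]):

    import random

    def randomize_transaction(t, item_universe, p, rho, rng=random):
        """Randomize one transaction t (a set of items) with a select-a-size operator.

        p   : probabilities p[0..m] over how many true items to keep (m = len(t))
        rho : probability of inserting each item that is not in t
        """
        m = len(t)
        items = list(t)
        # Choose how many of the original items to keep, according to p[0..m].
        j = rng.choices(range(m + 1), weights=p[: m + 1], k=1)[0]
        kept = set(rng.sample(items, j))
        # Every item outside t enters the randomized transaction with probability rho.
        inserted = {a for a in item_universe if a not in t and rng.random() < rho}
        return kept | inserted

    # Example: a toy transaction over a universe of six items.
    universe = {"bread", "milk", "eggs", "beer", "chips", "soap"}
    t = {"bread", "milk", "eggs"}
    p = [0.1, 0.2, 0.3, 0.4]   # hypothetical distribution over keeping 0..3 items
    print(randomize_transaction(t, universe, p, rho=0.1))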


IV DATA SWAPPING RANDOMIZATION


Noise addition or multiplication is not the only technique
which can be used to perturb the data. A related method is that
of data swapping, in which the values across different records
are swapped in order to perform the privacy-preservation [8].
One advantage of this technique is that the lower order
marginal totals of the data are completely preserved and are
not perturbed at all. Therefore certain kinds of aggregate
computations can be exactly performed without violating the
privacy of the data. We note that this technique does not
follow the general principle in randomization which allows the
value of a record to be perturbed independently of the other
records. Therefore, this technique can be used in combination
with other frameworks such as k-anonymity, as long as the
swapping process is designed to preserve the definitions of
privacy for that model.
Swap randomization falls within the broad family of
randomization testing methods. Given a metric of interest
(e.g., the number of frequent item sets in the data),
randomization testing techniques produce multiple random
datasets and test the null hypothesis that the observed metric is
likely to occur in the random data. If the metric of interest in
the original data deviates significantly from the measurements
on the random datasets, then we can reject the null hypothesis
and assess the result as significant. The key characteristic of
the randomization techniques is in the way that the random
datasets are generated. Rather than assuming that the
underlying data follows a given distribution and sampling
from this distribution, randomization techniques randomly
shuffle the given data to produce a random dataset. Shuffling
is meant to preserve some of the structural properties of the
dataset; for example, in a 0-1 matrix we may want to preserve the total number of 1s in the dataset, or the number of 1s in each column. In the case of swap randomization, the generated samples preserve both the column and row margins.
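As an illustrative sketch (not code from the cited papers), the following Python fragment produces margin-preserving random versions of a 0-1 matrix by repeated 2x2 swaps, which is the basic building block of swap randomization:

    import random

    def swap_randomize(matrix, n_swaps, rng=random):
        """Return a copy of a 0-1 matrix with the same row and column sums,
        obtained by repeatedly swapping 'checkerboard' 2x2 submatrices."""
        m = [row[:] for row in matrix]
        rows, cols = len(m), len(m[0])
        for _ in range(n_swaps):
            i, k = rng.randrange(rows), rng.randrange(rows)
            j, l = rng.randrange(cols), rng.randrange(cols)
            # A swap is allowed only when the 2x2 submatrix is [[1,0],[0,1]] or its mirror;
            # exchanging the two patterns leaves every row and column sum unchanged.
            if m[i][j] == m[k][l] == 1 and m[i][l] == m[k][j] == 0:
                m[i][j] = m[k][l] = 0
                m[i][l] = m[k][j] = 1
        return m

    data = [[1, 1, 0, 0],
            [1, 0, 1, 0],
            [0, 1, 0, 1]]
    sample = swap_randomize(data, n_swaps=1000)
    assert [sum(r) for r in sample] == [sum(r) for r in data]              # row margins preserved
    assert [sum(c) for c in zip(*sample)] == [sum(c) for c in zip(*data)]  # column margins preserved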
Assessing Data Mining Results via Swap Randomization
Swap randomization is an extension of traditional randomization methods. For instance, a chi-square test for assessing the significance of frequent item sets is a method based on studying the distribution of datasets where the column margins are fixed but the row margins are allowed to vary. Similarly, methods that randomize the target value in prediction tasks keep the column margins fixed (e.g., Megiddo and Srikant [1998]) but impose no constraint on the row margins. These techniques are designed for assessing the significance of individual patterns or models, and are not appropriate for assessing complex results of data mining such as clustering or pattern sets. Swap randomization preserves both row and column margins, and takes into account the global structure of the dataset. A motivating example of why it is important to maintain both column and row margins is given in the next section.
A. Applications
Swap randomization has been considered in various applications. An overview is presented in a survey paper by Cobb and Chen [2003]. A very useful discussion of using Markov chain models in statistical inference is Besag [2004], where the case of 0-1 data is used as an example. The problem of creating 0-1 datasets with given row and column margins is of theoretical interest in itself; see, among others, Bezakova et al. [2006] and Dyer [2003]. Closely related is the problem of generating contingency tables with fixed margins, which has been studied in statistics (such as Chen et al. [2005]). In general, a large body of research is devoted to randomization methods [Good 2000].
V. RANDOMIZATION TO PROTECT PRIVACY
Return x + r instead of x, where r is a random value drawn from a distribution such as the uniform or Gaussian distribution. The reconstruction algorithm knows the parameters of r's distribution.
B. Classification Example
Decision-Tree Classification:

Partition(Data S)
begin
   if (most points in S belong to the same class)
      return;
   for each attribute A
      evaluate splits on attribute A;
   use the best split to partition S into S1 and S2;
   Partition(S1);
   Partition(S2);
end
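A minimal runnable Python rendering of this recursive partitioning is sketched below (our own; the split criterion shown is a simple Gini-style impurity reduction, which the pseudocode above leaves unspecified):

    from collections import Counter

    def gini(labels):
        n = len(labels)
        return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

    def partition(points, labels, min_purity=0.95):
        """points: list of feature tuples; labels: parallel list of class labels."""
        # Stop when most points in S belong to the same class.
        if not labels or Counter(labels).most_common(1)[0][1] / len(labels) >= min_purity:
            return
        best = None
        # For each attribute A, evaluate candidate split points.
        for a in range(len(points[0])):
            for threshold in sorted({p[a] for p in points}):
                left = [i for i, p in enumerate(points) if p[a] <= threshold]
                right = [i for i, p in enumerate(points) if p[a] > threshold]
                if not left or not right:
                    continue
                score = (len(left) * gini([labels[i] for i in left])
                         + len(right) * gini([labels[i] for i in right])) / len(labels)
                if best is None or score < best[0]:
                    best = (score, a, threshold, left, right)
        if best is None:
            return
        _, a, threshold, left, right = best
        print("split on attribute", a, "at", threshold)
        # Use the best split to partition S into S1 and S2, then recurse on each part.
        partition([points[i] for i in left], [labels[i] for i in left], min_purity)
        partition([points[i] for i in right], [labels[i] for i in right], min_purity)

    partition([(1.0,), (2.0,), (8.0,), (9.0,)], ["a", "a", "b", "b"])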
C. Training Using Randomized Data
Here we need to modify two key operations: determining the split point and partitioning the data. The primary question is when and how the distribution should be reconstructed. One choice is whether to reconstruct using the whole data (globally) or to reconstruct separately for each class; another is whether to reconstruct once at the root node or at every node.


VI. APPLICATIONS OF PRIVACY-PRESERVING DATA MINING
The problem of privacy-preserving data mining has numerous applications in homeland security, medical database mining, and customer transaction analysis. Some of these applications, such as those involving bio-terrorism and medical database mining, may intersect in scope. In this section, we discuss a number of different applications of privacy-preserving data mining methods.
A. Medical Databases
The scrub system [9] was designed for de-identification of
clinical notes and letters which typically occurs in the form of
textual data. Clinical notes and letters are typically in the form
of text which contains references to patients, family members,
addresses, phone numbers or providers. Traditional techniques
simply use a global search and replace procedure in order to
provide privacy. However clinical notes often contain cryptic
references in the form of abbreviations which may only be
understood either by other providers or members of the same
institution. Therefore traditional methods can identify no more
than 30-60% of the identifying information in the data. The
Scrub system uses numerous detection algorithms which
compete in parallel to determine when a block of text
corresponds to a name, address or a phone number. The Scrub
System uses local knowledge sources which compete with one
another based on the certainty of their findings. It has been
shown in [9] that such a system is able to remove more than
99% of the identifying information from the data.
B. Bioterrorism Applications
In typical bioterrorism applications, we would like to analyze medical data for privacy-preserving data mining purposes. Often a biological agent such as anthrax produces symptoms which are similar to other common respiratory diseases such as the cough, the cold and the flu. In the absence of prior knowledge of such an attack, health care providers may diagnose a patient affected by an anthrax attack as having symptoms of one of the more common respiratory diseases. The key is to quickly distinguish a true anthrax attack from a normal outbreak of a common respiratory disease; in many cases, an unusual number of such cases in a given locality may indicate a bio-terrorism attack. Therefore, in order to identify such attacks it is necessary to track incidences of these common diseases as well, and the corresponding data would need to be reported to public health agencies. However, the common respiratory diseases are not reportable diseases by law. The solution proposed in [10] is that of selective revelation, which initially allows only limited access to the data. However, in the event of suspicious activity, it allows a drill-down into the underlying data. This provides more identifiable information in accordance with public health law.

C. Homeland Security Applications
A number of applications for homeland security are inherently intrusive because of the very nature of surveillance. In [11], a broad overview is provided of how privacy-preserving techniques may be used in order to deploy these applications effectively without violating user privacy. Some examples of such applications are the credential validation problem, identity theft, web camera surveillance, video surveillance, and the watch list problem.
D. Genomic Privacy
Recent years have seen tremendous advances in the science of DNA sequencing and forensic analysis with the use of DNA. As a result, databases of collected DNA are growing very fast in both the medical and law enforcement communities. DNA data is considered extremely sensitive, since it contains almost uniquely identifying information about an individual [12].
VII. CONCLUSION AND FUTURE WORK
In this paper, I have carried out a wide survey of the different approaches for privacy preserving data mining, analyzed the major algorithms available for the randomization method, and pointed out their existing drawbacks. Since all the proposed methods only approximate our goal of privacy preservation, we need to further refine these approaches or develop more efficient methods.
REFERENCES
[1] J. Han, M. Kamber, Data Mining: Concepts and Techniques, Beijing: China Machine Press, pp. 1-40, 2006.
[2] D. Agrawal and C. Aggarwal, "On the design and quantification of privacy preserving data mining algorithms," in Proceedings of the 20th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, Santa Barbara, California, USA, May 21-23, 2001.
[3] R. Agrawal, R. Bayardo, C. Faloutsos, J. Kiernan, R. Rantzau, R. Srikant, "Auditing compliance via a hippocratic database," VLDB Conference, 2004.
[4] G. Loukides, J. H. Shao, "An Efficient Clustering Algorithm for k-Anonymisation," International Journal of Computer Science and Technology, vol. 23, no. 2, pp. 188-202, 2008.
[5] R. Agrawal, R. Srikant, "Privacy-Preserving Data Mining," ACM SIGMOD Record, New York, vol. 29, no. 2, pp. 439-450, 2000.
[6] R. Agrawal, R. Srikant, D. Thomas, "Privacy-Preserving OLAP," Proceedings of the ACM SIGMOD Conference, 2005.
[7] A. Evfimievski, R. Srikant, R. Agrawal, J. Gehrke, "Privacy Preserving Mining of Association Rules," Information Systems, vol. 29, no. 4, pp. 343-364, 2004.
[8] S. Fienberg, J. McIntyre, "Data Swapping: Variations on a Theme by Dalenius and Reiss," Technical Report, National Institute of Statistical Sciences, 2003.
[9] L. Sweeney, "Replacing Personally Identifiable Information in Medical Records, the Scrub System," Journal of the American Medical Informatics Association, 1996.
[10] L. Sweeney, "Privacy-Preserving Bio-terrorism Surveillance," AAAI Spring Symposium, AI Technologies for Homeland Security, 2005.
[11] L. Sweeney, "Privacy Technologies for Homeland Security," Testimony before the Privacy and Integrity Advisory Committee of the Department of Homeland Security, Boston, MA, June 15, 2005.
[12] B. Malin, "Why methods for genomic data privacy fail and what we can do to fix it," AAAS Annual Meeting, Seattle, WA, 2004.



A Review On Web Mining


Mr. Dushyant Rathod
Lecturer, Information Technology, Gandhinagar Institute of Technology,Gandhinagar,
dushyant.rathod@git.org.in

Abstract - Data mining is one of the most widely applicable areas of research in computer applications; among the various types of data mining, this paper focuses on web mining. This is a review paper which presents a deep and intense study of the various techniques available for web mining. Web mining, i.e. the application of data mining techniques to extract knowledge from Web content, structure, and usage, is the collection of technologies to fulfill this potential. This definition of web mining is explored in this paper.
Index Terms - Web Mining, Web Structure Mining, Web Content Mining, Web Usage Mining.

I. INTRODUCTION
Web mining is the application of data mining techniques to extract knowledge from Web data, including Web documents, hyperlinks between documents, usage logs of web sites, etc. Two different approaches were taken in initially defining Web mining. The first was a process-centric view, which defined Web mining as a sequence of tasks. The second was a data-centric view, which defined Web mining in terms of the types of Web data being used in the mining process. The second definition has become more accepted, as is evident from the approach adopted in most recent papers that have addressed the issue. In this paper we follow the data-centric view and refine the definition of Web mining as: Web mining is the application of data mining techniques to extract knowledge from Web data, where at least one of structure (hyperlink) or usage (Web log) data is used in the mining process (with or without other types of Web data) [1].
The attention paid to Web mining in research, the software industry, and Web-based organizations has led to the accumulation of a lot of experience. It is our attempt in this paper to capture it in a systematic manner and identify directions for future research [2].

II. WEB MINING


Web mining is the data mining technique that automatically discovers or extracts information from web documents. It consists of the following tasks [4]:
1. Resource finding: the task of retrieving the intended web documents. It is the process by which we extract data from online or offline text resources available on the web.
2. Information selection and pre-processing: the automatic selection and pre-processing of specific information from the retrieved web resources. This process transforms the original retrieved data into information. The transformation could be removal of stop words and stemming, or it may be aimed at obtaining a desired representation, such as finding phrases in the training corpus.
3. Generalization: automatically discovers general patterns at individual web sites as well as across multiple sites. Data mining techniques and machine learning are used in generalization.
4. Analysis: the validation and interpretation of the mined patterns. It plays an important role in pattern mining. Humans play an important role in the information and knowledge discovery process on the web [3].

III. WEB MINING TAXONOMY


Web Mining can be broadly divided into three distinct
categories, according to the kinds of data to be mined:
A. Web Content Mining
Web content mining is the process of extracting useful
information from the contents of web documents. Content
data is the collection of facts a web page is designed to
contain. It may consist of text, images, audio, video, or
structured records such as lists and tables. Application of text
mining to web content has been the most widely researched.
Issues addressed in text mining include topic discovery and
tracking, extracting association patterns, clustering of web
documents and classification of web pages. Research activities
on this topic have drawn heavily on techniques developed in
other disciplines such as Information Retrieval (IR) and
Natural Language Processing (NLP). While there exists a
significant body of work in extracting knowledge from images
in the fields of image processing and computer vision, the
application of these techniques to web content mining has been
limited.

21
11

International Journal of Engineering Research and Technology (IJERT)


ISSN: 2278-0181
Vol. 1 Issue 2, April - 2012

B. Web Structure Mining


The structure of a typical web graph consists of web pages as
nodes, and hyperlinks as edges connecting related pages. Web
structure mining is the process of discovering structure
information from the web. This can be further divided into
two kinds based on the kind of structure information used.
Hyperlinks
A hyperlink is a structural unit that connects a location in a
web page to a different location, either within the same web
page or on a different web page. A hyperlink that connects to a
different part of the same page is called an intra-document
hyperlink, and a hyperlink that connects two different pages is
called an inter-document hyperlink. There has been a
significant body of work on hyperlink analysis, of which
Desikan, Srivastava, Kumar, and Tan (2002) provide an up-to-date survey.
Document Structure
In addition, the content within a Web page can also be organized in a tree-structured format, based on the various HTML and XML tags within the page. Mining efforts here have focused on automatically extracting document object model (DOM) structures out of documents (Wang and Liu 1998; Moh, Lim, and Ng 2000).


TABLE: 1 Web Mining Categories


Fig. 1. Web Mining Taxonomy
C. Web Usage Mining
Web usage mining is the application of data mining techniques to discover interesting usage patterns from web usage data, in order to understand and better serve the needs of web-based applications (Srivastava, Cooley, Deshpande, and Tan 2000). Usage data captures the identity or origin of web users along with their browsing behavior at a web site. Web usage mining itself can be classified further depending on the kind of usage data considered:
Web Server Data
User logs are collected by the web server and typically include IP address, page reference and access time.
Application Server Data
Commercial application servers such as WebLogic and StoryServer have significant features that enable e-commerce applications to be built on top of them with little effort. A key feature is the ability to track various kinds of business events and log them in application server logs.
Application Level Data
New kinds of events can be defined in an application, and logging can be turned on for them, generating histories of these events. It must be noted, however, that many end applications require a combination of one or more of the techniques applied in the above categories.
D. Text Mining
Due to the continuous growth of the volume of text data, automatic extraction of implicit, previously unknown and potentially useful information becomes more necessary in order to properly utilize this vast source of knowledge. Text mining, therefore, corresponds to the extension of the data mining approach to textual data and is concerned with various tasks, such as extraction of information implicitly contained in collections of documents or similarity-based structuring. Text collections in general lack the imposed structure of a traditional database. Text expresses a vast range of information, but encodes the information in a form that is difficult to decipher automatically [2].
IV. KEY CONCEPTS WITH ALGORITHMS


In this section we briefly describe the new concepts introduced by the web mining research community.
A. Ranking Metrics for Page Quality
Searching the web involves two main steps: extracting the pages relevant to a query and ranking them according to their quality. Ranking is important as it helps the user look for quality pages that are relevant to the query. Different metrics have been proposed to rank web pages according to their quality. We briefly discuss two of the prominent ones.
1. PageRank
PageRank is a metric for ranking hypertext documents based on their quality. Page, Brin, Motwani, and Winograd (1998) developed this metric for the popular search engine Google (Brin and Page 1998). The key idea is that a page has a high rank if it is pointed to by many highly ranked pages. So the rank of a page depends upon the ranks of the pages pointing to

it. This process is done iteratively until the ranks of all pages are determined. The rank of a page p can be written as:

PR(p) = d/n + (1 - d) \sum_{(q,p) \in E} PR(q) / OutDegree(q)

Here, n is the number of nodes in the graph and OutDegree(q) is the number of hyperlinks on page q. Intuitively, the approach can be viewed as a stochastic analysis of a random walk on the web graph. The first term on the right-hand side of the equation is the probability that a random web surfer arrives at page p by typing the URL or from a bookmark, or has that page as his or her homepage. Here d is the probability that the surfer chooses a URL directly rather than traversing a link, and 1 - d is the probability that a person arrives at a page by traversing a link. The second term on the right-hand side of the equation is the probability of arriving at a page by traversing a link.
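A compact Python sketch of this iterative computation on a toy graph is shown below (our own illustration; here d plays the same role as in the formula above, the probability of jumping directly to a page):

    def pagerank(graph, d=0.15, iterations=50):
        """graph: dict mapping each page to the list of pages it links to."""
        pages = list(graph)
        n = len(pages)
        rank = {p: 1.0 / n for p in pages}
        for _ in range(iterations):
            new_rank = {}
            for p in pages:
                # d/n: chance of arriving directly; (1-d) * sum over in-links q of PR(q)/OutDegree(q).
                incoming = sum(rank[q] / len(graph[q]) for q in pages if p in graph[q])
                new_rank[p] = d / n + (1.0 - d) * incoming
            rank = new_rank
        return rank

    toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
    print(pagerank(toy_web))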
2. Weighted PageRank
This algorithm was proposed by Wenpu Xing and Ali Ghorbani as an extension of the PageRank algorithm [7]. The algorithm assigns rank values to pages according to their importance rather than dividing the rank evenly. The importance is assigned in terms of weight values for incoming and outgoing links, denoted W^{in}_{(m,n)} and W^{out}_{(m,n)} respectively.
W^{in}_{(m,n)} is the weight of link (m, n) as given in (1). It is calculated on the basis of the number of incoming links of page n and the number of incoming links of all the reference pages of page m:

W^{in}_{(m,n)} = I_n / \sum_{p \in R(m)} I_p    (1)

where I_n is the number of incoming links of page n, I_p is the number of incoming links of page p, and R(m) is the reference page list of page m.
W^{out}_{(m,n)} is the weight of link (m, n) as given in (2). It is calculated on the basis of the number of outgoing links of page n and the number of outgoing links of all the reference pages of page m:

W^{out}_{(m,n)} = O_n / \sum_{p \in R(m)} O_p    (2)

where O_n is the number of outgoing links of page n and O_p is the number of outgoing links of page p. The weighted PageRank is then given by the formula in (3):

WPR(n) = (1 - d) + d \sum_{m \in B(n)} WPR(m) W^{in}_{(m,n)} W^{out}_{(m,n)}    (3)

where B(n) is the set of pages that point to page n.
2.1 PageRank vs. Weighted PageRank
In order to compare WPR with PageRank, the resultant pages of a query are categorized into four categories based on their relevancy to the given query:
Very Relevant Pages (VR): pages that contain very important information related to the given query.
Relevant Pages (R): pages that are relevant but do not contain important information about the given query.
Weakly Relevant Pages (WR): pages that may contain the query keywords but do not have relevant information.
Irrelevant Pages (IR): pages that do not contain any relevant information about the given query.
3. Hubs and Authorities
Hubs and authorities can be viewed as fans and centers in a bipartite core of a web graph, where the fans represent the hubs and the centers represent the authorities. The hub and authority scores computed for each web page indicate the extent to which the web page serves as a hub pointing to good authority pages or as an authority on a topic pointed to by good hubs. The scores are computed for a set of pages related to a topic using an iterative procedure called HITS (Kleinberg 1999). First a query is submitted to a search engine and a set of relevant documents is retrieved. This set, called the root set, is then expanded by including web pages that point to those in the root set and are pointed to by those in the root set. This new set is called the base set. An adjacency matrix A is formed such that if there exists at least one hyperlink from page i to page j, then A_{i,j} = 1, otherwise A_{i,j} = 0. The HITS algorithm is then used to compute the hub and authority scores for this set of pages.
There have been modifications and improvements to the basic PageRank and hubs-and-authorities approaches, such as SALSA (Lempel and Moran 2000), topic-sensitive PageRank (Haveliwala 2002) and web page reputations (Mendelzon and Rafiei 2000). These different hyperlink-based metrics have been discussed by Desikan, Srivastava, Kumar, and Tan (2002).
Kleinberg defines two forms of web pages, called hubs and authorities. Hubs are pages that act as resource lists; authorities are pages with important content. A good hub page points to many authoritative pages on its topic, and a good authority page is pointed to by many good hub pages on the same topic. A page may be a good hub and a good authority at the same time.
The HITS algorithm treats the WWW as a directed graph G(V, E), where V is the set of vertices representing pages and E is the set of edges corresponding to links. Figure 3 shows the hubs and authorities in the web [3].
It has two steps:
1. Sampling step: a set of relevant pages for the given query is collected.
2. Iterative step: hubs and authorities are found using the output of the sampling step.



Fig. 3 Hubs And Authorities
The following expressions are used to calculate the weight of a hub (H_p) and the weight of an authority (A_p):

H_p = \sum_{q \in I(p)} A_q
A_p = \sum_{q \in B(p)} H_q

Here H_q is the hub score of a page, A_q is the authority score of a page, I(p) is the set of reference pages of page p (pages that p points to) and B(p) is the set of referrer pages of page p (pages that point to p). The authority weight of a page is proportional to the sum of the hub weights of the pages that link to it; similarly, the hub weight of a page is proportional to the sum of the authority weights of the pages that it links to.
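A small Python sketch of the iterative step on a toy base set is given below (our own illustration; scores are normalized each round so that they remain bounded):

    def hits(graph, iterations=50):
        """graph: dict page -> list of pages it links to, for the base set."""
        pages = list(graph)
        hub = {p: 1.0 for p in pages}
        auth = {p: 1.0 for p in pages}
        for _ in range(iterations):
            # Authority score: sum of hub scores of pages that point to p.
            auth = {p: sum(hub[q] for q in pages if p in graph[q]) for p in pages}
            # Hub score: sum of authority scores of pages that p points to.
            hub = {p: sum(auth[q] for q in graph[p]) for p in pages}
            # Normalize to keep the scores bounded.
            a_norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
            h_norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
            auth = {p: v / a_norm for p, v in auth.items()}
            hub = {p: v / h_norm for p, v in hub.items()}
        return hub, auth

    base_set = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
    hub_scores, auth_scores = hits(base_set)
    print(sorted(auth_scores, key=auth_scores.get, reverse=True))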
Constraints of the HITS algorithm
The following are some constraints of the HITS algorithm [3]:
Hubs and authorities: it is not easy to distinguish between hubs and authorities because many sites are hubs as well as authorities.
Topic drift: sometimes HITS may not produce the most relevant documents for the user query because of equivalent weights.
Automatically generated links: HITS gives equal importance to automatically generated links, which may not be relevant to the user query.
Efficiency: the HITS algorithm is not efficient in real time [5].
HITS was used in a prototype search engine called Clever for an IBM research project. Because of the above constraints, HITS could not be implemented in a real-time search engine.
B. Robot Detection and Filtering: Separating Human and Non-human Web Behavior
Web robots are software programs that automatically traverse the hyperlink structure of the web to locate and retrieve information. The importance of separating robot behavior from human behavior prior to building user behavior models has been illustrated by Kohavi (2001). First, e-commerce retailers are particularly concerned about the unauthorized deployment of robots for gathering business intelligence at their web sites. Second, web robots tend to consume considerable network bandwidth at the expense of other users. Sessions due to web robots also make it difficult to perform click-stream analysis effectively on the web data. Conventional techniques for detecting web robots are based on identifying the IP address and user agent of the web clients. While these techniques are applicable to many well-known robots, they are not sufficient to detect camouflaged and previously unknown robots. Tan and Kumar (2002) proposed a classification-based approach that uses the navigational patterns in click-stream data to determine whether a session is due to a robot. Experimental results have shown that highly accurate classification models can be built using this approach. Furthermore, these models are able to discover many camouflaged and previously unidentified robots [6].
C. User Profiles: Understanding How Users Behave
The web has taken user profiling to new levels. For example, in a brick-and-mortar store, data collection happens only at the checkout counter, usually called the point-of-sale. This provides information only about the final outcome of a complex human decision-making process, with no direct information about the process itself. In an online store, the complete click-stream is recorded, which provides a detailed record of every action taken by the user, providing a much more detailed insight into the decision-making process. Adding such behavioral information to other kinds of information about users, for example demographic and psychographic information, allows a comprehensive user profile to be built, which can be used for many different purposes (Masand, Spiliopoulou, Srivastava, and Zaiane 2002). While most organizations build profiles of user behavior limited to visits to their own sites, there are successful examples of building web-wide behavioral profiles, such as Alexa Research and DoubleClick. These approaches require browser cookies of some sort, and can provide a fairly detailed view of a user's browsing behavior across the web [8].
D. Preprocessing: Making Web Data Suitable for Mining
In the panel discussion referred to earlier (Srivastava and Mobasher 1997), preprocessing of web data to make it suitable for mining was identified as one of the key issues for web mining. A significant amount of work has been done in this area for web usage data, including user identification and session creation (Cooley, Mobasher, and Srivastava 1999), robot detection and filtering (Tan and Kumar 2002), and extracting usage path patterns (Spiliopoulou 1999). Cooley's Ph.D. dissertation (Cooley 2000) provides a comprehensive overview of the work in web usage data preprocessing. Preprocessing of web structure data, especially link information, has been carried out for some applications, the most notable being Google-style web search (Brin and Page 1998). An up-to-date survey of structure preprocessing is provided by Desikan, Srivastava, Kumar, and Tan (2002).
E. Online Bibliometrics
With the web having become the fastest growing and most up-to-date source of information, the research community has found it extremely useful to have online repositories of

publications. Lawrence observed (Lawrence 2001) that having articles online makes them more easily accessible and hence more often cited than articles that are offline. Such online repositories not only keep researchers updated on work carried out at different centers, but also make the interaction and exchange of information much easier.
With such information stored on the web, it becomes easier to point to the most frequently cited papers for a topic and also to related papers that have been published earlier or later than a given paper. This helps in understanding the state of the art in a particular field, helping researchers to explore new areas. Fundamental web mining techniques are applied to improve the search and categorization of research papers and the citing of related articles. Some of the prominent digital libraries are the Science Citation Index (SCI), the Association for Computing Machinery's ACM portal, the Scientific Literature Digital Library (CiteSeer), and the DBLP Bibliography.
F. Visualization of the World Wide Web
Mining web data provides a lot of information, which can be better understood with visualization tools; this makes concepts clearer than is possible with a purely textual representation. Hence there is a need to develop tools that provide a graphical interface to aid in visualizing the results of web mining. Analyzing web log data with visualization tools has evoked a lot of interest in the research community. Chi, Pitkow, Mackinlay, Pirolli, Gossweiler, and Card (1998) developed a web ecology and evolution visualization (WEEV) tool to understand the relationship between web content, web structure and web usage over a period of time. The site hierarchy is represented in a circular form called the "Disk Tree" and the evolution of the web is viewed as a "Time Tube". Cadez, Heckerman, Meek, Smyth, and White (2000) present a tool called WebCANVAS that displays clusters of users with similar navigation behavior. Prasetyo, Pramudiono, Takahashi, Toyoda, and Kitsuregawa developed Naviz, an interactive web log visualization tool that is designed to display the user browsing pattern on a web site at a global level, and then display each browsing path on the previously displayed pattern in an incremental manner. The support of each traversal is represented by the thickness of the edge between the pages. Such a tool is very useful in analyzing user behavior and improving web sites [7].

V. CONCLUSION
In this article, we have outlined three different modes of web mining, namely web content mining, web structure mining and web usage mining. Needless to say, these three approaches cannot be independent, and any efficient mining of the web requires a judicious combination of information from all three sources. We have presented in this paper the significance of introducing web mining techniques. The development and application of Web mining techniques in the context of Web content, usage, and structure data will lead to tangible improvements in many Web applications, from search engines and Web agents to Web analytics and personalization. Future efforts, investigating architectures and algorithms that can exploit and enable a more effective integration and mining of content, usage, and structure data from different sources, promise to lead to the next generation of intelligent Web applications.
REFERENCES
[1] J. Srivastava, P. Desikan and V. Kumar, "Web Mining: Concepts, Applications and Research Directions," 2002.
[2] J. Srivastava, P. Desikan and V. Kumar, "Web Mining: Accomplishments and Future Directions," 2004.
[3] Rekha Jain and G. N. Purohit, "Page Ranking Algorithms for Web Mining," International Journal of Computer Applications (0975-8887), Vol. 13, No. 5, January 2011.
[4] Srivastava, J., Cooley, R., Deshpande, M., and Tan, P.-N. (2000). "Web usage mining: Discovery and applications of usage patterns from web data," SIGKDD Explorations, 1(2), 12-23.
[5] Maier, T. (2004). "A Formal Model of the ETL Process for OLAP-Based Web Usage Analysis," in Proc. of WebKDD-2004 Workshop on Web Mining and Web Usage Analysis, part of the ACM KDD: Knowledge Discovery and Data Mining Conference.
[6] Meo, R., Lanzi, P., Matera, M., Esposito, R. (2004). "Integrating Web Conceptual Modeling and Web Usage Mining," in Proc. of WebKDD-2004 Workshop on Web Mining and Web Usage Analysis, part of the ACM KDD: Knowledge Discovery and Data Mining Conference, Seattle, WA.
[7] Desikan, P. and Srivastava, J. (2004). "Mining Temporally Evolving Graphs," in Proceedings of the WebKDD-2004 Workshop on Web Mining and Web Usage Analysis, B. Mobasher, B. Liu, B. Masand, O. Nasraoui, Eds., part of the ACM KDD: Knowledge Discovery and Data Mining Conference, Seattle, WA.
[8] Berendt, B., Mobasher, B., Spiliopoulou, M., and Wiltshire, J. (2001). "Measuring the accuracy of sessionizers for web usage analysis," in Workshop on Web Mining, at the First SIAM International Conference on Data Mining, 7-14.
[9] Srivastava, J., Cooley, R., Deshpande, M., and Tan, P.-N. (2000). "Web usage mining: Discovery and applications of usage patterns from web data," SIGKDD Explorations, 1(2), 12-23.
[10] J. Hou and Y. Zhang, "Effectively Finding Relevant Web Pages from Linkage Information," IEEE Transactions on Knowledge and Data Engineering, Vol. 15, No. 4, 2003.
[11] R. Kosala and H. Blockeel, "Web Mining Research: A Survey," SIGKDD Explorations, Newsletter of the ACM Special Interest Group on Knowledge Discovery and Data Mining, Vol. 2, No. 1, pp. 1-15, 2000.

V. CONCLUSION
In this article, we have outlined three different modes of web
mining, namely web content mining, web structure mining and
web usage mining. Needless to say, these three approaches can
not be independent, and any efficient mining of the web would
require a judicious combination of information from all the
three sources. We have presented in this paper the significance
of introducing the web mining techniques. The development
and application of Web mining techniques in the context of
Web content, usage, and structure data will lead to tangible
improvements in many Web applications, from search engines
and Web agents to Web analytics and personalization. Future
efforts, investigating architectures and algorithms that can


A Survey Paper on HyperlinkInduced Topic Search (HITS)


Algorithms for Web Mining
Mr.Ramesh Prajapati
Lecturer, Information Technology, Gandhinagar Institute of Technology, Gandhinagar
ramesh.prajapati@git.org.in

Abstract--Hyperlink-Induced Topic Search (HITS) is a link-analysis algorithm for web mining that rates web pages as hubs and authorities. HITS scores are query dependent and are calculated at the time of the search, so fast and efficient computation of hubs and authorities for web crawling and retrieval remains a challenging issue. This paper surveys and compares web page ranking algorithms on various parameters to find their advantages and limitations for ranking web pages using web mining. Web mining techniques are used to categorize users and pages by analyzing user behavior, the content of pages and the order of URLs accessed. In this paper we discuss and compare two widely used algorithms, PageRank and HITS.
Index Terms--Hub, HITS, Networking, PageRank, Web Mining, Web Content Mining, Web Structure Mining, Web Usage Mining, Weighted PageRank

I. INTRODUCTION
As the volume of information on the internet increases day by day, it is a challenge for website owners to provide proper and relevant information to internet users. Retrieving the required web pages efficiently and effectively is becoming difficult. Whenever users search for relevant pages, they prefer those pages to be at hand. The bulk of information makes it very difficult for users to find, extract, filter or evaluate the relevant information. This raises the need for techniques that can address these challenges. Web mining draws on other areas such as Databases (DB), Information Retrieval (IR), Natural Language Processing (NLP) and Machine Learning, which can be used to discover and analyze useful information from the WWW. Some of the challenges are:
1) the web is huge; 2) web pages are semi-structured; 3) web information tends to be diverse in meaning; 4) the degree of quality of the information extracted; and 5) drawing conclusions from the information extracted.

This paper is organized as follows. Web mining is introduced in Section II. The areas of web mining, i.e. web content mining, web structure mining and web usage mining, are discussed in Section III. Section IV describes the scale-free network model. Section V describes the various link-based analysis algorithms: the PageRank algorithm and its limitations are presented in Section V-A, Section V-B covers the Weighted PageRank algorithm, Section V-C introduces HITS, hubs and authorities and the motivation behind HITS, Section V-D presents the HITS algorithm, and Section V-E discusses handling spam links. Section VI provides a comparison of HITS and PageRank based on the literature analysis. Concluding remarks are given in Section VII.
II. WEB MINING
Web mining is the data mining technique that automatically discovers or extracts information from web documents. It is the extraction of interesting and potentially useful patterns and implicit information from activity related to the World Wide Web.
A. Web Mining Process
The complete process of extracting knowledge from web data [2] is shown in Fig. 1:

Figure 1: Web Mining Process


It consists of the following tasks [4]:
1. Resource finding: the task of retrieving the intended web documents.
2. Information selection and pre-processing: the automatic selection and pre-processing of specific information from the retrieved web resources.
3. Generalization: automatic discovery of general patterns at individual web sites as well as across multiple sites.


4. Analysis: the validation and interpretation of the mined patterns. A human plays an important role in the information and knowledge discovery process on the web.

III. WEB MINING CATEGORIES
There are three areas of web mining according to the web data used as input: web content mining, web structure mining and web usage mining.

A. Web Content Mining
It is the process of retrieving the information from web documents into more structured forms and indexing the information so that it can be retrieved quickly. It focuses mainly on the structure within a document, i.e. the inner-document level. Web content mining is related to data mining because many data mining techniques can be applied to it. It is also related to text mining because much of the web content is text, but it is quite different from both because web data is mainly semi-structured in nature, while text mining focuses on unstructured text.

B. Web Structure Mining
It is the process by which we discover the model of the link structure of web pages. We catalog the links and generate information such as the similarity and relations among them by taking advantage of the hyperlink topology. The goal of web structure mining is to generate a structured summary of the website and its web pages. PageRank and hyperlink analysis also fall in this category. It tries to discover the link structure of hyperlinks at the inter-document level. Since web documents contain links and use both the real or primary data on the web, it can be concluded that web structure mining has a relation with web content mining. It uses the tree-like structure of HTML (Hyper Text Markup Language) documents to analyze and describe them.

C. Web Usage Mining
It is the process by which we identify browsing patterns by analyzing the navigational behavior of users. It focuses on techniques that can be used to predict user behavior while the user interacts with the web. It uses the secondary data on the web. This activity involves the automatic discovery of user access patterns from one or more web servers. Through this mining technique we can ascertain what users are looking for on the Internet. It consists of three phases, namely preprocessing, pattern discovery, and pattern analysis. Web servers, proxies, and client applications can quite easily capture data about web usage.

IV. SCALE-FREE NETWORK MODEL
A simple model for generating scale-free networks rests on the following two points.
1. Evolution: networks expand continuously by the addition of new vertices.
2. Preferential attachment (rich get richer): new vertices attach preferentially to sites that are already well connected.
Growing the network (evolution): starting with a small number (m0) of vertices, at every time step we add a new vertex with m (m <= m0) edges that link the new vertex to m different vertices already present in the system.
Growing the network (preferential attachment): to incorporate preferential attachment, we assume that the probability P that a new vertex will be connected to vertex i depends on the connectivity k_i of that vertex, so that P(k_i) = k_i / sum_j k_j.

Figure 3: Scale-free network model
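As an illustration of the two growth rules above, the following is a minimal Python sketch (our own example, not from any cited paper) of growing a scale-free network by evolution and preferential attachment; m0 and m correspond to the parameters described in the text.

```python
import random

def scale_free_graph(n, m0=3, m=2):
    """Grow a scale-free network: start with m0 vertices, then attach each
    new vertex to m existing vertices with probability proportional to degree."""
    assert m <= m0
    edges = []
    degree = {v: 0 for v in range(m0)}
    # Seed the network with a small ring so every initial vertex has degree > 0.
    for v in range(m0):
        u = (v + 1) % m0
        edges.append((v, u))
        degree[v] += 1
        degree[u] += 1
    for new in range(m0, n):
        # Preferential attachment: P(k_i) = k_i / sum_j k_j.
        targets = set()
        total = sum(degree.values())
        while len(targets) < m:
            r = random.uniform(0, total)
            acc = 0.0
            for v, k in degree.items():
                acc += k
                if acc >= r:
                    targets.add(v)
                    break
        degree[new] = 0
        for t in targets:
            edges.append((new, t))
            degree[new] += 1
            degree[t] += 1
    return edges, degree

edges, degree = scale_free_graph(200)
print(max(degree.values()), min(degree.values()))  # heavy-tailed degree spread
```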

V. LINK BASED ANALYSIS


Web mining techniques provide additional information through the hyperlinks by which different documents are connected. We can view the web as a directed labeled graph whose nodes are the documents or pages and whose edges are the hyperlinks between them. This directed graph structure is known as the web graph. A number of algorithms based on link analysis have been proposed; three important ones, PageRank, Weighted PageRank and HITS, are discussed below.
A. Page Rank Algorithm
PageRank is a numeric value that represents how important a page is on the web. PageRank is Google's method of measuring a page's "importance." When all other factors such as the Title tag and keywords are taken into account, Google uses PageRank to adjust results so that more "important" pages move up in the results page of a user's search. Google figures that when a page links to another page, it is effectively casting a vote for the other page, and Google calculates a page's importance from the votes cast for it. This provides an approach that can compute the importance of a web page by simply counting the number of pages that link to it. These links are called backlinks. If a backlink comes from an important page, then this link is given a higher weightage than


those which come from non-important pages. The link from one page to another is considered as a vote. Not only is the number of votes that a page receives important, but so is the importance of the pages that cast them. The PageRank algorithm works as follows: PageRank takes the backlinks into account and propagates the ranking through links. A page has a higher rank if the sum of the ranks of its backlinks is high. Fig. 4 shows an example of backlinks, wherein page A is a backlink of pages B and C, while pages B and C are backlinks of page D. The original PageRank algorithm is given by the following equation:

PR(P) = (1 - d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))    (1)

Figure 4: Backlinks Page Rank

where PR(P) is the PageRank of page P, PR(Ti) is the PageRank of page Ti which links to page P, C(Ti) is the number of outbound links on page Ti, and d is the damping factor, which can be set between 0 and 1.
A link from A to B is a vote for B cast by A. Votes cast by pages that are important weigh more heavily, but there are different types of important nodes:

Figure 5: Top automobile makers


A problem to be solved is that relevant terms may not appear on the pages of authoritative websites. Many prominent pages are not self-descriptive: car manufacturers may not use the term "automobile manufacturers" on their home pages, and the term "search engine" is not used by natural authorities such as Yahoo, Google and AltaVista.
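As a quick illustration of equation (1) (not the authors' implementation), the following Python sketch iterates PageRank on a small hypothetical link graph with damping factor d; the graph and function names are purely illustrative.

```python
def pagerank(links, d=0.85, iterations=50):
    """links maps each page to the list of pages it links to.
    Assumes every page has at least one outbound link."""
    pages = list(links)
    pr = {p: 1.0 for p in pages}          # initial PageRank values
    for _ in range(iterations):
        new_pr = {}
        for p in pages:
            # Sum PR(Ti)/C(Ti) over all pages Ti that link to p, as in equation (1).
            incoming = sum(pr[t] / len(links[t]) for t in pages if p in links[t])
            new_pr[p] = (1 - d) + d * incoming
        pr = new_pr
    return pr

# Hypothetical backlink structure: A links to B and C; B and C link to D; D links back to A.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["A"]}
print(pagerank(graph))
```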
B. Weighted PageRank
Weighted PageRank, proposed by Wenpu Xing and Ali Ghorbani, is an extension of the PageRank algorithm. It assigns a larger rank value to more important pages instead of dividing the rank value of a page evenly among its outlinked pages; that is, rank values are assigned to pages according to their importance rather than evenly. The importance is assigned in terms of weight values for incoming and outgoing links, denoted W_in(m, n) and W_out(m, n) respectively. W_in(m, n) is the weight of link (m, n) as given in (2); it is calculated on the basis of the number of incoming links to page n and the number of incoming links to all the reference pages of page m:

W_in(m, n) = I_n / sum of I_p over p in R(m)    (2)

where I_n is the number of incoming links of page n, I_p is the number of incoming links of page p, and R(m) is the reference page list of page m. W_out(m, n) is the weight of link (m, n) as given in (3); it is calculated on the basis of the number of outgoing links of page n and the number of outgoing links of all the reference pages of page m:

W_out(m, n) = O_n / sum of O_p over p in R(m)    (3)

where O_n is the number of outgoing links of page n and O_p is the number of outgoing links of page p. The weighted PageRank is then given by the formula in (4):

WPR(n) = (1 - d) + d * sum over m in B(n) of WPR(m) W_in(m, n) W_out(m, n)    (4)

where B(n) is the set of pages that link to page n.
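A minimal Python sketch (our illustration, not the original implementation) of computing the weights of equations (2) and (3) and the weighted PageRank of equation (4) on a tiny hypothetical graph:

```python
def weighted_pagerank(links, d=0.85, iterations=50):
    """links maps each page to the list of pages it links to (its reference list R)."""
    pages = list(links)
    in_count = {p: sum(1 for q in pages if p in links[q]) for p in pages}   # I_n
    out_count = {p: len(links[p]) for p in pages}                           # O_n
    backlinks = {p: [q for q in pages if p in links[q]] for p in pages}     # B(n)

    def w_in(m, n):   # W_in(m, n) = I_n / sum of I_p over the reference pages R(m)
        return in_count[n] / sum(in_count[p] for p in links[m])

    def w_out(m, n):  # W_out(m, n) = O_n / sum of O_p over the reference pages R(m)
        return out_count[n] / sum(out_count[p] for p in links[m])

    wpr = {p: 1.0 for p in pages}
    for _ in range(iterations):
        wpr = {n: (1 - d) + d * sum(wpr[m] * w_in(m, n) * w_out(m, n)
                                    for m in backlinks[n])
               for n in pages}
    return wpr

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["A"]}
print(weighted_pagerank(graph))
```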

C. HITS (Hyper-link Induced Topic Search)


Hyperlink-Induced Topic Search (HITS) (also known as Hubs
and authorities) is a link analysis algorithm that rates Web
pages, developed by Jon Kleinberg. It was a precursor to
PageRank. The idea behind Hubs and Authorities stemmed
from a particular insight into the creation of web pages when
the Internet was originally forming; that is, certain web pages, known as hubs, served as large directories that were not actually authoritative in the information they held, but were used as compilations of a broad catalog of information that led users directly to other authoritative pages. In other words, a good hub represented a page that pointed to many other pages, and a good authority represented a page that was linked to by many different hubs. The scheme therefore assigns two scores
for each page: its authority, which estimates the value of the
content of the page, and its hub value, which estimates the
value of its links to other pages. A page may be a good hub
and a good authority at the same time. The HITS algorithm


treats the WWW as a directed graph G(V, E), where V is a set of vertices representing pages and E is the set of edges corresponding to links. It attempts to computationally determine hubs and authorities on a particular topic through analysis of a relevant subgraph of the web.

What is the problem here? Some links are purely navigational ("click here to return to the main menu"), some links are advertisements, and there is difficulty in finding a balance between relevance and popularity. The solution is based on the relationship between the authorities for a topic and those pages that link to many related authorities, the hubs.

Figure 8: Relationship between Authority and Hub

Figure 6: Hyperlinked Pages Modeled as Directed Graph


HITS is based on two mutually recursive facts: hubs point to lots of authorities, and authorities are pointed to by lots of hubs. An authority is a valuable and informative webpage usually pointed to by a large number of hyperlinks; a hub is a webpage that points to many authority pages and is itself a resource. Authorities and hubs reinforce one another: a good authority is pointed to by many good hubs, and a good hub points to many good authorities.
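The mutual reinforcement described above is usually written as the following pair of update rules (standard HITS notation, added here for reference; E denotes the edge set of the subgraph):

\[
a(p) = \sum_{(q,p) \in E} h(q), \qquad h(p) = \sum_{(p,q) \in E} a(q),
\]

with both score vectors normalized after each update.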

Figure 7: Hub and Authority


Motivation behind HITS: the creator of page p, by including a link to page q, has in some measure conferred authority on q. Links afford us the opportunity to find potential authorities purely through the pages that point to them.

Figure 9: Relationship between Hub and Authority


D. HITS Algorithm
HITS computes hubs and authorities for a particular topic specified by a normal query. It first determines a set of relevant pages for the query, called the base set S, and then analyzes the link structure of the web subgraph defined by S to find authority and hub pages in this set. The following points concern the construction of the focused subgraph. We start with a set created by a text-based search engine.
Why do we need a subset? The full set may contain too many pages and entail a considerable computational cost, and most of the best authorities may not belong to this set.
Subset properties:
Relatively small
Rich in relevant pages
Contains most (or many) of the strongest authorities


Iterative Algorithm
Each page p is assigned two nonnegative weights, an authority weight a(p) and a hub weight h(p); the weights of a and h are then updated iteratively.

Figure 10: Computing hubs and authorities for a particular topic specified by a normal query
First find a set of relevant pages
For a specific query Q, let the set of documents returned
by a standard search engine be called the root set R.
Initialize S to R.
Add to S all pages pointed to by any page in R.
Add to S all pages that point to any page in R.

Figure 11: Determining a set of relevant pages for the query
Subgraph reduction: to offset the effect of links that serve a purely navigational function, remove all intrinsic edges from the graph, keeping only the edges corresponding to transverse links, and remove links that are mentioned in more than m pages (m = 4-8).
Calculating the hub and authority weights: let A be the adjacency matrix of the graph G = (V, E).

These operations add the weights of the hubs into the authority weights and the authority weights into the hub weights, respectively. Alternating these two operations will eventually result in an equilibrium value, or weight, for each page.
Given G, a collection of n linked pages:
Set a0 = [1/n, ..., 1/n]^T
Set h0 = [1/n, ..., 1/n]^T
For t = 1, 2, ..., k: set a_t = A^T h_(t-1) and h_t = A a_t, and normalize a_t and h_t.
End
Based on the survey of the HITS algorithm, the overall graph with authority and hubness scores is represented as follows. The nodes the HITS algorithm discovered share similar roles in terms of their email communication pattern in the data set; our algorithm discovers this structure as well, and the estimated rankings are so close to the actual ones that it is difficult to distinguish them.
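The following Python sketch (an illustration under the notation above, not the paper's code) builds the base set S from a root set R and runs the alternating authority/hub updates; the example graph and names are hypothetical.

```python
def hits(links, root_set, k=20):
    """links: page -> list of pages it points to. Returns (authority, hub) scores."""
    # Expand the root set R into the base set S (one-link neighbourhood).
    S = set(root_set)
    for p in root_set:
        S.update(links.get(p, []))                      # pages pointed to by R
    S.update(q for q, outs in links.items()
             if any(p in root_set for p in outs))       # pages pointing into R
    n = len(S)
    a = {p: 1.0 / n for p in S}
    h = {p: 1.0 / n for p in S}
    for _ in range(k):
        # Authority update: sum of hub weights of pages linking to p.
        a = {p: sum(h[q] for q in S if p in links.get(q, [])) for p in S}
        # Hub update: sum of authority weights of pages p links to.
        h = {p: sum(a[q] for q in links.get(p, []) if q in S) for p in S}
        # Normalize so the weights stay bounded.
        a_norm = sum(v * v for v in a.values()) ** 0.5 or 1.0
        h_norm = sum(v * v for v in h.values()) ** 0.5 or 1.0
        a = {p: v / a_norm for p, v in a.items()}
        h = {p: v / h_norm for p, v in h.items()}
    return a, h

graph = {"hub1": ["auth1", "auth2"], "hub2": ["auth1", "auth2"], "auth1": [], "auth2": []}
a, h = hits(graph, root_set=["hub1", "auth1"])
print(sorted(a, key=a.get, reverse=True))  # pages ordered by authority score
```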

Properties of the HITS algorithm
Several interesting results follow directly.
(1) Webpage ordering. The authority ranking is, on average, identical to the ranking according to webpage in-degrees. To see this, note that the elements of the principal eigenvector u1 are non-increasing, assuming webpages are indexed such that their in-degrees are in non-increasing order; that is, for any i < j, u1(i) >= u1(j).

From this, we conclude that, to the extent that the fixed-degree-sequence random graph approximates the web, ranking web pages by their authority scores is the same as ranking by their in-degrees. Analogous results hold for hub ranking. These indicate that the duality relationship embedded in the mutual reinforcement between hubs and authorities is manifested by their in-degrees and out-degrees.
(2) Uniqueness. If d1 is larger than d2, then the principal eigenvector of L^T L is unique and is quite different from the second principal eigenvector.
(3) Convergence. The convergence of HITS can be rather fast: (a) the starting vector x(0) = (1, ..., 1)^T has a large overlap with the principal eigenvector u1, but little overlap with the other principal eigenvectors uk, k = 2, ..., m, because uk contains negative nodal values; (b) in the iterations to compute u1, the convergence rate depends on the eigenvalue ratio, which behaves like d2/d1, approximately (1/2)^2 = 1/4, using the fact that in-degrees follow a power-law distribution [10]: d_i proportional to 1/i^2. Thus the iteration converges rapidly; typically 5-10 iterations are sufficient.
(4) Web communities. The HITS algorithm has been used to identify multiple web communities using different eigenvectors [22, 16]. The principal eigenvector defines a dominant web community. Each of the other principal eigenvectors uk defines two communities, one with non-negative values {i | uk(i) > 0} and the other with negative values {i | uk(i) < 0}. From the pattern of eigenvectors in our solutions, the positive regions of different eigenvectors overlap substantially. Thus the communities of positive regions nest with each other, as do the communities of negative regions. Therefore, we believe this method of identifying multiple communities is less effective; this difficulty is also noticed in practical applications. A number of web community discovery algorithms are being developed, e.g., trawling to find bipartite cores, network maximum flow and graph clustering. One advantage of these methods is that weak communities (topics) can be separated from dominant communities and thus identified. Without explicit community discovery, web pages of weak topics are typically ranked low by HITS (and by in-degree ranking) and are often missed.

Figure12: Comparison of Hubs and Authority Scores


Evaluation of Web Search Results
The quality of Web search results is definitely a subjective matter. The importance and usefulness of pages will vary from person to person. But can one objectively infer the quality of a page? Based on the link structure of the WWW, how can we define a good page? Will it be query dependent, or can it be query independent? Current Web search systems respond to user queries within a fraction of a second. Users would not mind a Web search system that responds within a few seconds, provided it returns considerably better results. But as stated by Kleinberg, "We are lacking objective functions that are both concretely defined and correspond to human notions of quality."
We describe below several parameters for objectively evaluating a page.
1 Popularity
The popularity of a node can be equated with the number of in-links it has. Here we assume that if many nodes point to a node, then it is a popular node.
2 Centrality
The distance from node u to node v can be defined as the minimum number of links via which we can reach v from u. The radius of a node is its maximum distance from any node in the graph. The center of the graph is the node with the smallest radius. The more central the node, the more easily we can reach other parts of the graph from it.
3 Prestige
The prestige of a node can be recursively defined as the sum of the prestiges of the nodes pointing to it. Here we consider not just the number of in-links but also the quality of those in-links. This is the motivation behind PageRank.
4 Informativeness
A node is informative if it points to several nodes that contain useful information. Here we consider not just the number of out-links, but also the quality of the nodes pointed to.

5 Authority


The authority of a node is similar to the prestige of the node, with the difference that authority is measured with respect to some focused, tiny subgraph on a particular topic.
Selective Expansion of Root Set
Consider the step of expanding the root set. Generally the root set is of the order of a few hundred pages. Although existing search systems return thousands of results for broad queries, only the top few are directly relevant and important for the topic of the query. After adding all pages in the one-link neighborhood, the size of the base set becomes of the order of a few thousand pages. Most of the pages added are either useless, or including them in the base set causes topic drift.
We attack these problems by selectively expanding the root set: instead of expanding all the pages, we expand selected pages only. Further, we are also selective in adding the in-links or out-links of the selected pages.

Figure 13: Pages causing topic drift and topic contamination

We first build a page-to-host connectivity matrix of the root set. Then we carry out the power iteration computation using that matrix and calculate hub and authority values as described in the section above. We then select the top few hubs and authorities from the root set; these are the candidate pages for expansion. Simply adding all pages in the one-link neighborhood of these selected pages can again cause the same problems as the simple expansion used in HITS, so we consider the following factors while adding pages to the root set.
By definition, hubs should point to good authorities, so pages pointed to by the top hubs in the root set can be good authority pages and are added to the root set.
By definition, authorities are pointed to by good hubs, so pages pointing to the top authorities in the root set can be good hubs and are also added to the root set.
The Web can be viewed as a directed graph whose nodes are the documents and whose edges are the hyperlinks between them, as shown in the figure below. The graph structure of the World Wide Web can be used for analysis to improve retrieval performance and classification accuracy. The rank value indicates the importance of a particular page; a hyperlink to a page counts as a vote of support. The HITS score of a page is defined by, and depends on, the number and PageRank metric of all pages that link to it. A page that is linked to by many pages with high PageRank receives a high rank itself. If there are no links to a web page, there is no support for that page.

Figure 14: Web graph structure

VI. COMPARISON
Table 1 shows the difference between the above two algorithms:
Table 1: Comparison of Page Rank and HITS


VII. CONCLUSION
Web mining is a powerful technique used to extract information from the past behavior of users. We have described selective expansion of the root set and a different way of calculating hub and authority values; as a result we obtained a very small base set and were able to distill results for a single topic even when the query was ambiguous. Various algorithms are used in web mining to rank the relevant pages. The main focus of web structure mining is on link information, while web usage mining focuses on understanding user behavior, as depicted in the web access logs, while the user interacts with a website. PageRank, Weighted PageRank and HITS treat all links equally when distributing the rank score. A problem of the PageRank and Weighted PageRank algorithms is that relevant terms may not appear on the pages of authoritative websites; many prominent pages are not self-descriptive. Because the HITS algorithm treats all links equally, we consider two problems: some links may be more meaningful than others, and authorities may converge onto densely linked but irrelevant pages, a phenomenon known as the topic drift problem, which is notorious in the area of Information Retrieval. We also observed that the selectively expanded root set is rich in quality, as many pages from the expanded root set topped the hub and authority lists. For future work, there are still many issues to be explored with the HITS algorithm; to address the topic-drift problem, we propose other types of link-analysis-based modifications.

REFERENCES
[1] Rekha Jain and G. N. Purohit, "Page Ranking Algorithms for Web Mining," International Journal of Computer Applications, Vol. 13, January 2011.
[2] Cooley R., Mobasher B., and Srivastava J., "Web Mining: Information and pattern discovery on the World Wide Web," in Proceedings of the 9th IEEE International Conference on Tools with Artificial Intelligence (ICTAI '97), Newport Beach, CA, 1997.
[3] Pooja Sharma and Pawan Bhadana, "Weighted Page Content Rank for Ordering Web Search Result," International Journal of Engineering Science and Technology, Vol. 2, 2010.
[4] R. Kosala and H. Blockeel, "Web mining research: A survey," ACM SIGKDD Explorations, 2(1):1-15, 2000.
[5] Wang Jicheng, Huang Yuan, Wu Gangshan, and Zhang Fuyan, "Web mining: Knowledge discovery on the Web," in Proceedings of the 1999 IEEE International Conference on Systems, Man and Cybernetics (SMC '99), 1999.
[6] Raymond Kosala and Hendrik Blockeel, "Web Mining Research: A Survey," ACM SIGKDD Explorations Newsletter, Vol. 2, June 2000.
[7] Taher H. Haveliwala, "Topic-Sensitive PageRank: A Context-Sensitive Ranking Algorithm for Web Search," IEEE Transactions on Knowledge and Data Engineering, Vol. 15, No. 4, July/August 2003.
[8] J. Hou and Y. Zhang, "Effectively Finding Relevant Web Pages from Linkage Information," IEEE Transactions on Knowledge and Data Engineering, Vol. 15, No. 4, 2003.
[9] P. Ravi Kumar and Ashutosh Kumar Singh, "Web Structure Mining: Exploring Hyperlinks and Algorithms for Information Retrieval," American Journal of Applied Sciences, 7(6):840-845, 2010.
[10] M. G. da Gomes Jr. and Z. Gong, "Web Structure Mining: An Introduction," in Proceedings of the IEEE International Conference on Information Acquisition, 2005.
[11] R. Kosala and H. Blockeel, "Web Mining Research: A Survey," SIGKDD Explorations, Newsletter of the ACM Special Interest Group on Knowledge Discovery and Data Mining, Vol. 2, No. 1, pp. 1-15, 2000.
[12] "HITS Algorithm - Hubs and Authorities on the Internet," Available: http://www.math.cornell.edu/~mec/Winter2009/RalucaRemus/Lecture4/lecture4.html
[13] "HITS," Available: http://en.wikipedia.org/wiki/PageRank.
[14] R. Weiss, B. Velez, M. Sheldon, C. Nemprempre, P. Szilagyi, and D. K. Gifford, "HyPursuit: A Hierarchical Network Search Engine that Exploits Content-Link Hypertext Clustering," in Proceedings of the Seventh ACM Conference on Hypertext, 1996.
[15] M. R. Henzinger, "Hyperlink analysis for the web," IEEE Internet Computing, 5:45-50, 2001.
[16] M. Kessler, "Bibliographic coupling between scientific papers," American Documentation, 14:10-25, 1963.
[17] J. M. Kleinberg, "Authoritative sources in a hyperlinked environment," J. ACM, 48:604-632, 1999.
[18] R. Lempel and S. Moran, "SALSA: The stochastic approach for link-structure analysis and the TKC effect," ACM Trans. Information Systems, 19:131-160, 2001.
[19] S. Chakrabarti, B. Dom, D. Gibson, J. Kleinberg, P. Raghavan, and S. Rajagopalan, "Automatic Resource Compilation by Analyzing Hyperlink Structure and Associated Text," in Proc. 7th International World Wide Web Conference, 1998.
[20] D. Gibson, J. Kleinberg, and P. Raghavan, "Inferring Web Communities from link topology," in Proc. 9th ACM Conference on Hypertext and Hypermedia (HyperText 98), pages 225-234, Pittsburgh, PA, June 1998.


An Efficient CT Image Reconstruction with Parallel


Modeling for Superior Quantitative Measures
S. Asif Hussain1
Associate Professor
Department of ECE, A.I.T.S,
Rajampet, Andhra Pradesh, India
e-mail: sah.ssk@gmail.com

K. Lokeswara Reddy2
M.Tech (E.S) Student
Department of ECE, A.I.T.S,
Rajampet, A.P., India
e-mail: lokes9@gmail.com

ABSTRACT
Image segmentation algorithms based on regions of interest (ROI) typically rely on the homogeneity of image intensities. The CT scanner considered here is a dedicated research scanner developed with imaging applications in view. A key feature of the work is the use of an empirical system kernel to achieve resolution recovery with a novel region-based method. This method identifies local intensity clusters with a local clustering criterion function defined with respect to the neighborhood center. Reconstruction quality is analyzed quantitatively in terms of bias field correction for intensity inhomogeneity. The method is validated on synthetic images of various imaging modalities. A significant improvement in reconstruction quality can be realized by faster and more accurate visual and quantitative measures, where reconstruction quality is analyzed quantitatively in terms of bias-variance measures (bar phantom) and mean square error (lesion phantom). With the inclusion of the empirical kernel, the iterative algorithms provide superior reconstructions compared to FBP, both in terms of visual quality and quantitative measures. Simulated results show improved tumor bias and variance characteristics with the proposed algorithm.
Keywords: Intensity inhomogeneities, Empirical system
kernel, Bias-variance, Iterative algorithms

1. INTRODUCTION
Image segmentation is often an essential step before further image processing of three-dimensional medical images can be done. An object can be segmented based on shape and/or intensity characteristics. The task of image segmentation can be simplified with initialized parameters to guide accurate segmentation. Semi-automated and interactive methods [1] have been relatively successful, but require varying degrees of human input. Segmentation is often an important step in US B-mode image analysis. We consider the problem of correcting for attenuation-related intensity inhomogeneities, i.e., those that cause a slowly changing (low-frequency) intensity contrast and are not due to speckle. B-mode imaging artifacts include speckle noise, attenuation (absorption and scattering), etc. The statistical analysis and reduction of speckle noise has been studied extensively in the literature [1]-[7]. Other artifacts, particularly those caused by non-uniform beam attenuation within the body that are not accounted for by time gain compensation (TGC), also decrease the image signal-to-noise ratio (SNR). Existing level set methods for image segmentation can be categorized into two major classes: region-based models [4], [10] and edge-based models [3], [7], [8], [12]. Region-based models aim to identify each region of interest by using a certain region descriptor to guide the motion of the active contour. However, it is very difficult to define a region descriptor for images with

intensity inhomogeneities. Most region-based models [4], [16] are based on the assumption of intensity homogeneity. Edge-based models use edge information for image segmentation. These models do not assume homogeneity of image intensities, and thus can be applied to images with intensity inhomogeneities. We present a novel region-based method for image segmentation. From a generally accepted model of images with intensity inhomogeneities, we derive a local intensity clustering property, and therefore define a local clustering criterion function for the intensities in a neighbourhood of each point. This local clustering criterion is integrated over the neighbourhood center to define an energy functional, which is converted to a level set formulation. Minimization of this energy is achieved by an interleaved process of level set evolution and estimation of the bias field. As an important application, our method can be used for segmentation and bias correction of magnetic resonance (MR) images.

2. Methods
BACKGROUND:
In this section, we review the method proposed by Zhang for estimating the distortion field and simultaneously segmenting an MR image, and provide implementation details on how it has been adapted to work with US images. This method essentially estimates the low (spatial) frequency multiplicative degradation field while at the same time identifying regions of similar intensity inhomogeneity using an MRF-MAP framework. As we will explain in Section III, although the method was developed for another imaging modality, under simplified assumptions we can justify using the same approach on displayed US images.

2.1 Model Specification:
Let S be a lattice indexing the pixels in the given image. Further, let y_i and x_i be the observed and the ideal (that is, without intensity inhomogeneity distortion) intensities of the given image, respectively, for each pixel i in S. We assume that the distortion at pixel i can be expressed by a multiplicative model of the form

y_i = g_i x_i,    (1)

where g_i represents the gain of the intensity due to the intensity inhomogeneity at pixel i. A logarithmic transformation of this equation turns the multiplication into an addition. Letting y*_i and x*_i denote, respectively, the observed and the ideal log-transformed intensities, then

y*_i = x*_i + b_i,    (2)

where b_i denotes the log-transformed intensity distortion field at pixel i. Segmentation can be considered as a problem of statistical classification, which is to assign every pixel a class label from a label set L. A labeling of S will be denoted by z = {z_i : i in S}, in which z_i in L is the corresponding class label of pixel i. Given the class label z_i = j, it is assumed that the intensity value at pixel i follows a Gaussian distribution (this assumption will be justified in Section III) with parameters theta_j = (mu_j, sigma_j), mu_j and sigma_j being the mean and the variance of class j, respectively (3).
With the distortion field taken into account, the above distribution can be written in terms of the observed intensity as in (4) and, hence, a class-independent intensity distribution is obtained as in (5). Thus, the intensity distribution at pixel i is modeled as a Gaussian mixture, given the distortion field. Assuming that the pixel intensities are statistically independent, the probability density for the entire image, given the distortion field, is given by (6). Bayes' rule can be used to obtain the posterior probability of the distortion field, given the observed intensity values, as in (7), up to a normalization constant. The prior probability density of the distortion field is modeled as a Gaussian with zero mean to capture its smoothness property. The maximum a posteriori (MAP) principle can then be employed to obtain the optimal estimate of the distortion field, given the observed intensity values (8). The optimum solution satisfies the condition that the gradient of the log-posterior vanishes (9). Solving this equation leads to the update equations (see [12] for details) given in (10), with I = (1, 1, ..., 1). Here, the quantity in (11) is the posterior probability that pixel i belongs to class j given the distortion field estimate, a low-pass filter is applied, the mean residual is defined in (12), and the mean inverse covariance is defined in (13).

2.2 Bayesian criterion for filtering edge information:
It is assumed that interpolated boundaries will partially overlap with the true edges found using edge detection. The probability of edges overlapping with shape-interpolated boundaries may be modeled using Bayes' probability. It is assumed that the probability of overlap at interpolated slices is greater than or equal to that at user-initialized contours. Edges are divided into edge components based on their connectivity. To retain edges with higher saliency, the edge components are sorted in descending order relative to the amount of overlap with the boundary. When the cumulative probability of overlap exceeds that obtained from user-initialized contours, the remaining edge components are discarded. The Bayes classification is thus not employed for training, but rather as a guide to how well boundaries can be defined based on edge detection.

3. Proposed Algorithms:

3.1 The 3-D Case:
The algorithm can be applied to 3-D volumes reconstructed from a sequence of parallel, closely spaced 2-D images. We assume that in such a sequence neighbouring slices resemble each other, that is, overlapping pixels in neighbouring slices tend to have the same class labels. Intensity inhomogeneity field estimation is performed within each 2-D image, while the energy function in the MRF prior model involves a 3-D neighbourhood system which includes, for each pixel in a 2-D scan, the eight nearest neighbours in the same scan and the two direct neighbours in the previous and the next scan. This 3-D constraint helps to strengthen ambiguous boundaries that are easily mislocated in 2-D processing.

3.2 Boundary-edge correspondence:
Ideally, the match between boundary and edge should be one-to-one. However, deviations in the interpolated shape will not initialize Bi well. To prevent many-to-one snapping of boundary points, a minimum snapping-distance map is stored for every edge point. Subsequent boundary points will only be allowed to snap to an edge point if the snapping distance is less than or equal to the value in the minimum snapping-distance map. Therefore, boundary points will not arbitrarily snap to false edges if there are no edges to be found. During


the first iteration, the search window has not been adaptively altered to match the edge proximity for the image slice, and there is a possibility that a false edge will be included in the Bayesian criterion. To prevent this, an inverse weighted distance transform M is multiplied into Fi,k, where M is a square matrix. Denoting Mpq as an element of M and any two points on Bi as bp and bq, Mpq is defined in Eq. (1).

3.3 Local Intensity Clustering Property:
Region-based image segmentation methods typically rely on a specific region descriptor (e.g. the intensity mean or a Gaussian distribution) of the intensities in each region to be segmented. However, it is difficult to give such a region descriptor for images with intensity inhomogeneities. Moreover, intensity inhomogeneities often lead to overlap between the distributions of the intensities in the regions, and it is therefore impossible to segment these regions directly based on the pixel intensities. Nevertheless, the property of local intensities is simple and can be effectively exploited in the formulation of our method for image segmentation with simultaneous estimation of the bias field. Based on the image model in (3) and the assumptions A1 and A2, we are able to derive a useful property of local intensities, referred to below as the local intensity clustering property.
To be specific, we consider a circular neighbourhood with radius rho centered at each point y, defined by O_y = {x : |x - y| <= rho}. The partition {Omega_i} of the entire domain Omega induces a partition of the neighbourhood O_y, i.e., the sets {O_y ∩ Omega_i} form a partition of O_y. For a slowly varying bias field b, the values b(x) for all x in the circular neighbourhood O_y are close to b(y), i.e.

b(x) ≈ b(y)  for x in O_y.    (4)

Thus, the intensities in each subregion O_y ∩ Omega_i are close to the constant b(y)c_i, i.e.

I(x) ≈ b(y)c_i  for x in O_y ∩ Omega_i.    (5)

These intensities therefore form a cluster with cluster center m_i ≈ b(y)c_i, and can be considered as samples drawn from a Gaussian distribution with mean m_i. Obviously, the clusters are well separated, with distinct cluster centers (because the constants c_i are distinct and the variance of the Gaussian noise is assumed to be relatively small). This local intensity clustering property is used to formulate the proposed method for image segmentation and bias field estimation as follows.
The local intensity clustering property indicates that the intensities in the neighbourhood O_y can be classified into N clusters with centers m_i ≈ b(y)c_i. This allows us to apply the standard K-means clustering to classify these local intensities. Specifically, for the intensities I(x) in the neighbourhood O_y, the K-means algorithm is an iterative process to minimize the clustering criterion [19], which can be written in a continuous form as (6), where m_i is the cluster center of the i-th cluster and u_i is the membership function of the region O_y ∩ Omega_i to be determined, i.e. u_i(x) = 1 for x in O_y ∩ Omega_i and u_i(x) = 0 otherwise. Since u_i is the membership function of the region O_y ∩ Omega_i, the criterion can be rewritten as an integral over these regions, as in (7).

3.4 Energy Formulation:
In view of the clustering criterion in (7) and the approximation of the cluster center by m_i ≈ b(y)c_i, we define a clustering criterion for classifying the intensities in O_y as in (8), where K(y - x) is introduced as a nonnegative window function, also called a kernel function, such that K(y - x) = 0 for x outside O_y. With the window function, the clustering criterion function can be rewritten as in (9). This local clustering criterion function is a basic element in the formulation of our method.
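For concreteness, the following is a sketch of the explicit forms that the clustering criterion in (8)-(9) and the resulting energy typically take in this kind of local-clustering level set formulation, written in the notation introduced above; this is our rendering of the standard formulation rather than a transcription of the original equations, whose numbering may differ:

\[
\varepsilon_y \;=\; \sum_{i=1}^{N} \int_{O_y \cap \Omega_i} K(y-x)\,\bigl|I(x) - b(y)\,c_i\bigr|^{2}\,dx
\;=\; \sum_{i=1}^{N} \int K(y-x)\,\bigl|I(x) - b(y)\,c_i\bigr|^{2}\,u_i(x)\,dx ,
\]

and integrating this local criterion over all neighbourhood centers \(y\) gives the energy

\[
E(u, c, b) \;=\; \int \varepsilon_y \, dy ,
\]

which is subsequently converted to a level set formulation and minimized jointly over the membership functions, the constants \(c_i\) and the bias field \(b\).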

3.5 Multiphase Level Set Formulation:


Then, in view of the image model in (3), where the noise is additive zero-mean Gaussian noise, the intensities in each set O_y ∩ Omega_i can be regarded as samples from a Gaussian distribution centered at the corresponding cluster center. For the case of N regions we can use two or more level set functions to define the membership functions M_i of the regions. For example, in the case of N = 3, we use two level set functions phi1 and phi2 to give a three-phase level set formulation of our method; for the four-phase case N = 4, the membership functions can be defined analogously. For notational simplicity, we denote these level set functions by a vector-valued function Phi = (phi1, phi2), so that the membership functions can be written as M_i(Phi). The energy in (10) can be converted to a multiphase level set formulation, with the regularization terms for each level set function defined as in (19) and (20). The energy functional in our multiphase level set formulation is defined by (25). The minimization of the energy in (25) with respect to the variables Phi, c and b can be performed by solving the corresponding gradient flow equations.

4. IMPLEMENTATION AND SIMULATION RESULTS:

Fig. 1. Input CT image of bones.
Fig. 2. Histogram of the CT image with density.
Fig. 3. Iterations performed on the input image.
Fig. 4. CT image after 50 iterations.


Fig. 5. Blurring of the input image, for checking from the initial point of the image.
Fig. 6. Bias-corrected image.
Fig. 7. Histogram of the bias-corrected image.

5. Conclusion:
This work presents a variational level set framework for segmentation and bias correction of images with intensity inhomogeneities. Based on a generally accepted model of images with intensity inhomogeneities and a derived local intensity clustering property, the work defines an energy of the level set functions that represent a partition of the image domain and a bias field that accounts for the intensity inhomogeneity. Segmentation and bias field estimation are therefore jointly performed by minimizing the proposed energy functional. The slowly varying property of the bias field derived from the proposed energy is naturally ensured by the data term in our variational framework, without the need to impose an explicit smoothing term on the bias field. The proposed method is much more robust to initialization than the piecewise smooth model. Experimental results have demonstrated the superior performance of our method in terms of accuracy, efficiency, and robustness.

6. ACKNOWLEDGMENTS:
The work was supported by my guide S. Asif Hussain of Annamacharya Institute of Technology & Sciences, Rajampet, India, under the R.P.S. research grants of A.I.C.T.E., New Delhi.

Author Profiles:
Asif Hussain Shaik received B.Tech and M.Tech degrees in Electronics & Communication Engineering from JNT University, Hyderabad, India. He is currently working towards a PhD degree in biomedical image processing at JNTU Anantapur, India. Presently he is with Annamacharya Institute of Technology & Sciences, Rajampet, A.P., India, working as an Assistant Professor in the Dept. of ECE. He has presented many research papers in national and international conferences and journals. He is a member of professional societies including ISTE (India), BMESI (India), IACSIT (Singapore), IAENG (Hong Kong) and WASE (Hong Kong). His research interests include signal processing, time series analysis and image processing.

Lokeswar Reddy Kokatam received a B.Tech degree in Electronics & Communication Engineering from JNT University, Anantapur, India. Presently he is with Annamacharya Institute of Technology & Sciences, Rajampet, A.P., India, in the Dept. of ECE, and is pursuing his M.Tech. His research interests include signal processing, time series analysis and image processing.

7. REFERENCES
[1] Olabarriaga, S.D. and Smeulders, A.W.M., "Interaction in the Segmentation of Medical Images: A Survey," Med. Image Analysis, 5:127-142, 2001.
[2] Osher, S. and Sethian, J.A., "Fronts Propagating with Curvature Dependent Speed: Algorithms Based on Hamilton-Jacobi Formulations," J. Comp. Physics, 79:12-49, 1988.
[3] P. N. T. Wells and M. Halliwell, "Speckle in ultrasonic imaging," Ultrasonics, vol. 19, pp. 225-229, 1981.
[4] A. N. Evans and M. S. Nixon, "Biased motion-adaptive temporal filtering for speckle reduction in echocardiography," IEEE Trans. Med. Imag., vol. 15, pp. 39-50, Feb. 1996.
[5] V. Caselles, R. Kimmel, and G. Sapiro, "Geodesic active contours," Int. J. Comput. Vis., vol. 22, no. 1, pp. 61-79, Feb. 1997.
[6] T. Chan and L. Vese, "Active contours without edges," IEEE Trans. Image Process., vol. 10, no. 2, pp. 266-277, Feb. 2001.
[7] S. Kichenassamy, A. Kumar, P. Olver, A. Tannenbaum, and A. Yezzi, "Gradient flows and geometric active contour models," in Proc. 5th Int. Conf. Comput. Vis., 1995, pp. 810-815.
[8] R. Kimmel, A. Amir, and A. Bruckstein, "Finding shortest paths on surfaces using level set propagation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 17, no. 6, pp. 635-640, Jun. 1995.
[9] C. Li, C. Kao, J. C. Gore, and Z. Ding, "Minimization of region-scalable fitting energy for image segmentation," IEEE Trans. Image Process., vol. 17, no. 10, pp. 1940-1949, Oct. 2008.
[10] R. Malladi, J. A. Sethian, and B. C. Vemuri, "Shape modeling with front propagation: A level set approach," IEEE Trans. Pattern Anal. Mach. Intell., vol. 17, no. 2, pp. 158-175, Feb. 1995.


[11] R. Ronfard, "Region-based strategies for active contour models," Int. J. Comput. Vis., vol. 13, no. 2, pp. 229-251, Oct. 1994.
[12] C. Samson, L. Blanc-Feraud, G. Aubert, and J. Zerubia, "A variational model for image classification and restoration," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 5, pp. 460-472, May 2000.
[13] S. Theodoridis and K. Koutroumbas, Pattern Recognition. New York: Academic, 2003.
[14] A. Tsai, A. Yezzi, and A. S. Willsky, "Curve evolution implementation of the Mumford-Shah functional for image segmentation, denoising, interpolation, and magnification," IEEE Trans. Image Process., vol. 10, no. 8, pp. 1169-1186, Aug. 2001.
[15] A. Vasilevskiy and K. Siddiqi, "Flux-maximizing geometric flows," IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 12, pp. 1565-1578, Dec. 2002.
[16] L. Vese and T. Chan, "A multiphase level set framework for image segmentation using the Mumford and Shah model," Int. J. Comput. Vis., vol. 50, no. 3, pp. 271-293, Dec. 2002.


MICROCONTROLLER BASED
LIFT SYSTEM
Jayshree Sahu*, Dr. Amita Mahor**, Dr. S. K. Sahu***
*NIIST, Bhopal, M.P. **NIIST, Bhopal, M.P. ***Neelam College of Engg. & Technology, Agra
ABSTRACT - This paper presents a microcontroller-based lift system using the microcontroller chip AT89C52, based on message scheduling. The scheme basically belongs to a database-style system in which a change in one operation is visible to other concurrent operations, and in which the user can programme each set by entering a number of series of text, date, time, etc., to be performed on a priority basis.
INDEX TERMS - Introduction, block diagram, circuit diagram, circuit description, algorithm.
INTRODUCTION - A conventional lift system is based on an elevated control system, which has a number of disadvantages: a large number of cables, risk factors, complexity, low intelligence and poor economy. A modern distributed elevator control system is an intelligent, economical system which reduces all of the above disadvantages. Three recent innovations include permanent earth-magnet motors, machine-room-less designs, rail-mounted gearless machines and microprocessor controls.

METHODOLOGY
The system is based on platform monitoring and control. Let us understand it with an example: consider a lift serving floors 1 and 2.
If push button 2 inside the lift is on and the lift is not at position 2, the lift comes to floor 2.
If push button 2 outside the lift is on and the lift is not at position 2, the lift comes to floor 2.
If push button 1 outside the lift is on and the lift is not at position 1, the lift comes to floor 1.

BLOCK DIAGRAM
[Block diagram: the sensors and switches feed the AT89C52 microcontroller, which drives the motor through the L293D motor driver chip; an LCD and the power supply are also connected to the microcontroller.]

WORKING PRINCIPLE
The system is based on a sensor-and-switch polling method, in which changes made at the switches are observed by the sensors through message scheduling.

COMPONENTS USED
Microcontroller AT89C52: 5.5 V, 16 MHz.
Motor driver chip L293D: 16-pin IC running on 5 V DC, interfacing between the lift and the microcontroller.
Sensors and switches: reed type, 0.25 W power, 0-16 AT.
Crystal oscillator: 11.059 MHz, for serial communication.
DC motor: 1 W. Rectifier: IN4007.


PCB LAYOUT

CIRCUIT OPERATION

The automatic lift is designed using the microcontroller AT89C52. A 220 V AC supply is fed to a step-down transformer, which supplies 12 V AC to a rectifier that converts it to DC. The regulator IC 7805 then sends a +5 V supply to all the circuits, which starts the system. After the system is on, it first checks the position of the lift; if the lift is on the ground floor, it starts the motor, which sets the lift in motion. The lift remains in standby mode until a new switch is pressed.

ALGORITHM
Step 1: Initialize the controller.
Step 2: Initialize the LCD.
Step 3: Configure port 1 as the input port.
Step 4: Configure port 3 as the output port.
Step 5: Take the input from the switches for the required floor.
Step 6: Sense the required floor through the sensor and stop the lift at that floor.
Step 7: Repeat from Step 5.
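A behavioral sketch of the polling logic described in the methodology and the algorithm above, written in Python purely as a model of the control flow (the real controller runs on the AT89C52, and names such as read_switches, read_floor_sensor and drive_motor are illustrative placeholders, not actual firmware routines):

```python
import time

def read_switches():
    """Placeholder for polling the inside/outside push buttons (port 1)."""
    return None          # e.g. return 2 when a request for floor 2 is detected

def read_floor_sensor():
    """Placeholder for the reed sensors that report the current lift position."""
    return 1

def drive_motor(direction):
    """Placeholder for commanding the L293D motor driver (port 3)."""
    pass

def lift_controller():
    while True:                        # Step 7: repeat forever
        requested = read_switches()    # Step 5: take input from the switches
        if requested is None:
            time.sleep(0.05)           # standby until a new switch is pressed
            continue
        current = read_floor_sensor()  # Step 6: sense the current floor
        while current != requested:
            drive_motor("up" if requested > current else "down")
            current = read_floor_sensor()
        drive_motor("stop")            # stop the lift at the requested floor

# lift_controller()  # would run the polling loop on real hardware inputs
```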

RESULT
The model runs successfully with the desired output.


CONCLUSION
Platform monitoring and controlled message-scheduling database logic was successfully performed with an elevator. The lift moved in the upward and downward directions, with closing and opening of the door at the desired floor, thereby removing the non-linearity previously present in the arm arrangement due to the sprocket-and-chain mechanism; the sprocket-and-chain mechanism is completely removed.

FUTURE SCOPE
Application in embedded systems.
A security system based on a space vector modulation signal.
Mapping of input variables through fuzzy logic, so that the microcontroller decides what action to take based on the PLC method.
Further scope in Programmable Logic Controller design; an operator console board may be used for the display and keypad design.

REFERENCES
IEEE Expo 2011 - International Elevator and Escalator Expo.
Microchip PIC C Tutorial.
Spackling Tutorial.
ARM Cortex-A Series - High performance for open operating systems.
Electronics For You (Oct. 2004).
OLX Classified.
http://www.electronics4u.com
http://www.ttransenergic.co.au
http://www.atmel.com
Douglas V. Hall, Microprocessors and Interfacing (Programming & Hardware).
Vedam Subrahmanyam, Power Electronics.
Alberto Sangiovanni-Vincentelli, IEEE Microelectron., (May 2003), 8-18.
Chris Herring, IEEE Microelectron., (Nov 2000), 45-51.
Todd D. Morton, Embedded Microcontrollers, Prentice Hall Inc., New Delhi, India, 2001.
Myke Predko, Handbook of Microcontrollers, McGraw-Hill, USA, 1999.

BIBLIOGRAPHY
The 8051 Microcontroller and Embedded Systems, by Ali Mazidi, Rolin D. McKinlay and Danny Causey, Pearson Education, Edition 2.
Advanced Microcontroller Applications, by Janice Gillispie Mazidi, Pearson Education.


Persian Signature Verification using Convolutional Neural Networks

Hurieh Khalajzadeh
Intelligent Systems Laboratory
(ISLAB), Faculty of Electrical
& Computer Engineering
K.N. Toosi University of
Technology, Tehran, Iran
h_khalajzadeh@ee.kntu.ac.ir

Mohammad Mansouri
Intelligent Systems Laboratory
(ISLAB), Faculty of Electrical &
Computer Engineering
K.N. Toosi University of
Technology, Tehran, Iran
mohammad.mansouri@ee.kntu.ac.ir

Mohammad Teshnehlab
Intelligent Systems Laboratory (ISLAB), Faculty of Electrical & Computer Engineering
K.N. Toosi University of Technology, Tehran, Iran
teshnehlab@eetd.kntu.ac.ir

Abstract
The style of people's handwritten signatures is a biometric feature used in person authentication. In this paper, an offline signature verification scheme based on a Convolutional Neural Network (CNN) is proposed. The CNN addresses the problem of feature extraction without prior knowledge of the data. The classification task is performed by a multilayer perceptron (MLP) network. This method is not only capable of extracting features relevant to a given signature, but is also robust with regard to signature location changes and scale variations when compared to classical methods. The proposed method is evaluated on a dataset of Persian signatures gathered originally from 22 people. The simulation results reveal the efficiency of the suggested algorithm.

1. Introduction
There is an increasing interest in trustworthy identity verification. Biometric authentication is a more trustworthy alternative to password-based security systems, and is gaining popularity as it is relatively hard to forget, steal or guess. Several biometric features have been studied and proved useful, including biological characteristics such as fingerprint, face, iris and retina patterns, or behavioral traits such as signature and speech. Compared with conventional methods of identification such as PIN codes, passwords, and magnetic or smart cards, biometric characteristics offer several advantages: they are significant for each individual, are always available, cannot be transferred to another person, and cannot be forgotten or stolen. However, because most biological characteristics are unchangeable, a more serious problem occurs when they are copied; one will then hesitate to use the disclosed biological features [1, 2].
Signature verification is an active research area in
the field of pattern recognition due to its usability in
many areas associated with security and access control.
Signature authentication is a low-cost biometric system
where the awareness and uniqueness of a person are necessary
[2, 3]. There are two main research fields in this area:
signature recognition (or identification) and signature
verification. The signature recognition problem consists
of identifying the author of a signature. In this problem,
a signature database is searched to find the identity of a
given signer. This task is different from signature
verification. Verification defines the process of testing
a signature to decide whether a particular signature
truly belongs to a person or not. In this case, the output
is either accepting the signature as valid or rejecting it
as a forgery. Automatic signature verification is a well-known and very active research field with important
applications. Different techniques have already been
applied in signature verification such as fuzzy logic [4],
geometric features [5, 6], global characteristics [7],
genetic algorithms [8], neural networks [9-11] and
hidden Markov models [12]. In comparison, the
signature recognition problem is more complex than
the signature verification problem. So, rather little

research effort has been focused on automatic signature
recognition [13].
Depending on the data acquisition method and
involved application, existing signature verification
systems are generally classified into either online or offline
approaches. In general, online signature verification
systems present a better performance than the offline
signature verification systems. In the online approach
the system uses not only the signature but also the
dynamic information obtained during the signing
process. However, an online signature verification system
necessitates the presence of the signer both at the time of
obtaining the reference signature and during the verification
process, which is not welcomed in many applications.
Thus offline verification methods have more practical
application areas than that of the online signature
verification methods. The offline approach only uses
the digitalized image of a signature extracted from a
document called static information. So it does not
require any special processing devices. But
preprocessing is more difficult and time consuming in
offline systems due to unavailability of the dynamic
information. Developing an efficient and accurate
offline signature verification system is a challenging
task, as signatures are sensitive to geometric
transformations and to the variation of an individual's signatures collected over
the course of time, the complex background of the signature,
skilled forgery, the non-availability of the time taken to sign,
the lack of sufficient signature samples for training the
system, noise introduced by the scanning device, and differences
in pen width, ink pattern, etc. [14].
Convolutional neural networks are feed-forward
networks with the ability of extracting topological
properties from the input image without any
preprocessing needed. Therefore, CNNs could be
useful to overcome the preprocessing problems of
offline signature verification task. This paper presents
an offline signature verification system using a CNN
for extracting the features and a MLP for classification
of its extracted features. The proposed system is tested on
176 Persian signatures gathered from 22 people. The
simulation results demonstrate the effectiveness of using CNNs
in the task of offline signature verification.
The rest of the paper is organized as follows.
Section 2 presents an introduction to CNNs. Section 3
discusses the proposed CNN-based signature
verification system. Section 4 is about the dataset
which is used in experiments. Section 5 summarizes the
experiments and results. Finally, in Section 6
concluding remarks are presented.

2. Convolutional Neural Networks

Yann LeCun and Yoshua Bengio introduced the
concept of CNNs in 1995. A convolutional neural
network is a feed-forward network with the ability of
extracting topological properties from the input image.
It extracts features from the raw image and then a
classifier classifies extracted features. CNNs are
invariant to distortions and simple geometric
transformations like translation, scaling, rotation and
squeezing.
Convolutional Neural Networks combine three
architectural ideas to ensure some degree of shift, scale,
and distortion invariance: local receptive fields, shared
weights, and spatial or temporal sub-sampling [15].
The system is usually trained like a standard neural
network by back propagation. CNN layers are an
alternation of convolutional layers and subsampling
layers. A convolutional layer is used to extract features
from local receptive fields. It is organized in planes of
neurons called feature maps. In a network with a 5×5
convolution kernel, each unit has 25 inputs connected to
a 5×5 area in the previous layer, which is the local
receptive field. A trainable weight is assigned to each
connection, but all units of one feature map share the
same weights. This feature which allows reducing the
number of trainable parameters is called weight sharing
technique and is applied in all CNN layers. LeNet5
[15], a fundamental model of CNNs proposed by
LeCun, has only 60,000 trainable parameters out of
345,308 connections. In order to extract different types
of local features, a convolutional layer is composed of
several feature maps. A reduction of the resolution of
the feature maps is performed through the subsampling
layers. In a network with a 2×2 subsampling filter,
such a layer comprises as many feature maps as
the previous convolutional layer but with half the
number of rows and columns. Each unit j in the mentioned
network is connected to a 2×2 receptive field,
computes the average of its four inputs yi which are
outputs from the corresponding feature map of the
previous layer, multiplies it by a trainable weight wj
and adds a trainable bias bj to obtain the activity level
vj:

$v_j = w_j \sum_{i=1}^{4} y_i + b_j \qquad (1)$
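
As an illustration of Eq. (1), the following minimal NumPy sketch (our own, not code from the paper; the function and variable names are assumed for illustration) computes the activity level of one subsampling unit from a 2×2 receptive field, using a single shared weight and bias.

import numpy as np

def subsample_unit(receptive_field, w, b):
    # Eq. (1): v_j = w_j * sum_i y_i + b_j over the 2x2 receptive field
    return w * receptive_field.sum() + b

# With w = 1/4 and b = 0 the unit behaves like simple average pooling
patch = np.array([[0.2, 0.4],
                  [0.6, 0.8]])
print(subsample_unit(patch, w=0.25, b=0.0))   # 0.5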

In the rest of this section a particular convolutional
neural network identified as LeNet5 is described.
LeNet5 takes a raw image of 32×32 pixels as input. It
is composed of seven layers: three convolutional layers
(C1, C3 and C5), two subsampling layers (S2 and S4),
one fully connected layer (F6) and the output layer.
These layers are connected as shown in Fig. 1.

Figure 1. LeNet-5 architecture [15]

The first convolution layer is composed of six
feature maps of 32×32 units. The following
subsampling layer (S2) reduces the resolution by 2,
while the next convolutional layer (C3) extends the
number of feature maps to 16. As shown in Table 1, the
choice is made not to connect every feature map of S2
to every feature map of C3. Each unit of C3 is
connected to several receptive fields at identical
locations in a subset of feature maps of S2 [15, 16].
Table 1. The Interconnection of the S2 Layer to C3 Layer [15]

The subsampling layer S4 acts as S2 and reduces the
size of the feature maps to 5×5. The last convolutional
layer C5 differs from C3 as follows. Each one of its
120 feature maps is connected to a receptive field on all
feature maps of S4. And since the feature maps of S4
are of size 5×5, the size of the feature maps of C5 is
1×1. Thus C5 is the same as a fully connected layer. The
fully connected layer (F6) contains 84 units connected
to the 120 units of C5. All the units of the layers up to
F6 have a sigmoid activation function of the type:

$y_j = A \tanh(S v_j) \qquad (2)$

where vj is the activity level of the unit. A and S are
two constant parameters for the sigmoid function.
Finally, the output layer is an Euclidean RBF layer
of 10 units (for the 10 classes) whose outputs yj are
computed by

$y_j = \sum_{i=1}^{84} (y_i - w_{ij})^2, \quad j = 0, \ldots, 9 \qquad (3)$

where yi is the output of the ith unit of the layer F6.
For each RBF neuron, yj is a penalty term measuring
the fitness of its inputs yi to its parameters wij. These
parameters are fixed and initialized to -1 or +1 to
represent stylized images of the characters drawn on a
7×12 bitmap that are targets for the previous layer
(hence the size 84 for the layer F6). Then the minimum
output gives the class of the input pattern [16].
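
The two formulas above can be sketched as follows; this is only our illustration of Eq. (2) and Eq. (3), with arbitrary example values for the constants A and S and random stand-ins for the fixed ±1 parameter vectors.

import numpy as np

A, S = 1.7159, 2.0 / 3.0                       # example constants for Eq. (2)

def activation(v):
    # Eq. (2): y_j = A * tanh(S * v_j)
    return A * np.tanh(S * v)

def rbf_outputs(y_f6, W):
    # Eq. (3): y_j = sum_{i=1..84} (y_i - w_ij)^2 for each of the 10 output units
    return ((y_f6[None, :] - W) ** 2).sum(axis=1)

rng = np.random.default_rng(0)
y_f6 = activation(rng.normal(size=84))         # toy F6 activities
W = rng.choice([-1.0, 1.0], size=(10, 84))     # toy +/-1 parameter vectors
print(rbf_outputs(y_f6, W).argmin())           # class with the smallest penalty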

3. Proposed Method for signature verification

3.1. Feature extraction


A convolutional neural network is used to extract
features in this paper. The proposed CNN, which is
depicted in Fig. 2, takes a raw image of 180×240
pixels as input. Input images are normalized between 0
and 1 and are given to the CNN. The CNN is composed
of nine layers: five convolutional layers and four
subsampling layers. A multilayer perceptron network is
used for classifying the outputs of the CNN instead of
the radial basis function network which is used in the
LeNet5 network. The output layer, i.e. the last layer of the CNN, is
given to a MLP network as the input. The number of
feature maps and the dimensions of the convolutional and
subsampling filters are obtained experimentally for all
of the layers. The structure of the multilayer perceptron
network is described in the next subsection.
The first convolutional layer of the proposed CNN
has six feature maps, each of which has a resolution of
174×234, with a receptive field of 7×7. The second
layer, or the first subsampling layer, contains six
feature maps of size 87×117, with a receptive field of
2×2. The third layer is another convolutional layer and
has 16 feature maps with size 80×110, with a
receptive field of 8×8. The fourth layer contains 16
feature maps as well, each of which is of size 40×55.
The fifth convolutional layer has 30 feature maps, each
of which has a resolution of 34×48, with a receptive
field of 7×8. The sixth layer contains 30 feature maps
of size 17×24, with a receptive field of 2×2. The
seventh layer is another convolutional layer and has 50
feature maps with size 10×18, with a receptive field of
8×7. The eighth layer contains 50 feature maps as
well, each of which is of size 5×9. The ninth layer is a
convolutional layer with 120 feature maps, again with a
receptive field of 5×9.
All convolutional neural network neurons compute
their output by calculating the weighted sum of their inputs and feeding
the result to Eq. (2), in which A is chosen to be 1. The
number of parameters in this method is 412,166. Since
the input dimension is 180×240 (43,200) pixels, this
parameter number is comparable with that of conventional
neural networks such as the MLP.
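
A rough PyTorch sketch of the layer sizes described above is given below. It is our own reconstruction for illustration only: plain average pooling stands in for the trainable subsampling of Eq. (1), tanh plays the role of Eq. (2) with A chosen as 1, and the two linear layers correspond to the MLP classifier described in the next subsection.

import torch
import torch.nn as nn

class SignatureCNN(nn.Module):
    # Sketch of the 180x240 input, nine feature-extraction layers and MLP head
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=(7, 7)), nn.Tanh(),     # C1: 6 maps, 174x234
            nn.AvgPool2d(2),                                    # S2: 6 maps, 87x117
            nn.Conv2d(6, 16, kernel_size=(8, 8)), nn.Tanh(),    # C3: 16 maps, 80x110
            nn.AvgPool2d(2),                                    # S4: 16 maps, 40x55
            nn.Conv2d(16, 30, kernel_size=(7, 8)), nn.Tanh(),   # C5: 30 maps, 34x48
            nn.AvgPool2d(2),                                    # S6: 30 maps, 17x24
            nn.Conv2d(30, 50, kernel_size=(8, 7)), nn.Tanh(),   # C7: 50 maps, 10x18
            nn.AvgPool2d(2),                                    # S8: 50 maps, 5x9
            nn.Conv2d(50, 120, kernel_size=(5, 9)), nn.Tanh(),  # C9: 120 maps, 1x1
        )
        self.classifier = nn.Sequential(                        # MLP head (Section 3.2)
            nn.Flatten(),
            nn.Linear(120, 84), nn.Tanh(),
            nn.Linear(84, 1), nn.Sigmoid(),                     # target 0 or 1
        )

    def forward(self, x):
        return self.classifier(self.features(x))

x = torch.rand(1, 1, 180, 240)                 # one normalized grayscale signature
print(SignatureCNN()(x).shape)                 # torch.Size([1, 1])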

Figure 2. Proposed CNN for Persian signature verification


3.2. Classification

A MLP network is used to classify the features
which are extracted with the applicability of the
CNN model. The last layer of the CNN is considered
as the input layer for the MLP network. This layer
is followed by a hidden layer with 84 neurons,
which is fully interconnected with the previous
layer. Finally, the last layer of this network is a
layer with one neuron, which is the target of the
network. The target is considered as 0 or 1; it
indicates whether the input signature is related to the
desired person or not. Targets 0 and 1 signify the
original and forgery signatures, respectively. The
MLP network used to classify the features is
depicted in the last three layers of Fig. 2.

4. Data
In this research, 176 original Persian signatures
from 22 people are used. For each person, 8
signatures are considered for training, testing, and
validation of the algorithm. Some signature images
used in this paper are shown in Fig. 3. The size of
the images is 640×480.

Figure 3. Some signature images used in the experiment

5. Experiments and Results
A variety of experiments are performed and
results are presented in this paper. Different
numbers of feature maps and dimensions of
convolutional and subsampling filters are
considered and the best of them is selected. All
experiments were performed with 176 signatures
from 22 people. There was no overlap between the
training and testing sets. The performance of the
suggested method during the training session for
the training, testing and validation datasets is
illustrated in Fig. 4. The training was stopped when
the minimum error for the validation dataset was
achieved.

Figure 4. Performance of the proposed CNN structure: mean squared error versus training epochs for the training, validation and test sets (best validation performance 0.088711 at epoch 1000)

Experiments are performed 10 times, for 1000
epochs each. An average of 99.86 is obtained for the
validation performance, and the error becomes fixed
after an average of 785 epochs.

6. Conclusions
In this study a general CNN architecture is
applied to the task of Persian signature verification.
The style of people's handwritten signature is a
biometric feature used in person authentication.
CNNs may be expected to achieve significantly
better results than standard feed-forward networks
for many tasks. The key characteristic of weight
sharing is appropriate when the input data are scarce.
In this paper, despite the fact that the input data are
small in quantity and large in dimensionality, good
results are obtained. Furthermore, CNNs are
invariant to distortions and simple geometric
transformations like translation, scaling, rotation
and squeezing. Another characteristic which is
more important than other characteristics for the
task of signature verification is the ability of CNNs
in extracting features from input data. So, it would
solve the preprocessing problem of the offline
signature verification task. The proposed method is not
only capable of extracting features relevant to a
given signature, but also robust with regard to
signature location changes and scale variations
when compared to classical methods. The
simulation results reveal the efficiency of the
suggested algorithm.

References
[1] Jonghyon Yi, Chulhan Lee, and Jaihie Kim, "Online signature verification using temporal shift estimated by the phase of gabor filter", IEEE Transactions on Signal Processing, vol. 53, no. 2, February 2005, 776-783.
[2] Elaheh Dehghani, Mohsen Ebrahimi Moghaddam, "On-line Signature Verification Using ANFIS", Proceedings of the 6th International Symposium on Image and Signal Processing and Analysis (2009), 546-549.
[3] Amaç Herdağdelen and Ethem Alpaydın, "Dynamic alignment distance based online signature verification", The 13th Turkish Symposium on Artificial Intelligence & Artificial Neural Networks (2004), Izmir, Turkey.
[4] Ismail, M.A., Gad, S., "Off-line Arabic signature recognition and verification", Pattern Recognition 33 (2000), 1727-1740.
[5] Fang, B., Wang, Y.Y., Leung, C.H., Tang, Y.Y., Kwok, P.C.K., Tse, K.W., Wong, Y.K., "A smoothness index based approach for off-line signature verification", Proceedings of ICDAR'99, (1999) 785-787.
[6] Hobby, J.D., "Using shape and layout information to find signatures, text, and graphics", Computer Vision and Image Understanding, 80, (2000) 88-110.
[7] Ramesh, V.E., Murty, M.N., "Off-line signature verification using genetically optimized weighted features", Pattern Recognition 32, (1999) 217-233.
[8] Scholkopf, B., Sung, K., Burges, C., Girosi, F., Niyogi, P., Poggio, T., Vapnik, V., "Comparing support vector machines with Gaussian kernels to radial basis function classifiers", AI Memo No. 1599, MIT (1996).
[9] Bajaj, R., Chaudhury, S., "Signature verification using multiple neural classifiers", Pattern Recognition 30 (1997), 1-7.
[10] Baltzakis, H., Papamarkos, N., "A new signature verification technique based on a two-stage neural network classifier", Engineering Applications of Artificial Intelligence 14 (2001) 95-103.
[11] Velez, J.F., Sanchez, A., Moreno, A.B., "Robust off-line signature verification using compression networks and positional cuttings", Proceedings of the IEEE International Conference on Neural Networks for Signal Processing (NNSP'03), (2003) 627-636.
[12] Camino, J.L., Travieso, M.C., Morales, C.R., Ferrer, M.A., "Signature classification by hidden Markov model", Proceedings of the IEEE International Carnahan Conference on Security Technology, (1999) 481-484.
[13] E. Frias-Martinez, A. Sanchez, J. Velez, "Support vector machines versus multi-layer perceptrons for efficient off-line signature recognition", Engineering Applications of Artificial Intelligence 19 (2006) 693-704.
[14] B. H. Shekar and R. K. Bharathi, "Eigen-signature: a robust and an efficient offline signature verification algorithm", IEEE International Conference on Recent Trends in Information Technology, ICRTIT 2011, 134-138.
[15] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, "Gradient-based learning applied to document recognition", Proc. IEEE 86 (11) (1998) 2278-2324.
[16] F. Lauer, C. Y. Suen and G. Bloch, "A trainable feature extractor for handwritten digit recognition", Pattern Recognition (2007), 1816-1824.


International Journal of Engineering Research and Technology (IJERT)


ISSN: 2278-0181
Vol. 1 Issue 2, April - 2012

A COMPARATIVE MODEL FOR
IMAGE PROCESSING & TEXTURE CLASSIFICATION
USING CROSS-DIAGONAL TEXTURE MATRIX (CDTM) &
GREY-LEVEL CO-OCCURRENCE MATRIX (GLCM)
Ankita Dhyani
Deepa Gupta
Sonia Saini

Email- ankitadhyani.84@gmail.com,
Email: dgupta@amity.edu,
Email: ssani@aiit.amity.edu

ABSTRACT:
The objective of this paper is to recognize different textures in an image, particularly a satellite
image where properties of the image are not distinctly identified. Texture classification involves
determining texture category of an observed image. The present study on Image Processing &
Texture Classification was undertaken with a view to develop a comparative study about the
texture classification methods. The algorithms implemented herein classify the different parts of
the image into distinct classes, each representing one property, which is different from the other
parts of the image. The aim is to produce a classification map of input image where each uniform
textured region is identified with its respective texture class. The classification is done on the basis
of texture of the image, which remains same throughout a region, which has a consistent property.
The classified areas can be assigned different colours, each representing one texture of the image.
In order to accomplish this, prior knowledge of the classes to be recognized is needed, texture
features extracted and then classical pattern classification techniques are used to do the
classification.
Examples where texture classification was applied as the appropriate texture processing method
include the classification of regions in satellite images into categories of land use. Here we have
implemented two methods, namely Cross-Diagonal Texture Matrix (CDTM) and Grey-Level Co-occurrence Matrix (GLCM), which are based on properties of the texture spectrum (TS) domain for
the satellite images. In CDTM, the texture unit is split into two separable texture units, namely,
Cross texture unit and Diagonal texture unit of four elements each. These four elements of each
texture unit occur along the cross direction and diagonal direction. For each pixel, CDTM has
been evaluated using various types of combinations of cross and diagonal texture units. GLCM, on
the other hand, is a tabulation of occurrence of different combinations of pixel brightness values
(grey levels) in an image. Basically, the GLCM expresses the spatial relationship between the gray-level of a pixel and the gray-levels of its neighboring pixels. The study focuses on extraction of
entropy, energy, inertia and correlation features using several window sizes, which are calculated,
based on the GLCM. A maximum likelihood supervised classifier is used for classification. While
applying the algorithms on the images, we characterize our processed image by its texture
spectrum. In this paper we deal with extraction of micro texture unit of 7X7 window to represent
the local texture unit information of a given pixel and its neighborhood. The result shows that
increasing the window size showed no significant contribution in improving the classification
accuracy. In addition, results also indicate that the window size of 7x7 pixels is the optimal
window size for classification. The texture features of a GLCM and CDTM have been used for
comparison in discriminating natural texture images in experiments based on minimum distance.
Experimental results reveal that the features of the GLCM are superior to the ones given by
CDTM method for texture classification.

1. IMAGE PROCESSING
In computer science, image processing is any form of signal processing for which the
input is an image like photographs or frames of video; the output of image processing can
however be either an image or a set of characteristics or parameters related to the image.

Most image-processing techniques involve treating the image as a two-dimensional signal
and applying standard signal-processing techniques to it.
2. TEXTURE ANALYSIS
In many image-processing algorithms, simplifying assumptions are made about the
uniformity of intensities in local image regions. However, images of real objects often do
not exhibit regions of uniform intensities. For example, the image of a wooden surface is
not uniform but contains variations of intensities that form certain repeated patterns called
visual texture. The patterns can be the result of physical surface properties such as
roughness or oriented strands that often have a tactile quality, or they could be the result of
reflectance differences such as the color on a surface.
Image texture is defined as a function of the spatial variation in pixel intensities (gray
values). One common application of image texture is the recognition of image regions
using texture properties. Texture is the most important visual cue in identifying
homogeneous regions. This is called Texture Classification. The objective of texture
classification is to produce a classification map of the input image where each uniform
textured region is identified with the texture class it belongs to.[6]

3. OBJECTIVE
The objective is to recognize different textures in an image, particularly a satellite image
wherein the properties of the image are not distinctly identified.
The algorithms implemented herein classify the different parts of the image into distinct
classes, each representing one property that is different from the other parts of the image.
The classification is done on the basis of texture of the image. The texture remains same
throughout a region that has a consistent property. The classified areas can be assigned
different colours, each representing one texture of the image.
Some applications of image processing:
Computer vision

Face detection

Feature detection

Medical image processing

Microscope image processing

Remote sensing
4. ADVANTAGES OF IMAGE PROCESSING
The Image Processing software will help Security personnel to use processed Images of
the terrain, which are much clearer than the images taken by satellites. These images give
a clear picture of the terrain by distinguishing the land region from the water bodies and
other geographical regions on the earth such as desert, forest, hills etc. Thus classification
of satellite images has the following attributes:
The software would help in discriminating the features of an unknown image
taken from a satellite.
It helps in extracting the features of an image that are not visible to the
naked eye.
It helps in locating the terrain at the time of war.


5. METHODS IMPLEMENTING TEXTURE CLASSIFICATION


There are several methods already in use for texture classification, and new techniques are
being constantly developed to classify Satellite Images of the terrain. Two of them are:
Cross-Diagonal Texture Matrix (CDTM)
Grey-Level Co-Occurrence Matrix (GLCM)
6. CROSS-DIAGONAL TEXTURE MATRIX
The present study was undertaken to develop a modified texture analysis algorithm based
on the properties of texture spectrum (TS) domain for the satellite images. In texture
analysis some specific spatial filters are required, which can transform the image based on
the textural features instead of changing the spectral properties; the image is thus
characterized by its texture spectrum. This study deals with extraction of micro texture
unit of 3X3 window to represent the local texture unit information of a given pixel and its
neighborhood. In this technique, the texture unit comprising of eight neighborhood
elements is decomposed into two separable texture units, namely, cross texture unit and
diagonal texture unit of four elements each. These four elements of each texture unit occur
along the cross direction and diagonal direction. For each pixel, cross- diagonal texture
matrix (CDTM) has been evaluated using several types of combinations of cross and
diagonal texture units. This approach drastically reduces the computational time. The
occurrence frequency of each CDTM value obtained in the entire image is recorded. Two
different approaches, namely, mean and median, have been subsequently carried out while
processing the data. It is observed that the median technique with a 3X3 window shows the best
result in the reduction of noise in satellite data.[3]
6.1. INTRODUCTION
Texture analysis plays an important role in image processing, image classification and in
the interpretation of image data. Several publications (Haralick et al, 1973; He et al, 1988;
Gonzalez and Woods, 1992; Chen et al, 1995) have appeared dealing with the technique
and role of textural analysis in interpretation of image. From geological point of view, it is
being increasingly used in the interpretation and understanding of terrain. In a satellite
imagery of an area, where an array or group of pixels characteristically represents the
terrain, it is imperative that analysis of textural features of the entire image must be
undertaken.
Textural analysis has been used in image segmentation and in classification problems. In
texture segmentation, the pixels are grouped together to form regions of uniform texture;
while in textural classification the object is to partition the image into a set of sub-regions,
each of which is homogeneously textured. Two different approaches have been proposed
for textural analysis. One of them is the structural approach, while the other is the statistical
approach. Both approaches are found to have certain limitations.
The purpose of this study is to develop the cross-diagonal texture filtering technique using
several approaches and examine its suitability in elimination of noise in satellite remote
sensing data.[3]


6.2. METHODOLOGY
6.2.1 Texture Spectrum
The basic concept of the texture spectrum method, introduced by He & Wang
(1990, 1991a, and 1991b), is that texture can be extracted from a neighborhood of a 3X3
window, which constitutes the smallest unit, called the texture unit. The neighborhood of a
3X3 window comprises nine elements, V = [V1, V2, V3, V4, V0, V5, V6, V7, V8], where V0 is the central pixel value and V1, ..., V8 are the values of the
neighboring pixels within the window (Figure 3.5). The corresponding texture unit for this
window is then a set containing the eight elements surrounding the central pixel, represented
as TU = (E1, E2, E3, E4, E5, E6, E7, E8), where Ei is defined as

$E_i = \begin{cases} 0, & V_i < V_0 \\ 1, & V_i = V_0 \\ 2, & V_i > V_0 \end{cases}$

and the element Ei occupies the corresponding Vi pixel. Since each of the eight elements
of the texture unit has any one of these three values (0, 1 or 2), the texture unit value, TU,
can range from 0 to 6560 (3^8, i.e., 6561 possible values). The texture units are labeled by
using the relation

$N_{TU} = \sum_{i=1}^{8} E_i \cdot 3^{\,i-1}$

where NTU is the texture unit value. The occurrence distribution of texture units is called
the texture spectrum (TS). Each texture unit represents the local texture information of a
3x3 pixels, and hence statistics of all the texture units in an image represent the complete
texture aspect of entire satellite image. Texture spectrum has been used in texture
characterization and classification, and the computational time depends on the number of
texture units identified in the image.[3]
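
As a concrete illustration of this labelling (our own sketch, not code from the study), the following function computes E1, ..., E8 and NTU for a single 3X3 window, taking the neighbours in the row-wise order of the vector V given above:

import numpy as np

def texture_unit_number(window):
    # N_TU of a 3x3 window: E_i in {0, 1, 2} relative to V0, base-3 weighted
    flat = window.flatten()
    v0 = flat[4]                                   # central pixel V0
    neighbours = np.delete(flat, 4)                # V1..V8 in row-wise order
    e = np.where(neighbours < v0, 0, np.where(neighbours == v0, 1, 2))
    return int(np.sum(e * 3 ** np.arange(8)))      # value in 0..6560

w = np.array([[10, 12, 10],
              [ 9, 10, 15],
              [10,  8, 10]])
print(texture_unit_number(w))
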
6.2.2 Cross Diagonal Texture Matrix
Al-Janobi (2001) has proposed a cross-diagonal texture matrix technique, in which the
eight neighboring pixels of a 3x3 window are broken up into two groups of four elements
each at cross and diagonal positions. These groups are named as cross texture unit (CTU)
and diagonal texture unit (DTU) respectively. Each of the four elements of these units is
assigned a value (0, 1 or 2) depending on the gray level difference of the corresponding
pixel with that of the central pixel of the 3X3 window. Now these texture units can have
values from 0 to 80 (3^4, i.e., 81 possible values).[1]


Figure 1. Formation of cross and diagonal texture units

Cross texture unit (CTU) and diagonal texture unit (DTU) can be defined as

$N_{CTU} = \sum_{i=1}^{4} E_{ci} \cdot 3^{\,i-1}, \qquad N_{DTU} = \sum_{i=1}^{4} E_{di} \cdot 3^{\,i-1}$

where NCTU and NDTU are the cross texture and diagonal texture unit numbers,
respectively, and Eci and Edi are the ith elements of the corresponding texture units.[1]
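
A small sketch of this decomposition follows. It is our own illustration: the split into edge-adjacent (cross) and corner (diagonal) neighbours follows the description above, and the base-3 labelling of the four-element units is assumed to mirror the texture unit labelling of Section 6.2.1.

import numpy as np

def unit_number(values, v0):
    # Base-3 label (0..80) of a four-element texture unit relative to the centre v0
    e = np.where(values < v0, 0, np.where(values == v0, 1, 2))
    return int(np.sum(e * 3 ** np.arange(4)))

def ctu_dtu(window):
    # Cross and diagonal texture unit numbers of a 3x3 window
    v0 = window[1, 1]
    cross = window[[0, 1, 1, 2], [1, 0, 2, 1]]       # edge-adjacent neighbours
    diagonal = window[[0, 0, 2, 2], [0, 2, 0, 2]]    # corner neighbours
    return unit_number(cross, v0), unit_number(diagonal, v0)

w = np.array([[10, 12, 10],
              [ 9, 10, 15],
              [10,  8, 10]])
print(ctu_dtu(w))                                    # (N_CTU, N_DTU)
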
6.2.3 Modified Texture Filter
In the proposed method, NCTU and NDTU values have been evaluated which range from
0 to 80. For each type of texture unit, there can be four possible ways of ordering, which
give four different values of CTU and DTU. Finally a cross diagonal texture matrix
(CDTM) value for each pixel position is evaluated from corresponding CTU and DTU
possible values. In the present work, several techniques of estimating CDTM values have
been undertaken, which are listed below.


Where NiCTU and NjDTU are the ordering ways for evaluation of NCTU and NDTU. After
obtaining the CDTM values of the 3x3 window throughout the entire image, the occurrence
frequency of each CDTM value is recorded. The CDTM value is then assigned to the
respective pixel location. Now, based on the range of the CDTM values, we divide the
CDTM values into different classes and give a specific colour to each class. Thus we
obtain our resultant CDTM classified image. The same procedure has been followed with 7x7
windows as well. The techniques described above have been applied on several satellite
images spiked with induced noise of different percentages.
6.3 Flowchart: CDTM

7. GREY-LEVEL CO-OCCURRENCE MATRIX


The basics of GLCM texture consider the relation between two neighboring pixels at one
offset, as the second order texture. The grey value relationships in a target are transformed
into the co-occurrence matrix space by a given window size such as 3x3, 5x5, 7x7 and so
forth.
In the transformation from the image space into the co-occurrence matrix space, the
neighboring pixels in one or some of the eight defined directions can be used; normally,
the four directions 0°, 45°, 90° and 135° are initially regarded, and their reverse (negative)
directions can also be taken into account.[5]


Therefore, the general GLCM texture measure is dependent upon the matrix size and
directionality, and known measures such as contrast, entropy, energy, angular second
moment (ASM) and correlation are used.[5]
7.1 Introduction
Grey-Level Co-occurrence Matrix texture measurements have been proposed by Haralick
in the 1970s. Their use improves the classification of satellite images.
This study concerns some of the most commonly used texture measures, which are derived
from the Grey Level Co-occurrence Matrix (GLCM). This involves:
Defining a Grey Level Co-occurrence Matrix (GLCM)
Creating a GLCM
Using it to calculate texture
Understanding how calculations are used to build up a texture image
Textures in images quantify:
Grey level differences (contrast)
Defined size of area where change occurs (window)
Directionality and its slope
Definition: The GLCM is a tabulation of how often different combinations of pixel
brightness values (grey levels) occur in an image.
Properties of the GLCM:
1. It is square
2. It has the same number of rows and columns as the quantization level of the image
3. It is symmetrical around the diagonal
The GLCM is used for a series of "second order" texture calculations. Second order
measures consider the relationship between groups of two (usually neighboring) pixels in
the original image.
7.2 Steps in creating a symmetrical normalized GLCM:
1. Create a framework matrix
2. Decide on the spatial relation between the reference and neighbor pixel
3. Count the occurrences and fill in the framework matrix
4. Add the matrix to its transpose to make it symmetrical
5. Normalize the matrix to turn it into probabilities using the formula

$P(i,j) = \dfrac{V(i,j)}{\sum_{i,j} V(i,j)}$

where i is the row number and j is the column number.
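
The five steps can be written compactly as follows; this is a minimal NumPy sketch of our own, assuming the image has already been quantized to the given number of grey levels and using a single (row, column) offset such as (0, 1) for the 0° direction.

import numpy as np

def glcm(image, levels, offset=(0, 1)):
    # Symmetric, normalized grey-level co-occurrence matrix for one offset
    dr, dc = offset
    counts = np.zeros((levels, levels), dtype=float)   # step 1: framework matrix
    rows, cols = image.shape
    for r in range(rows):                               # steps 2-3: count the pairs
        for c in range(cols):
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                counts[image[r, c], image[rr, cc]] += 1
    counts += counts.T                                  # step 4: make it symmetrical
    return counts / counts.sum()                        # step 5: normalize to P(i, j)

img = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 2, 2, 2],
                [2, 2, 3, 3]])
print(glcm(img, levels=4).round(3))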


7.3 Creating a Texture Image
The result of a texture calculation is a single number representing the entire window. This
number is put in the place of the centre pixel of the window; then the window is moved by
one pixel and the process of calculating a new GLCM and a new texture
measure is repeated. In this way an entire image is built up of texture values.


Edge of image problems: Each cell in a window must sit over an occupied image cell.
This means that the centre pixel of the window cannot be an edge pixel of the image. If a
window has dimension N x N, a strip (N-1)/2 pixels wide around the image will remain
unoccupied. The usual way of handling this is to fill in these edge pixels with the nearest
texture calculation.
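
A sketch of this moving-window computation is shown below (our own illustration). For brevity it uses GLCM contrast at offset (0, 1) as the per-window measure, and it pads the input with edge values as a simple stand-in for filling border pixels with the nearest texture calculation.

import numpy as np

def texture_image(image, win=7):
    # Moving-window texture image; GLCM contrast at offset (0, 1) per window
    half = win // 2
    padded = np.pad(image.astype(float), half, mode="edge")   # edge handling
    out = np.zeros(image.shape, dtype=float)
    rows, cols = image.shape
    for r in range(rows):
        for c in range(cols):
            w = padded[r:r + win, c:c + win]
            # Contrast = sum_ij P(i, j) * (i - j)^2, i.e. the mean squared
            # difference of horizontally adjacent grey levels in the window
            out[r, c] = np.mean((w[:, :-1] - w[:, 1:]) ** 2)
    return out

img = np.random.default_rng(1).integers(0, 8, size=(32, 32))
print(texture_image(img, win=7).shape)                        # (32, 32)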

7.4 Groups of texture measures


Now we calculate the GLCM texture, i.e. we compute and generate the texture
measures Contrast, Entropy, ASM (Angular Second Moment), Energy and
Correlation. These are expressed as functions of the co-occurrence probabilities,
where i and j are coordinates of the co-occurrence matrix space and P(i,j) is the element of the
co-occurrence matrix at the coordinates i and j.[5]
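
Since the formulas themselves are not reproduced here, the sketch below uses the standard textbook definitions of the five measures computed from a normalized GLCM P (our own illustration; in particular, energy is taken as the square root of ASM and entropy uses the natural logarithm, which may differ from the exact conventions used in the study).

import numpy as np

def glcm_features(P):
    # Contrast, entropy, ASM, energy and correlation of a normalized GLCM P
    levels = P.shape[0]
    i, j = np.indices((levels, levels))
    contrast = np.sum(P * (i - j) ** 2)
    entropy = -np.sum(P[P > 0] * np.log(P[P > 0]))
    asm = np.sum(P ** 2)                            # angular second moment
    energy = np.sqrt(asm)
    mu_i, mu_j = np.sum(i * P), np.sum(j * P)
    sd_i = np.sqrt(np.sum(P * (i - mu_i) ** 2))
    sd_j = np.sqrt(np.sum(P * (j - mu_j) ** 2))
    correlation = np.sum(P * (i - mu_i) * (j - mu_j)) / (sd_i * sd_j)
    return {"contrast": contrast, "entropy": entropy, "ASM": asm,
            "energy": energy, "correlation": correlation}

P = np.array([[0.2, 0.1, 0.0],      # small hand-made normalized GLCM
              [0.1, 0.3, 0.1],
              [0.0, 0.1, 0.1]])
print(glcm_features(P))
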
7.5 Implementation of GLCM
Stand-alone application program for GLCM texture measure and texture image creation is
implemented in this study. In this program, general graphic image formatted as jpg, tiff,
bmp can be used as input data. Also, a user determines two texture parameters such as
window size and direction in the main frame. The grey value relationships in the target
image are transformed into the co-occurrence matrix space by a given window size such as
3x3, 5x5, 7x7 and 11x11; the neighboring pixels in one of the four directions, East-West
at 0°, North-East at 45°, North-South at 90°, North-West at 135°, or the omni-direction, will


be computed in the co-occurrence matrix space. Among them, texture image is obtained as
the resultant GLCM classified image.[5]
7.6 Flowchart: GLCM

8. COMPARISON BETWEEN CDTM AND GLCM
Cross-Diagonal Texture Matrix (CDTM)

Figure 2a. Input Image


Figure 2b. Resulting CDTM Classified Image
Grey-Level Co-occurrence Matrix (GLCM)

Figure 3a. Input Image

Figure 3b. Resulting GLCM Classified Image


CONCLUSION
Most previous studies for second order texture analysis have been directed toward the
improvement of classification accuracy, with supervised or un-supervised classification
methods, showing high accuracy [7]. Scope of this study is somewhat different from
previous works. An application program for texture measures based on CDTM and GLCM
is newly implemented in this study. By using this program, CDTM and GLCM based
texture images by different quantization level, window size, and texture type are created
with the high-resolution satellite image of the terrain. In applying feature
characterization to texture measures, texture images are helpful for detecting shadow zones,
classifying building types, and distinguishing the land, water, forest, desert, etc. regions from
one another, aspects which are not fully analyzed in this study.
In this paper we compare two different image texture classification techniques based on
feature extraction by first and higher order statistical methods that have been applied on
our images. The extracted features are used for unsupervised pixel classification with
CDTM and GLCM algorithms to obtain the different classes in the image [4]. From the
results obtained with 3x3, 5x5 and 7x7 windows on several satellite imagery data
corrupted with different percentages of induced noise, it is found that the results with 7x7
windows are comparatively more effective in removing the noises from the imagery data
than that by the 3x3 and 5x5 texture windows. Another very important advantage of the
proposed technique is the substantial reduction in the computational time involved using
CDTM method. Moreover:
The algorithms work well for images captured from a distance, such as satellite images.
The algorithms can successfully recognize distinct regions in an image on the basis
of textures extracted.
When the input data to an algorithm is too large to be processed and it is suspected
to be notoriously redundant (much data, but not much information) then the input
data will be transformed into a reduced representation set of features.
The system helps in simplifying the amount of resources required to describe a
large set of data accurately.
The extracted features are used for unsupervised pixel classification to obtain the different
classes in the image, before using the algorithm. Two methods have been tested with very
heterogeneous results [8]. The hypotheses taken into account for the textural analysis
methods are currently being modified to justify them more accurately, especially concerning the
number of classes and the size of the analysis window.
Another five parameters were calculated from the grey-level co-occurrence matrix
(GLCM). The linear discriminant analysis was applied to sets of up to five parameters and
then the performances were assessed. The most relevant individual parameter was the
contrast (con) (from the GLCM algorithm).[2]
This paper presents a new texture analysis method incorporating with the properties of
both the gray-level co-occurrence matrix (GLCM) and cross-diagonal texture matrix
(CDTM) methods. The co-occurrence features extracted from the crossdiagonal texture
matrix provide complete texture information about an image. The performance of these
features in discriminating the texture aspects of pictorial images has been evaluated. The
textural features from the GLCM and CDTM have been used for comparison in
discriminating some of the satellite images. Based on the resultant classified images of the


terrain, it is observed that the features of the classified image from GLCM were clearer
and more vivid as compared to those from CDTM.
Although the GLCM approach is much less computationally intensive than the CDTM, it
nonetheless requires massive amounts of calculation. Most of this computation time is
spent in stepping through the input image and compiling the matrices themselves.
Therefore, if the calculation time for these matrices could be reduced, the GLCM
technique would become more practical.
REFERENCES
[1] Abdulrahman A. AL-JANOBI and AmarNishad M. THOTTAM , Testing and Evaluation of
Cross-Diagonal Texture Matrix Method.
[2] Alvarenga AV, Pereira WC, Infantosi AF, Azevedo CM., 2007 Complexity curve and grey level
co-occurrence matrix in the texture evaluation of breast tumor on ultrasound images.
[3] Amit K. Bhattacharya, P. K. Shrivastava and Anil Bhagat, 2001 A Modified Texture Filtering
technique for satellite Images.
[4] F. Cointault, L. Journaux, M.-F. Destain, and P. Gouton (France), 2008, Wheat Ear Detection by
Textural Analysis for Improving the Manual Countings.
[5] Kiwon Lee, So Hee Jeon and Byung-Doo Kwon, Urban Feature Characterization using High-Resolution Satellite Imagery: Texture Analysis Approach.
[6] M. Tuceryan and A. K. Jain, ``Texture Analysis,'' In The Handbook of Pattern Recognition and
Computer Vision (2nd Edition), by C. H. Chen, L. F. Pau, P. S. P. Wang (eds.), pp. 207-248, World
Scientific Publishing Co., 1998. (Abstract) (Book Chapter).
[7] Supervised and Unsupervised Land Use Classification.
[8] Varsha Turkar and Y.S. Rao , Supervised and Unsupervised Classification of PolSAR Images
from SIR-C and ALOS/PALSAR Using PolSARPro.



Our Publication

www.ijmera.org

www.ijeera.org

www.ijcera.org

www.ijcsrt.org

www.ijecer.org

www.ijmrs.org

Published By
ESRSA Publications Pvt. Ltd.
Engineering and Science Research Support Academy
Website : www.esrsa.org, E-mail : info@esrsa.org
